Childhood obesity prevention trials: A systematic review and meta‐analysis on trial design and the impact of type 1 error

Effect sizes from previously reported trials are often used to determine the meaningful change in weight in childhood obesity prevention interventions because information on clinically meaningful differences is lacking. Estimates from previous trials may be influenced by statistical significance; therefore, it is important that they have a low risk of type 1 error. A systematic review and meta‐analysis were conducted to report on the design of child obesity prevention randomized controlled trials and effectiveness according to risk of type 1 error. Eighty‐four randomized controlled trials were identified. A large range of assumptions were applied in the sample size calculations. The most common primary outcome was BMI, with detectable effect size differences used in sample size calculations ranging from 0.25 kg/m2 (followed up at 2 years) to 1.1 kg/m2 (at 9 months) and BMI z‐score ranging from 0.1 (at 4 years) to 0.67 (at 3 years). There was no consistent relationship between low risk of type 1 error and reports of higher or lower effectiveness. Further clarity of the size of a meaningful difference in weight in childhood obesity prevention trials is required to support evaluation design and decision‐making for intervention and policy. Type 1 error risk does not appear to impact effect sizes in a consistent direction.


| BACKGROUND
Worldwide, 340 million children aged 5-18 years and 38 million children aged up to 5 years are living with overweight or obesity. 1 Rates of childhood obesity have further increased because of lockdown measures during the Covid-19 pandemic. 2 Obesity in children has been linked to conditions such as diabetes and poor mental health during childhood. 3,4 Individuals living with overweight or obesity as a child are more likely to have overweight or obesity in adulthood 5,6 and as a result suffer from obesity-related chronic diseases and, as recently shown, 7 death from infectious disease such as Covid-19. This highlights the ongoing importance of tackling childhood obesity, including as part of the pandemic recovery. 8 Recognition of the impact of childhood obesity on the public's health has led to intensive efforts to develop effective prevention programs that can be applied broadly. Evidence from systematic reviews of trials aimed at testing the effectiveness of obesity prevention interventions in children 9-12 often shows mixed or lack of an effect as evaluated by differences in the prevalence of overweight and obesity or continuous measure of fatness between intervention and control arms.
Statistics widely used to evaluate differences in prevalence of obesity are p-values (using alpha < 0.05 as a decision rule) and 95% confidence intervals that display the interval around the estimate within which the probability of rejecting the null hypothesis when the null hypothesis is true is 5% or less (again assuming alpha < 0.05). 13 Thus, alpha, which represents the probability of committing a type 1 error, has often been deemed important in the assessment of the success of an obesity prevention intervention. Also of high importance are other statistics that are related to alpha (or type 1 error) such as the minimal detectable effect, power, sample size, variance of the outcome variable, and other properties that are dependent on the study design. 14 Among these statistics, the minimal detectable effect size is difficult to establish in obesity prevention trials in children because of the lack of consensus on what level of weight change over time constitutes obesity prevention.
In adults, a rule of thumb of a 5% change in body weight has been used for many years 15 to indicate a clinically important effect in obesity treatment, and more recently, a change of less than 3% has been used to define weight maintenance. 16 However, growth as well as multiple other differences make these simple guidelines inappropriate for use in children. Currently, there is little guidance on the amount of change in weight-related measures that constitutes a clinically detrimental change versus a healthy or inconsequential change in children.
A population-level reduction in BMI z-score of −0.13 within children aged 2 to 5 years has been suggested to achieve long-term health benefits and healthcare cost savings within obesity prevention trials. This was determined based on obesity-related health impact modeling. 9 However, determining what constitutes a clinically meaningful effect size in childhood obesity prevention trials is challenging. Data from studies that have examined clinical effectiveness are inconsistent, with many studies drawing on data from populations of children living with obesity or lacking the longer-term follow-up data needed to understand whether changes in BMI are sustained. 17 To support trial design, effect sizes seen in previously reported studies are often used as estimates of the minimal detectable effect expected in sample size calculations for new studies. 14 However, a previously reported difference that was statistically significant is not necessarily large enough to be clinically meaningful. On the other hand, the use of an unrealistically large minimal detectable effect size in power calculations may lead to a study that has inadequate sample size and power to find smaller effects that may be clinically important.
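The dependence of sample size on the assumed minimal detectable effect can be illustrated with a minimal sketch. It uses the standard normal-approximation formula for comparing two means in an individually randomized, two-arm trial with a two-sided test. The function name `n_per_arm` and the standard deviation of 1.0 for BMI z-score are illustrative assumptions, not values taken from the reviewed trials.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per arm to detect a mean difference `delta`.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical comparison: a large versus a small anticipated difference
# in BMI z-score, both with an assumed SD of 1.0.
for delta in (0.67, 0.13):
    print(delta, n_per_arm(delta, sd=1.0))
```

Under these assumptions, an anticipated difference of 0.67 needs about 35 children per arm, whereas a difference of 0.13 needs more than 900, so a trial powered for the larger value would be far too small to detect the smaller one.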
The aim of this review was to explore the design of childhood obesity prevention randomized controlled trials and their effectiveness according to their risk of type 1 error. We describe the methodologies of trials and the assumptions used within sample size calculations to identify how outcome measures are being decided in the absence of clear guidance of what a clinically important difference is in prevention trials. In addition, we compare the effectiveness of studies deemed high risk of type 1 error to those low risk of type 1 error to explore if there is a difference in the overall effectiveness and if those deemed low risk of type 1 error have a higher or lower overall effectiveness. Exploring if outcomes differ according to risk of type 1 error can determine whether the risk of type 1 error of a previous study used to support trial design should be considered to ensure the included outcome measure is appropriate to determine if an intervention is effective. The findings of this review can provide guidance to those designing future childhood obesity prevention randomized trials.

| METHODS
This systematic review is reported according to PRISMA reporting guidelines 18 and was registered on PROSPERO before the final searches were conducted. The PROSPERO registration can be accessed here: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=131536
The Cochrane Collaboration Handbook 19 was used to provide guidance on the meta-analysis methods, and the eligibility criteria follow similar criteria to the Cochrane Review on "Interventions to prevent obesity in children" published in 2019. 9 However, as the current review has a focus on trial design and the risk of type 1 error within studies, a more sensitive search was conducted and the eligibility criteria have been developed to reflect the purpose of this review. The search terms were chosen to identify randomized controlled trials of childhood obesity prevention interventions. Search terms were categorized into five groups: study design (i.e., randomized controlled trials), population (i.e., infant, children, and adolescents), intervention (i.e., obesity prevention), setting (i.e., school and community), and outcome (i.e., BMI) (see Data S1).

| Design
Eligible studies were randomized controlled trials in which an obesity prevention intervention was tested against a comparator. Studies that were described as pilot or feasibility studies were not eligible for inclusion. To account for studies that do not clearly state they are a pilot or feasibility study, a criterion requiring studies to have a minimum of 100 participants recruited in total was applied. A minimum of 100 participants was decided because of an assumption that studies with a sample size under 100 participants are more likely to be a pilot or feasibility study. However, studies that recruited fewer than 200 participants were removed in exploratory subgroup analyses to allow the exploration of studies with larger samples. Individual and cluster randomized studies were eligible for inclusion, and no criteria relating to the number of clusters in studies were applied. Follow-up data must have been provided for participants at or later than 6 months from the beginning of the intervention, and interventions for women during pregnancy and infancy had to provide follow-up data from children at least 12 months of age. Longer follow-up periods have been specified as they are suited to obesity prevention studies to determine the long-term implications of the intervention, rather than exploring the immediate effect of the intervention that is suited to determining obesity treatment. 20 Studies retrieved from any date and in any language were included.

| Population
The review focused on population-based (non-clinical) studies. In order for the study to be eligible, children had to be under the age of 18 years at the commencement of the study. Adults could be included in the study; however, the primary outcome had to relate to the child.
Studies that recruited only adults with no child outcomes or did not have child outcomes that were separate from adult outcomes were not eligible for inclusion. Additionally, clinical studies that recruited specialist populations with a condition that could have an impact on a child's weight status (e.g., children with Prader-Willi syndrome, Cushing Syndrome, Hypothyroidism, and Hashimoto's Disease) were ineligible. Studies in which children were specifically recruited based on their weight status or via clinical/medical referral were also not eligible for inclusion as the review aimed to explore study design and outcomes of interventions designed to target the general population.

| Intervention
An eligible intervention must have been designed to bring about behavior changes (e.g., to physical activity levels or energy intake) that contribute toward obesity prevention in children. Interventions must have involved children and/or their parent/care giver. Interventions could take place in the home and out-of-home settings. Treatment interventions that were designed specifically for individuals already living with overweight or obesity were not eligible.

| Outcome measures
A measure of obesity prevention must have been reported as the primary outcome. The primary outcome was identified as the outcome referred to as the primary or main outcome measure within the paper, the outcome measure included within the sample size calculation, or the outcome confirmed as primary in a referenced protocol or trial registry. Eligible measures included weight and height, BMI, BMI z-score, BMI percentile, percent body fat, ponderal index, skin fold thickness, and prevalence or incidence of overweight and obesity. Studies with primary outcomes that were self-reported were not eligible for inclusion.

| Output
Evidence sources were restricted to peer-reviewed journal articles. Conference abstracts, letters to editor, commentaries, and theses were not eligible for inclusion as they would not provide the required information to be included in the review. No publication date criteria were applied to allow the exploration of how child obesity prevention trials have previously been designed. However, studies that were published before the year 2000 have been removed from some analyses to explore the impact of bias of including studies that were conducted at a time when trial protocols and pre-registrations were less common practice.

| Screening and data extraction
The literature search was conducted by one reviewer (LP) who collated all the articles and removed duplicates. All titles and abstracts were screened by the same reviewer (LP) with members of the review team (HS, WB, LM, ES) second reviewing at least 100 articles each. Disagreements were resolved through a discussion with a third reviewer (MB). In the full-text review, a sample of 150 articles was second reviewed by three reviewers (HS, ES, LM). Disagreements were resolved through discussion with a third reviewer (MB). Kappa scores were generated between each set of reviewers to ensure there was adequate agreement with the screening process prior to the first reviewer conducting the remainder of the screening process. An adequate score was defined as achieving a 0.8 kappa score that equates to a strong inter-rater agreement. 21 Reasons for exclusion of articles reviewed at the full-text stage were recorded based on the first exclusion criteria identified.
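The agreement check described above can be sketched as follows. The text does not name the kappa variant, so this sketch assumes Cohen's kappa for two raters making include/exclude decisions; the function name and the example decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten articles (one disagreement).
first = ["include"] * 4 + ["exclude"] * 6
second = ["include"] * 3 + ["exclude"] * 7
kappa = cohens_kappa(first, second)
print(round(kappa, 2), "adequate" if kappa >= 0.8 else "below the 0.8 threshold")
```

In this small hypothetical sample, a single disagreement in ten decisions already gives a kappa just under 0.8, which illustrates how demanding the threshold is when one category is less common than the other.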
All studies eligible for inclusion had data extracted by one reviewer (LP) with 50% of papers being extracted by a second reviewer (ES). Discrepancies were resolved through discussion with a third reviewer (MB) to reach an agreement. Descriptive data (study and intervention design, sample size calculation, and sample characteristics) were extracted into a purpose-designed Microsoft Access database. Outcome data were extracted directly onto a Microsoft Excel spreadsheet.
Authors were contacted to gain access to missing outcome data.
In order to describe how childhood obesity prevention trials have been designed, characteristics of the interventions, population, and study design (including primary outcomes and sample size calculations) were extracted. Information describing the methods and assumptions made during the sample size calculations (i.e., anticipated effect size) was extracted, and the calculations were assumed (unless otherwise stated) to have been made a priori. Where available, the follow-up point on which the sample size calculation was based was extracted.
In addition, data relating to the primary outcomes of studies were extracted to provide details of the reported effectiveness of trials. The primary outcomes of included studies were determined based on the outcome measure authors described as the primary or main outcome of the study or the outcome included in the sample size calculation. Where this was not specified, information within referenced protocols and trial registries was used to clarify the primary outcome of the trial. Primary outcome data were extracted based on the primary outcome follow-up point. This was determined based on the timepoint authors described as the primary or main follow-up point.
Where authors did not clearly specify the primary outcome follow-up point, an assumption was made that if follow-up data were only reported at one time point, this was the primary outcome follow-up point. Where multiple follow-up points were reported, information from protocol papers and trial registries was used to identify the primary outcome follow-up point. Where this was not available or did not align with the reporting in the paper, the longest follow-up point was assumed as the primary outcome follow-up point.
To support the presentation of findings from each trial and the conduct of meta-analysis, missing data were sourced directly from authors where possible. Where baseline and follow-up data for intervention and control arms or between-group differences were not reported, these were requested directly from the authors. Studies with missing data that were not provided by authors, and studies that only reported outcomes by subgroups, could not be included in the meta-analysis, and missing data are highlighted in Table 1.

| Quality assessment
The quality of included studies was assessed using the Cochrane recommended Risk of Bias 2 (RoB2) tool. 106 This quality appraisal tool is an updated version of the previous risk of bias tool, 107 which now provides separate guidance for appraisal of individually randomized and cluster randomized controlled studies. Each domain was scored either "low risk," "high risk," or "some concerns." When assessing the domain of bias due to missing outcome data, if at least 95% of participants that were randomized were followed up, this was defined as "nearly all participants within clusters." Quality appraisal was initially conducted in 50% of papers by the first (LP) and a second reviewer (ES, WB, HS, or LM) with disagreements resolved through discussion with a third reviewer (MB). The first reviewer conducted the remaining 50% of papers following assurance that the quality appraisal tool was being applied consistently.

| Assessing the risk of type 1 error
All included studies were assessed for the risk of type 1 error. For this purpose, we applied two predefined criteria: (1) whether a protocol or trial registry was referenced and provided detail to confirm the reported primary outcome and follow-up point were predetermined and (2) whether the predetermined primary outcome and follow-up point were reported as the main outcome 108 (i.e., the primary outcome is clearly reported and discussed as the main finding rather than the paper focusing on secondary outcomes that may have had more effect). These criteria were agreed upon by two reviewers (MB and LP) and applied by one reviewer (LP) with discussion with a second reviewer (MB) when support with final decisions was needed.
Studies were required to meet both predefined criteria in order to be classified as having a low risk of type 1 error. Otherwise, they were defined as high-risk, though information about whether risk was based on not meeting one or both criteria was collated.
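The two-criterion rule can be written down as a small decision function. This is only an illustrative sketch of the logic stated above; the class name, field names, and labels are hypothetical and do not come from the review's extraction database.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrialReport:
    # Criterion 1: a referenced protocol or registry confirms that the reported
    # primary outcome and follow-up point were predetermined.
    confirmed_by_protocol_or_registry: bool
    # Criterion 2: the predetermined primary outcome and follow-up point are
    # reported as the main outcome of the paper.
    prespecified_outcome_is_main_finding: bool


def classify_type1_error_risk(report: TrialReport) -> Tuple[str, List[str]]:
    """Return ("low" or "high", list of unmet criteria)."""
    unmet = []
    if not report.confirmed_by_protocol_or_registry:
        unmet.append("criterion 1")
    if not report.prespecified_outcome_is_main_finding:
        unmet.append("criterion 2")
    # Low risk requires both criteria; the unmet list is kept for collation.
    return ("low" if not unmet else "high"), unmet


print(classify_type1_error_risk(TrialReport(True, False)))
```

Returning the unmet criteria alongside the label mirrors the collation of whether a high-risk rating came from one or both criteria.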

| Narrative synthesis
A narrative review was conducted to explore the characteristics of included studies. Assumptions made in sample size calculations, including the anticipated intraclass correlation coefficient (ICC), effect size (we also examined the justification of chosen effect sizes), loss to follow-up, and the required sample size, were reported. These findings are reported as ranges, with details of individual studies reported separately.

| Data synthesis
Meta-analyses were conducted to explore the overall effectiveness of child obesity prevention interventions, in addition to exploring effectiveness according to risk of type 1 error.
A minimum of two studies per analysis were required for a meta-analysis to be conducted. Studies were required to provide participant follow-up numbers, mean differences per arm, and standard deviations (or data necessary to calculate standard deviations) to be included in a meta-analysis. Studies that involved cluster randomization were eligible for inclusion if it was clearly stated that outcomes were adjusted for clustering (see Table 1), or the analysis plan stated that analyses were conducted to account for clustering. Separate meta-analyses were conducted for studies of children aged 0-5 years, children of primary school age (6-11 years), and children of secondary school age (12-18 years).
An analysis was conducted within each age category for both BMI z-score and BMI outcomes combining all intervention designs and primary outcome follow-up points. Meta-analyses were also conducted to compare the effectiveness of studies deemed high or low risk of type 1 error according to the criteria outlined above. When studies had multiple intervention arms that were included in the same analyses, the number of control participants was split evenly across the intervention arms so as not to duplicate participants. Exploratory subgroup analyses were also conducted to exclude studies with recruited samples under 200. Additional subgroup analyses were also conducted to explore the effect of excluding studies that were published before the year 2000, when CONSORT was less likely to have been followed (because of the first CONSORT being published in 1996) 109 and trials were less likely to have been pre-registered.
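The even split of control participants across intervention arms can be sketched as below. The text does not say how an uneven remainder was handled, so giving the extra participants to the first arms is an assumption made here for illustration, as is the function name.

```python
def split_control_group(n_control, n_intervention_arms):
    """Share one control group across several intervention-arm comparisons
    so that no control participant is counted twice."""
    if n_control < 0 or n_intervention_arms < 1:
        raise ValueError("need a non-negative control count and at least one arm")
    base, remainder = divmod(n_control, n_intervention_arms)
    # Assumption: any remainder is allocated one participant at a time
    # to the first arms, so the shares always sum to n_control.
    return [base + (1 if i < remainder else 0) for i in range(n_intervention_arms)]


# Hypothetical trial with 101 control participants and two intervention arms.
print(split_control_group(101, 2))
```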
The generic inverse variance method with a random-effects model was applied using Revman 4.2. 110 This method was chosen as it allows for the inclusion of studies reporting only the difference between arms as well as studies reporting the mean change from baseline per arm in meta-analyses. When available, adjusted mean data were included in meta-analysis; otherwise, unadjusted data were used.
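A minimal sketch of generic inverse variance pooling under a random-effects model is given below. It assumes the DerSimonian and Laird estimator of between-study variance, which to our understanding is the random-effects option in RevMan; the text itself does not name the estimator. Function names and the example inputs are hypothetical, and standard errors are recovered from 95% confidence intervals under a normal approximation.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def random_effects_pool(estimates, std_errors):
    """Pool mean differences with inverse variance weights (DerSimonian-Laird)."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")
    w = [1 / se ** 2 for se in std_errors]                 # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return {
        "pooled": pooled,
        "se": se_pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "q": q,
    }


# Hypothetical inputs: two between-arm differences with their standard errors.
print(random_effects_pool([-0.10, -0.04], [0.05, 0.02]))
```

Because the method only needs an estimate and its standard error per study, a trial reporting a between-arm difference with a confidence interval can be combined with one reporting change from baseline per arm.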
The quality of evidence provided for each meta-analysis was evaluated using the GRADE toolkit (Grading of Recommendations Assessment, Development, and Evaluation). 111 Each analysis was ranked either very low, low, moderate, or high quality based on limitations of study design, inconsistency of results, indirectness of evidence, imprecision, and publication bias.
Limitations in the study design of the reviewed papers were evaluated using the RoB2 tools, with a particular focus on biases due to blinding, loss to follow-up, selective reporting, and bias during recruitment in cluster randomized trials. For the purpose of assessing inconsistency, the I² heterogeneity score calculated by Revman was assessed, with values of 40%-60% interpreted as moderate inconsistency and values over 60% as substantial inconsistency. 112 Publication bias was assessed by visually inspecting the asymmetry of funnel plots generated for each analysis in Revman 4.2.
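The inconsistency bands can be illustrated with a short sketch. It assumes the usual definition of I² from Cochran's Q and its degrees of freedom; the function names are hypothetical, and because the text does not label values under 40%, the third label below is a placeholder rather than a category used in the review.

```python
def i_squared(q, df):
    """I² (%) from Cochran's Q and degrees of freedom (number of studies - 1)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def inconsistency_band(i2):
    """Map I² to the bands described in the text."""
    if i2 > 60:
        return "substantial"
    if i2 >= 40:
        return "moderate"
    return "below the moderate band"  # placeholder: not labeled in the text


# Hypothetical analysis: Q = 4.0 across three studies (df = 2).
i2 = i_squared(4.0, 2)
print(i2, inconsistency_band(i2))
```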

| RESULTS
The initial database search (January 2019) retrieved 20,616 articles with an additional five sourced through citation searches (Figure 1 18).

| Study characteristics
The majority of included studies were cluster randomized (N = 72/84) with the most common level of randomization being schools (N = 56).

| Risk of bias of included studies
Figure 2 reports the overall risk of bias in included studies, with 12 studies assessed using the individually randomized risk of bias tool and 72 assessed using the cluster randomized tool. "Missing information" was a common reason for studies receiving judgments of "some concerns" for multiple domains. All domains of bias had more studies assessed to be of low risk of bias rather than high risk of bias; however, the overall quality of the majority of studies was reduced because of the large number of domains being labeled as "some concerns" by the tool. Only two studies 25,49 received scores indicating that they were at low risk for all domains, though seven studies were low risk for all but one domain indicating "some concerns." 28,40,42,43,50,52,77 The risk of bias of each study by domain can be viewed in Data S3.

| Risk of type 1 error
Of the 84 studies, 20 studies met both criteria and were considered as low risk of type 1 error. 22,28,31,32,34,37,38,40,45,49,50,52,69,74,83,94,95,97,99,101 The most common criterion that studies did not meet that contributed to them being classified as high risk of type 1 error was not including a reference to a protocol or trial registry, making it unclear if reported findings were based on a predetermined primary outcome (N = 35). A further 10 studies did not provide details of the primary outcome timepoint, and one study 86 did not provide details of the primary outcome or timepoint on the referenced trial register. Fifteen studies were classified as high risk because of the primary outcome (n = 6) or timepoint (n = 9) reported in the paper not matching the prespecified primary outcome and timepoint on the referenced trial registry. Four studies were at high risk of type 1 error because of the prespecified primary outcome measure not being reported as the main outcome of the study. The risk of type 1 error of each study is reported in Table 1.

| Assumptions used to develop study sample size calculations
Studies reporting a sample size calculation applied a range of assumptions and are presented according to the different age categories (i.e., children aged 0-5 years, primary school-aged children, and secondary school-aged children) in Table 1. The most commonly reported primary outcomes across studies of all population age groups were BMI (N = 30) and BMI z-score (N = 15). Few studies (N = 15) reported the specific follow-up point their sample size calculation was based on. Information on the follow-up point considered within sample size calculations is included in Table 1.
The detectable effect size differences used in sample size calculations in trials of children aged 0-5 years ranged from 0.25 27 (no specified timepoint) to 0.67 26 (at 3-year follow-up) for BMI z-score and from 0.25 kg/m² 34 (at 2 years follow-up) to 0.35 kg/m² 33 (no timepoint specified) for BMI. For trials including primary school-aged children, the detectable differences used in sample size calculations ranged from 0.1 51 (no timepoint specified) to 0.5 52 (no timepoint specified) for BMI z-score and from 0.1 kg/m² 58 (at 1 year) to 1.1 kg/m² 74 (no timepoint specified) for BMI. Only one trial that included secondary school-aged children considered BMI z-score in their sample size calculations, and they considered a detectable difference of 0.4 93 (no timepoint specified). Detectable difference in BMI ranged from 0.2 effect size 96 (no timepoint specified) to a difference of 1.0 BMI unit 100 (no timepoint specified).
Twenty-one studies provided justification of the expected difference (used to develop effect sizes) specified in their sample size calculations. The data most commonly used to estimate a detectable difference came from previous studies (N = 8) 38,52,62,72,90 and pilot studies. 65,100,103 Some authors used data from their own previously conducted studies, 41 or from outcome data collected at earlier stages of their trial. 37,60 In addition, two studies stated their detectable difference in their sample size calculation was based on data sets from national databases. Two studies based their expected difference on the difference between growth chart major percentile lines. 23,31 Five studies stated they used "clinically important differences." Two of these studies referenced childhood obesity treatment intervention studies rather than prevention studies and both stated clinically important differences of 0.25 BMI z-score. 42,43 The remaining three studies did not provide a reference or explanation for what they stated was a clinically meaningful difference used in their sample size calculation and stated clinically important differences of 0.1, 0.75, and 0.5 kg/m². 46,77,78
Of the 51 studies reporting the significance level (α) used in the sample size calculation, all but one study used an assumption of a p level of 0.05. Of the 57 studies that reported the power (1 − β) used in the sample size calculation, 41 based their sample size calculation on 80% power. The most common estimated dropout rate was 20% (N = 8), and the range (N = 21) was 10%-30%. Twenty-two of the 75 cluster randomized trials reported an assumed intraclass correlation coefficient that ranged from 0.001 63,87 to 0.15, 65 and the most commonly used ICC was 0.05 25,66,79 or 0.01. 30,46,59 Four studies used an ICC based on research in a similar setting 49,68,78 or with a similar population. 72
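To show how the reported ICC and dropout assumptions affect the total sample size of a cluster randomized trial, the following sketch inflates an individually randomized sample size by the standard design effect, 1 + (m − 1) × ICC, and then by the expected loss to follow-up. It assumes equal cluster sizes; the function name, the starting sample size of 250, and the cluster size of 25 are hypothetical and not taken from any included study.

```python
import math


def inflate_for_clustering_and_dropout(n_individual, cluster_size, icc, dropout):
    """Scale a sample size for cluster randomization and loss to follow-up."""
    if not 0 <= icc <= 1 or not 0 <= dropout < 1 or cluster_size < 1:
        raise ValueError("check icc, dropout, and cluster_size")
    design_effect = 1 + (cluster_size - 1) * icc  # assumes equal cluster sizes
    return math.ceil(n_individual * design_effect / (1 - dropout))


# Hypothetical starting point: 250 participants under individual randomization,
# classes of 25 children, 20% dropout, and the two most commonly reported ICCs.
for icc in (0.05, 0.01):
    print(icc, inflate_for_clustering_and_dropout(250, 25, icc, 0.20))
```

With these illustrative inputs, moving from an ICC of 0.01 to 0.05 raises the requirement from 388 to 688 participants, so the choice between the two most commonly assumed ICC values is not a minor detail.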

| Meta-analyses
Details and outcomes for each individual study's primary outcome and for the meta-analyses can be found in Tables 1 and 2, respectively.
The overall difference between intervention and control arms in studies of children aged 0 to 5 years reporting BMI z-score as a primary outcome was −0.00 (CI −0.05, 0.05). Of these, only one study with two intervention arms had a low risk of type 1 error 22 and had a pooled mean difference of 0.12 (CI −0.05, 0.28).

| BMI effect size in children aged 0 to 5 years
Five studies that were eligible for meta-analysis reported BMI as the primary outcome and had a combined mean difference (CI −0.29, 0.07). Three of these studies 25,31,32 were at low risk of type 1 error, with a combined mean difference of −0.12 (CI −0.41, 0.17) compared with two studies 33,35

FIGURE 2 Risk of bias of included studies by domain.

TABLE 2 Summary of findings.

| BMI z-score effect size in primary school-aged children (6-11 years)
Studies of primary school-aged children that reported BMI z-score as a primary outcome had a combined mean difference of −0.04 (CI −0.06, −0.03). Subgroup analysis was conducted based on intervention design.
Interventions that included both a nutrition and physical activity component (n = 4) had a combined mean difference of −0.10 (CI −0.17, −0.03) compared with the physical activity-only intervention, which had a mean difference between intervention and control arm of −0.03 (CI −0.08, 0.02), 43 and the nutrition education-only intervention, which had a mean difference of −0.18 (CI −0.31, −0.04). 52 Three studies 49,50,52 that reported BMI z-score as a primary outcome within 6-11-year-olds had a low risk of type 1 error and had a combined mean difference of −0.10 (CI −0.19, −0.01) compared with six studies that had a high risk of type 1 error and had a mean difference of −0.04 (CI −0.06, −0.02).

| BMI effect size in primary school-aged children (6-11 years)
Studies of primary school-aged participants that reported BMI as the primary outcome reported a combined mean difference of −0.16 (CI −0.27, −0.05). Only one study reporting BMI as a primary outcome had a low risk of type 1 error, with a mean difference of −0.07 (CI −0.19, 0.05), 69 compared with 13 studies that had a high risk of type 1 error and a combined mean difference of −0.17 (CI −0.29, −0.05).

| BMI z-score effect size in secondary school-aged children (12-18 years)
No studies of secondary school-aged children that were eligible for inclusion in the meta-analysis reported BMI z-score as a primary outcome.

A large range of assumptions has been used to develop sample size calculations, including the predicted effect sizes. The variability in predicted effect size within sample size calculations could in part be attributed to logical differences based on intervention design, follow-up duration, and/or participant age, all of which could influence the predicted reduction in BMI or BMI z-score. 9 One difficulty faced was a high level of uncertainty regarding the amount of change that would constitute a "meaningful change" in child obesity prevention trials. There was limited justification of the authors' primary outcome measure; however, where detail was provided, authors often reported using data from previous trials considered to have generated a "successful" outcome to guide sample size calculations. In some studies, these appeared to be based on the size of statistically significant differences rather than clinical or meaningful significance, and the implications for obesity prevention were not discussed.
The review has highlighted that many studies previously conducted in the field may be at risk of type 1 error.However, rather than consistently observing a greater effect in those at greatest risk of type 1 error, our analyses for primary school-aged studies with BMI z-score and secondary school-aged studies with BMI as primary outcomes identified larger effect sizes in those at low risk of type I error.
Although there was no consistency in whether studies deemed high or low risk of type 1 error were reporting greater effectiveness across the different analyses, the analyses identified that different results were generated when outcomes of studies at high and low risk of type 1 error were analyzed separately. This suggests risk of type 1 error may have an impact on findings and should be considered both when interpreting study results and when using previous evidence to support future trial design. In obesity prevention research, it is hypothesized that "multiple small changes within a system can make a difference" to weight management. 126,127 This hypothesis is plausible and is based on economic modeling. 17,128,129 Although there is little debate that multiple changes are needed across the whole system to impact on obesity prevalence at a population level, it is not yet known how individual interventions contribute to prevention within the system. This is further complicated by our need to design evaluations that, rather than looking for a measurable impact of obesity reduction at an individual level (i.e., with treatment), seek to find smaller alterations to energy imbalance that over time reduce excess weight gain. 9,130,131

| Strengths and limitations of the study
The review included a broad search strategy that identified a large number of papers. Missing evidence was sought through referenced protocols and through contacting authors to ensure a maximum number of studies could be included in the meta-analyses and studies were appraised based on as much information as possible. However, only 40 of the retrieved studies were eligible for inclusion in the meta-analyses; the remainder either had missing outcome data or did not report BMI or BMI z-score as a primary outcome. Additionally, the risk of type 1 error was determined using assumed criteria that were not further explored or validated. For example, some studies classed as high risk for not providing a protocol or registration may have been published before CONSORT guidance in this area. 132 Further, some studies simply had missing information on trial registration or did not reference a protocol, perhaps indicating a reporting error rather than a bias. Although a validated tool was not used to categorize studies at high or low risk of type 1 error, our approach has allowed an exploration and comparison of studies considered most and least likely to be at risk of type 1 error.
The confidence in findings from all meta-analyses was assessed to be either low or very low, which suggests that findings should be interpreted with caution. The large amount of information missing for assessing the risk of bias of individual studies included in the meta-analyses was a common reason for downgrading the quality of evidence. The updated version of the RoB2 tool used in the review required more detail on study design to be reported than the previous version, and because some studies included in the review were published before CONSORT guidelines were available, they did not report the information required to support decision making.

| CONCLUSION
This review has found there is broad variation in the design of child obesity prevention trials and that the effectiveness of obesity prevention
We systematically searched databases including Medline, PsycInfo and Embase (Ovid), CINAHL, Web of Science, Scopus, and the Cochrane Library. The first search was conducted in January 2019 with searches including articles published from any date. Additional updated searches were conducted in February 2020 and January 2021 to identify any new articles published within the previous 12 months. Citations within relevant systematic reviews identified through the search were explored for any additional relevant references. Protocol papers and trial registries referenced in eligible articles were searched to identify any missing information not reported.
Subgroup analyses were conducted to explore outcomes by follow-up duration (i.e., 6-11 months and subsequent yearly intervals) and intervention type (i.e., nutrition interventions, physical activity interventions, and nutrition and physical activity interventions) when at least two studies had the same intervention design within an analysis. Intervention categories were decided based on the most common
TABLE 1 Sample size details and primary outcomes.
primary outcome: prevalence of overweight and obesity
Dodd, 2018 37
TABLE 1 (Continued)
Primary school age studies (aged 6-11 years)
Foster
Comprehensive intervention: 3476; Nutrition education: 628; Physical activity: 605; No comprehensive intervention: 3398; No nutrition education: 466; No physical activity: 466; 1 year
Studies with the primary outcome: prevalence of overweight and obesity
Studies with the primary outcome: incidence of overweight and obesity
age in years, Latino race/ethnicity, and US-born status), and NSLP eligibility.
during data extraction of included studies.
the included studies (N = 47) examined an intervention based within a school setting with a further 15 studies having interventions based in the school and home. The remaining studies were based in the home, community settings, early years settings, and maternity settings. Nutrition education (N = 65) and physical activity (N = 65) were the most common components of interventions, with less common components focused on parenting, sleep routines, and food environments. Almost all studies examined both male and female children (N = 81) with two studies recruiting females only

3.6 |
figure, subgroup analyses by follow-up point (i.e., 6-11 months, 12-23 months, and 24-36 months) are also presented. All meta-analyses were scored at either low or very low quality based on the GRADE quality assessment. The most common reasons were "risk of biases in individual studies" and "differences in follow up time and intervention designs of combined studies." Potential publication bias was detected in the analysis of the overall effectiveness of interventions aimed at children of primary school age with BMI as a primary outcome, and in the analysis of primary school age interventions with BMI as primary outcome deemed high risk of type 1 error.
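The grouping of studies into follow-up bands can be sketched as below. The bands (6-11, 12-23, and 24-36 months) are those named above; the two-study minimum is borrowed from the rule the methods give for subgroup analyses by intervention design and is an assumption when applied to follow-up bands. The function names and the study identifiers are hypothetical.

```python
from collections import defaultdict

# Follow-up bands (inclusive, in months) as named in the text.
FOLLOW_UP_BANDS = [(6, 11), (12, 23), (24, 36)]
MIN_STUDIES = 2  # assumed minimum number of studies needed to pool a subgroup


def follow_up_band(months):
    """Return the label of the band containing `months`, or None if outside all bands."""
    for low, high in FOLLOW_UP_BANDS:
        if low <= months <= high:
            return f"{low}-{high} months"
    return None


def eligible_subgroups(follow_ups):
    """Map band label -> study ids, keeping only bands with enough studies.

    `follow_ups` maps a study id to its follow-up duration in months.
    """
    bands = defaultdict(list)
    for study_id, months in follow_ups.items():
        label = follow_up_band(months)
        if label is not None:
            bands[label].append(study_id)
    return {label: ids for label, ids in bands.items() if len(ids) >= MIN_STUDIES}


# Hypothetical study ids and follow-up durations, for illustration only.
example = {"A": 6, "B": 9, "C": 12, "D": 24, "E": 30, "F": 48}
print(eligible_subgroups(example))
```

In this example the 12-23 month band is dropped because only one study falls in it, and the 48-month study falls outside every band.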
Direct comparisons cannot be made with outcomes of previous reviews because of differences in the eligibility of included studies and in how populations and interventions have been categorized, and because no previous meta-analyses have explored studies with high or low risk of type 1 error. However, findings of the overall effectiveness of obesity prevention interventions within this systematic review appear to have commonalities with recent reviews, generally showing small reductions in both BMI and BMI z-score in favor of the intervention. For example, the effects demonstrated in this review were −0.10 for BMI z-score and −0.55 for BMI (for combined diet and physical activity interventions in primary school-aged children) and −0.15 for BMI (for secondary school-aged children). Other similar recent meta-analyses had effect sizes ranging from −0.02 to −0.20 for BMI z-score and from −0.05 to −1.53 for BMI. 9,10,125 Although this review found interventions to have bigger effects in older children, the most recent Cochrane review 9 found interventions to have a larger effect in children aged 0 to 5 years compared with primary school children. However, this could be explained by different studies being included in the analyses because of the different eligibility criteria of the two reviews.
interventions is being determined according to a range of expected effect sizes. It has provided readers with details of how previous studies have designed obesity prevention trials, which, in the absence of a defined "clinically meaningful difference" in child obesity prevention, can provide guidance for future study design. The design of individual studies is reported alongside information on each study's quality in relation to its risk of bias and risk of type 1 error. Where new studies are designed based on outcomes of previous RCTs, this review suggests that study quality and risk of type 1 error should be considered to ensure the sample size is based around a realistic outcome that has not been over- or underestimated because of errors in trial conduct. We also provide an update on the overall effectiveness of childhood obesity prevention interventions including the most recently published studies, highlighting greater BMI differences when interventions combine diet and physical activity components. Further clarity is required to determine what a meaningful difference is in population prevention trials in order to support decision-making in trial design.
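The consequence of basing a sample size on an overestimated effect can be sketched with a simple power calculation. The example assumes a two-arm, individually randomized trial analyzed with a two-sided z-test, an SD of 2.0 kg/m2, and 1005 participants per arm, which is what the normal-approximation formula gives for detecting 0.25 kg/m2 with 80% power under that SD. The SD, the halved true effect, and the function name are illustrative assumptions, not values from the review.

```python
from statistics import NormalDist


def power_two_arm(n_per_arm, true_delta, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal arms.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    se = sd * (2.0 / n_per_arm) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(true_delta) / se - z_crit)


ASSUMED_SD = 2.0   # kg/m2; hypothetical
PLANNED_N = 1005   # per arm; sized for 0.25 kg/m2 at 80% power under the assumed SD

# Power if the planning estimate was right, and if the true effect is half as large.
power_if_correct = power_two_arm(PLANNED_N, 0.25, ASSUMED_SD)
power_if_halved = power_two_arm(PLANNED_N, 0.125, ASSUMED_SD)
print(f"{power_if_correct:.2f} {power_if_halved:.2f}")
```

Under these assumptions, a trial planned around an effect that was inflated twofold by a chance finding would have power of under 30% rather than the intended 80%.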