Family group decision‐making for children at risk of abuse or neglect: A systematic review

Abstract

Background: Capturing the scale of child maltreatment is difficult, but few would argue that it is anything less than a global problem which can affect victims' health and well-being throughout their life. Systems of detection, investigation and intervention for maltreated children are the subject of continued review and debate.

Objectives: To assess the effectiveness of the formal use of family group decision-making (FGDM) in terms of child safety, permanence (of child's living situation), child and family well-being, and client satisfaction with the decision-making process.

Search Methods: Both published and unpublished manuscripts were considered eligible for this review. Library staff from Scholarly Information (Brownless Biomedical Library), University of Melbourne, conducted 14 systematic bibliographic searches. Reviewers also checked the reference lists of all relevant articles obtained, and reference lists from previously published reviews. Researchers also hand-searched 10 relevant journals.

Selection Criteria: Study samples of children and young people, aged 0–18 years, who have been the subject of a child maltreatment investigation, were eligible for this review. Studies had to have used random assignment to create treatment and control groups; or, parallel cohorts in which groups were assessed at the same point in time. Any form of FGDM, used in the course of a child maltreatment investigation or service, was considered an eligible intervention if it involved: a concerted effort to convene family, extended family, and community members; and professionals; and involved a planned meeting with the intention of working collaboratively to develop a plan for the safety and well-being of children; with a focus on family-centred decision-making.

Data Collection and Analysis: Two review authors independently extracted the necessary data from each study report, using the software application Covidence.
Covidence highlighted discrepancies between data extracted by separate reviewers, and further analysis was conducted until a consensus was reached on what data were to be included in the review. Two authors also independently conducted analyses of study bias.

Main Results: Eighteen eligible study reports were found, providing findings from 15 studies, involving 18 study samples. Four were randomised controlled trials (RCTs; N = 941); the remainder employed quasi-experimental designs with parallel cohorts. Three of the quasi-experimental studies used prospective evaluations of nonrandomly assigned comparison groups (N = 4,368); the rest analysed pre-existing survey data, child protection case files or court data (N = 91,786). The total number of children studied was 97,095. The longest postintervention follow-up period was 3 years. Only four studies were conducted outside the United States: two in Canada, one in Sweden and one in the Netherlands. The review authors judged there to be a moderate or high risk of bias in most of the bias categories considered. Only one study referenced a study protocol. Eleven of the fifteen studies were found to have a high likelihood of selection bias (73%). Baseline imbalance bias was deemed to be unlikely in just two studies, and highly likely in nine (60%). Confounding variables were judged to be highly likely in four studies (27%), and contamination bias was judged highly likely in five studies (33%). Researcher allegiance was rated as a high risk in three studies (20%) where the authors argued for the benefits of FGDM within the article, but without supporting references to an appropriate evidence base. Bias from differential diagnostic activity, and funding source bias, were less evident across the evidence reviewed. This review combines findings for eight FGDM outcome measures. Findings from RCTs were available for four outcomes, but none of these, combined in meta-analysis or otherwise, were statistically significant.
Combining findings from the quasi-experimental studies provided one statistically significant finding, for the reunification of families, favouring FGDM. Ten effect sizes, from nine quasi-experimental studies, were synthesised to examine effects on the reunification of children with their family or the effect on maintaining in-home care; in short, the effect FGDM has on keeping families together. There was a high level of heterogeneity between the studies (I² = 92%). The overall effect, based on the combination of these studies, was positive, small, but statistically significant: odds ratio (OR), 1.69 (confidence interval [CI], 1.03, 2.78). Holinshead's (2017) RCT also measured the maintenance of in-home care and reported a similar result, which was not statistically significant: OR, 1.54 (CI, −0.19, 0.66). The overall effect for continued maltreatment, from meta-analysis of five quasi-experimental studies, favoured the FGDM group, but was not statistically significant: OR, 0.73 (CI, 0.48, 1.11). The overall combined effect for continued maltreatment, reported in RCTs, favoured the control group, but was not statistically significant: OR, 1.29 (CI, 0.85, 1.98). Five effect sizes, from nonrandomised studies, were synthesised to examine the effect of FGDM on the number of kinship placements. The overall positive effect based on the combination of these studies was negligible: OR, 1.31 (CI, 0.94, 1.82). Meta-analysis was not possible with other outcomes. FGDM's role in expediting case processing and case closures was investigated in six studies, three of which reported findings favouring FGDM, and three of which favoured the comparison group. Children's placement stability was reported in two studies: an RCT's findings favoured the control, while a quasi-experimental study's findings favoured FGDM.
Three studies reported findings for service user satisfaction: one had only 30 participants, one reported a statistically significant positive effect for FGDM, and the other found no difference between FGDM and a control. Engagement with support services was reported in two studies; neither reported statistically significant findings.

Authors' Conclusions: The current evidence base in this field is insufficient to draw conclusions about the effectiveness of FGDM. These models of child protection decision-making may help bring about better outcomes for children at risk, or they may increase the risk of further maltreatment. Further rigorous research, designed to avoid the potential biases of previous evaluations, is needed.

1 | PLAIN LANGUAGE SUMMARY

1.1 | No evidence that family group decision-making is better, or worse, than conventional child protection procedures
Family group decision-making is used to make decisions about how best to protect children, and support families. It engages the family, extended family, and people in the community around the family, in these decisions.
It features an independent meeting facilitator, private family time away from professionals and the prioritisation of family plans. This review finds that the evidence base supporting this approach is of poor quality with no clear finding that it is any better or worse than conventional approaches.

| What is this review about?
Child maltreatment is a global problem which can affect victims' health and well-being throughout their life. Debate continues as to effective systems of detection, investigation and intervention for maltreated children.
This review assesses the effectiveness of the formal use of family group decision-making in terms of child safety, permanence (of child's living situation), child and family well-being, and client satisfaction with the decision-making process.
What is the aim of this review?
This Campbell systematic review assesses the effectiveness of family group decision-making to tackle child abuse. It summarises the evidence from 15 studies in four countries, with most studies being from the USA.

| What studies are included?
The included studies were about children and young people, aged 0-18 years, who had been the subject of a child maltreatment investigation.
Studies had to have used random assignment to create treatment and control groups; or, parallel cohorts, in which groups were assessed at the same point in time. Any form of family group decision-making used in the course of a child maltreatment investigation or service was considered an eligible intervention if it involved: a concerted effort to convene family, extended family and community members; and professionals; and involved a meeting with the intention of working collaboratively to develop a plan for the safety and well-being of children; with a focus on family-centred decision-making.
The review authors found 18 eligible study reports, providing findings from 15 studies, involving 18 study samples. Four of the studies were randomised controlled trials.
All but four studies were conducted in the USA; of the remaining four, two were conducted in Canada, one in Sweden and one in the Netherlands.

| What are the findings of this review?
Overall, there are few if any significant benefits of family group decision-making compared to conventional treatment, and the quality of the studies in the evidence base is generally poor.
Four randomised controlled trials found no significant effect on continued maltreatment, reunification of children with families or maintenance of in-home care, engagement with support services and social support.
The quasi-experimental studies found a statistically significant finding favouring family group decision-making for the reunification of families, but not for any other outcomes. In all cases, there is considerable variation in effects between studies.
2. An independent (i.e., noncase carrying) coordinator chairs one or more meetings of family members and child protection service staff.
3. Family groups are given time to themselves, during, after or between meetings, to help facilitate their own decision-making and, where appropriate, agreement on a safety plan going forward.
4. Child protection services prioritise family group plans, providing child protection concerns are adequately addressed.
It should be noted that FGDM interventions are an on-going active process, during the course of a child protection case. Plans agreed by the family group and child protection staff are reviewed at appropriate intervals. They also hinge on child protection agencies providing appropriate support, in line with the plan agreed by the family group, on an on-going basis. However, these criteria are equally applicable to most non-FGDM practice models in the child protection domain, and were not used as selection criteria in this review.
Studies which compared FGDM with more traditional child protection services were included in this review. Traditional child protection services are defined here as those in which decision-making on children's care plans and placement has been professionally driven, with workers conducting assessments of families' problems and risk profiles, and determining a care plan with which families are asked to comply (Merkel-Holguin, 2003; Rockhill, 1999). Policy and practice guidance in most developed nations now acknowledges the importance of in-depth engagement with children's families and extended families where possible (Connolly, 2006; Littell, 2001; Yatchmenoff, 2005). However, FGDM remains distinct from traditional services. None of the comparators in the primary studies reviewed here employed independent, FGDM-trained chairs for meetings with private family time and a prioritisation of family-proposed plans.

| How the intervention might work
We can explore the hypothesised mechanisms of FGDM in more detail.
First, in relation to family empowerment: studies which have researched families' perspectives on FGDM have found that they prefer it to traditional child protection practice (Berzin, 2007). This may be the case because it offers a better balance of power to families, or it may be due to increased family unity, brought about by the FGDM process. It has also been suggested that families are more likely to accept and buy into a plan that they themselves have proposed than a plan imposed upon them by professionals.
Second, FGDM is designed to seek out and encourage the participation of extended family and community resources. In this way, FGDM models aim to strengthen the family and community network. Creating a strong network of support around the child and caregiver(s) may improve outcomes for children. Attracting investment from extended family through FGDM is thought to increase the likelihood of a kinship placement, when children must move from their home. The involvement of extended family is also thought to increase the likelihood that, when placed, children will remain with their siblings (Connolly & MacKenzie, 1998; C. Lupton & Nixon, 1999; Marsh & Crow, 1998).
Third, FGDM models frame families as competent and often explicitly focus on their strengths, with the aim of empowering families and shifting their experience of child protection service from one characterised by powerlessness to one of self-determination and collaboration (C. Lupton & Nixon, 1999). Literature across disciplines indicates that therapeutic settings which support clients' sense of autonomy, relatedness and competence are more likely to bring about compliance with treatment, and greater transfer and maintenance of treatment gains (Dwyer, Hornsey, Smith, Oei, & Dingle, 2011; Ryan, Lynch, Vansteenkiste, & Deci, 2011).
Finally, in considering how FGDM might improve outcomes for maltreated children, we should also reference its theoretical underpinning: FGDM could be said to align with ecological systems theory, social network theory and strengths-based therapeutic practice and intervention (Havnen & Christiansen, 2014; Nyberg, 2003). Through concerted engagement with families, FGDM is thought to encourage a more comprehensive trawl of the systems within which a child at risk exists, and is therefore more likely to find intrinsic family strengths which can lead to better outcomes.

| Why it is important to do this review
This review contributes to the literature by including the most recent research on FGDM, including outcomes that have not been included in prior reviews, and employing stringent criteria for search, selection, coding, and analysis as specified in the Campbell Collaboration guidelines (Campbell Collaboration, 2016). The question of how effective FGDM is in meeting its objectives has attracted considerable commentary. While commentary on the implementation and success of FGDM is extensive, relatively few studies of efficacy have been conducted.
The current evidence base is routinely cited as positive by researchers studying FGDM, together with reviewers and commentators on the topic area (including but not limited to: Baumann, 2006; Burford, 1999; Pennell, Edwards, & Burford, 2010; Sheets et al., 2009).
In 2003, the American Humane Society (2019) published a special issue on FGDM, with 29 submissions from the United States and beyond (foreword by Merkel-Holguin, 2003). Much of the material brought together in this special issue relates to the implementation of FGDM projects. Only one of these articles provided sufficient outcome data to facilitate inclusion in the current review. However, Merkel-Holguin et al. (2003) provided the foreword to this volume and summarised that FGDM compared favourably to traditional child protection methods in providing child safety; encouraging kinship placements; encouraging stable children's placements; and bringing about reunification with parents, timely decision-making, increased family support and a reduction in family violence. Crampton (2006) provided a narrative review of four FGDM evaluations. Primary study effect sizes are not offered. Crampton describes the positive evaluations of two studies and the inconclusive findings of Sundell and Vinnerljung's (2004) study. Frost et al. also completed a narrative review and reported mixed results, while also offering encouragement for the continued practice of FGDM. Frost et al. suggest that some studies, including that of Crampton and Jackson (2007), provide positive results, whilst others are less favourable.
It can be seen that policy makers may find encouragement to deploy FGDM in these reviews. However, a counter-standpoint also exists. A number of researchers in this field have argued that the body of evidence supporting FGDM lacks rigour, and that there is insufficient evidence available to make a judgement on the efficacy of FGDM (e.g., Creemers et al., 2016; Havnen & Christiansen, 2014). Havnen and Christiansen (2014) found that seven out of ten studies retrieved for their review reported positive results. The other three were negative or neutral, and only two of the studies that reported positive results used satisfactory methods. Dijkstra et al.'s (2016) review is arguably the most comprehensive and rigorous review to date. Dijkstra et al. reviewed 14 studies and concluded that, according to the evidence available, FGDM did not significantly reduce child maltreatment. They highlighted the need for more robust studies of efficacy. Dijkstra et al.'s (2016) review has been a step forward in review methodology for the field. Previous reviews did not report systematic literature searches and did not review all of the studies available, or deploy meta-analysis. However, three studies included in Dijkstra et al.'s review did not meet the selection criteria for the current review, and an additional three eligible studies were found for the current review.
More generally, there remains a lack of emphasis on study rigour, or a formal assessment of potential bias across FGDM evaluations. In the context of disagreement about FGDM efficacy, a systematic review, completed according to the Campbell Collaboration's standard of methodological rigour (Campbell Collaboration, 2016) provides a more definitive answer to the question. In addition, the acceptance, rejection and discussion of study methodologies, a central focus of Campbell reviews, will provide guidance for the development of more rigorous study protocols, going forward.
In summary, this review considers the problem of how to go about optimum decision-making for the protection of children from abuse and neglect. This problem is located within on-going efforts to protect children while also promoting family unity, upholding families' rights and guarding against oppressive statutory intervention in family life. FGDM has been proposed as an effective response to this problem, and this review will help guide the development of this intervention and its evaluation.

| OBJECTIVES
To assess the effectiveness of the formal use of FGDM in terms of child safety, permanence (of child's living situation), child and family well-being and client satisfaction with the decision-making process.

4 | METHODS

4.1 | Criteria for considering studies for this review

4.1.1 | Types of studies
Studies will be eligible for this review if they (a) used random assignment to create treatment and comparison or control groups; or (b) used parallel cohort designs in which groups were assessed at the same points in time (i.e., quasi-experimental designs that include groups assessed at the same time as opposed to a historical cohort).
Single-group designs and single-subject designs will be excluded (see "risk of bias" section for further details on included designs).

| Types of participants
Children and young people aged 0-18 years who have been the subject of a child maltreatment investigation.

| Types of interventions
Any form of FGDM used in the course of a child maltreatment investigation or during the course of services arising from such an investigation. FGDM involves convening family and child protection professionals with one or more of other professionals, extended family, identified friends and/or community members, in an effort to collaboratively develop a plan to maintain child safety, facilitate stable and permanent living arrangements, and promote child well-being. Therefore, studies will be included in the review if they involve: (a) a concerted effort to convene family, including extended family, friends and community members; and (b) child protection professionals (as well as other professional service providers) participating in; (c) one or more planned meetings with the intention of working collaboratively to develop a plan for the safety, permanence and well-being of children; (d) with a focus on family-centred decision-making; (e) an independent meeting facilitator; and (f) private family time during the process.

Primary outcomes
Official reports found in administrative data and case files, were the preferred indicators of outcomes, but studies were also accepted if they used standardised recording tools for study participant reports.
The prevention of child maltreatment, and the stability of child placements following the involvement of a child protection service, were the primary outcomes of interest. The success of FGDM in preventing child maltreatment was measured by (in order of preference): substantiated or verified referrals to a child protection authority; referrals (with or without substantiation) to a child protection authority; parent-report; and child self-report. Indicators of child placement stability differed depending on the children's circumstances. If children resided in the homes of their permanent carers, then a move to an out-of-home placement was a negative outcome. Therefore, more child removals, in comparison with a non-FGDM group of children, were indicative of poor efficacy. Kinship placements (placement in out-of-home care with relatives) were interpreted as a positive outcome in comparison to other out-of-home placements (e.g., residential care). The achievement of legal permanence for children's placements was accepted as a positive outcome; examples include reunification with birth parents, adoption by related or nonrelated caregivers, placement with relative caregivers, and legal guardianship or legal custody by related or nonrelated caregivers.
Studies were only included, in the analysis of primary outcomes, if subjects were followed for at least 6 months after the intervention; to allow for sufficient time to observe outcomes. Where outcomes were reported at multiple time points the longest follow-up period was used in the data synthesis.
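The selection rules described above (an order of preference among maltreatment indicators, a minimum follow-up of 6 months, and use of the longest follow-up) can be illustrated with a minimal Python sketch. The indicator labels, the record layout and the function name are hypothetical, and the sketch assumes that indicator preference takes priority over follow-up length, which the text does not state explicitly.

```python
# Indicator labels are hypothetical; their order mirrors the stated order of preference.
PREFERENCE = [
    "substantiated_referral",
    "referral",
    "parent_report",
    "child_self_report",
]
MIN_FOLLOW_UP_MONTHS = 6  # minimum follow-up for primary outcomes


def select_maltreatment_outcome(measures):
    """Pick one measure per study, or None if nothing is eligible.

    `measures` is a list of dicts with the (assumed) keys
    'indicator', 'follow_up_months' and 'value'.
    """
    eligible = [
        m for m in measures
        if m["indicator"] in PREFERENCE
        and m["follow_up_months"] >= MIN_FOLLOW_UP_MONTHS
    ]
    if not eligible:
        return None
    # Most preferred indicator first; among ties, the longest follow-up.
    return min(
        eligible,
        key=lambda m: (PREFERENCE.index(m["indicator"]), -m["follow_up_months"]),
    )
```

For a study reporting substantiated referrals at 6 and 24 months alongside parent-report at 36 months, this sketch would return the 24-month substantiated-referral measure.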

Secondary outcomes
Secondary outcomes included child well-being, and client satisfaction with the FGDM process and plan. Data were not excluded on the basis of the validity or reliability of any instruments used. However, the reviewers' judgements on the validity and reliability of instruments used formed the basis of their judgement on potential bias due to insensitive measurement instruments.

| Search methods for identification of studies
The primary systematic literature search was carried out in July 2016 by library staff, Tania Celeste and Frances Morrissey, from Scholarly Information, University of Melbourne. As this review was in process for 3 years, the searches were repeated in August 2019 by the first and second authors, date limited from 2016 to 2019. Both published and unpublished manuscripts were considered eligible for the review. Searches were not restricted to any single language or nationality. One article required translation from Dutch to English; this was completed using Google Translate.

| Electronic searches
Electronic searches for the identification of appropriate studies were completed as follows: The searches were broadly and substantively similar but leveraged controlled vocabularies and search operators unique to each resource.
For example, the construction "random* control* trial" could not be used in ProQuest as the internal wildcards were not recognised. Search facilities were chosen with reference to recent research (McGinn, Taylor, McColgan, & McQuilkan, 2014) on their comparative usefulness for questions related to social work. The search terms, formulae and syntax used on each search facility are described in Appendix C.

| Searching other resources
Reviewers checked the reference lists of all relevant articles obtained, and reference lists from previously published reviews. Authors of papers which could potentially have been included in the review, had they reported more details of findings, were emailed. The review team also searched ClinicalTrials.gov and the World Health Organisation's International Clinical Trials Registry Platform.
The following journals were hand-searched (online) by the review team: (1) Child Welfare (2) Children and Youth Services Review
Personal communications were also deployed in the search for relevant articles, as described in the review protocol (Shlonsky et al., 2009); these comprised face-to-face discussions with presenters, and emails to experts and relevant study authors.

| Selection of studies
The search outputs, titles and abstracts for 1,576 papers, were uploaded to the software application Covidence. Covidence facilitated the screening and categorisation of the search outputs. Each article title was independently screened by two reviewers. Authors accessed manuscript abstracts, and whole texts where necessary. Covidence facilitates the screening process with a clear audit trail. After duplicates were removed, initial screening by two authors excluded 1,419 manuscripts. The initial screening questions were: is the population of children and youth who are, or have been, the subject of child protection investigation?; and, is there an intervention related to family group conferencing in the study? Following initial screening, the full texts of 100 articles were independently assessed by two authors against the inclusion and exclusion criteria outlined in the study protocol (Shlonsky et al., 2009). At this level of screening, studies had to satisfy the following criteria: the study evaluated an intervention administered to children and youth aged 0-18; and it used an experimental or parallel cohort research design, with a valid control or comparison group. The fundamentals of FGDM, as outlined in the description of the intervention, were used to ensure study interventions were part of the FGDM family of interventions. Thirteen studies (reported in 15 manuscripts) from the main searches were found to match selection criteria. Three additional study reports were located through correspondence with primary study authors. One of these provided additional findings for one of the studies located in the main searches; the other two were added to the primary studies for the review, following independent appraisal by two authors. In summary, 15 studies, reported in 18 study reports, were selected for review.

| Data extraction and management
Two of three review authors, Tony McGinn, Mphatso Kamndaya and Admire Chereni, independently extracted the necessary data from each study report using Covidence. Covidence facilitates the recording of data for: (1) Study author(s); year of publication; source; country; and language.
(2) Characteristics of setting and participants: eligibility criteria for participants; explanation of recruitment procedures, setting (country, location, clinical/nonclinical); demographic features of the sample.
(3) Sampling: sample sizes for treatment and control; whether power analysis was used to determine sample size; allocation to the treatment and control; explanation of method used to generate the allocation.
(4) Research design: type of design including major features such as random selection, random assignment, and data relating to potential biases.
(5) Intervention data: the nature of the interventions (for treatment and comparison/control groups); FGC, FUM, or some other form of FGDM; aim of intervention; length of intervention, whether manuals were used, whether fidelity checks were included, information on possible contamination reported.
(6) Outcome data: primary and secondary outcomes, measures used, information on reliability/validity of measures.
(7) Results: attrition at postintervention and follow-up; number excluded from the analysis; length of follow-up; statistical methods; type of data effect size is based on; data needed for effect size calculations.
Covidence highlights discrepancies between data extracted by separate reviewers, and prompts further analysis of studies until a consensus can be reached on which data are to be included in the review.

| Assessment of risk of bias in included studies
We considered "other biases" as listed in Cochrane guidance (Higgins et al., 2011). In addition, under "other biases" we considered "researcher allegiance bias" and "funding source bias" for similar reasons as those outlined in Maynard, Solis, Miller, and Brendel (2017): studies are more likely to be biased in favour of the treatment intervention when study authors have a direct role in the development or the implementation of the study. We also considered "contamination bias", as this was highlighted as a possible bias in the review protocol (Shlonsky et al., 2009), and potentially confounding variables in the study environment (Sterne et al., 2016).
The review authors agreed on a priori guidance for the rating of bias in each primary study (see Appendix A). Each study was categorised as "low", "high", or "unclear" risk of bias on each of the domains. Extracts, from primary studies, which might underpin judgements on bias, were compiled and reviewed by two review authors.
Any discrepancies between review author judgements were resolved through discussion with a third member of the team.

Continuous data
A standardised mean difference (SMD) was calculated for studies reporting continuous data. A corrected Hedges' g was calculated by dividing the difference between group means by the pooled and weighted standard deviation (SD) of the groups. Specifically, Hedges' g corrects for a bias (overestimation) that occurs when the uncorrected standardised mean difference effect size is used on small samples. We computed a 95% confidence interval (CI) for each combined effect size to test for statistical significance; if the CI did not include zero, we rejected the null hypothesis that there is no difference between the group means.
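The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure RevMan applies internally: the function name `hedges_g`, the approximate correction factor J = 1 - 3/(4df - 1), the large-sample variance formula and the input numbers are all assumptions made for the example, and none of the values come from the included studies.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, ci_low, ci_high) for two independent groups.

    Hypothetical helper for illustration only.
    """
    df = n_t + n_c - 2
    # Pooled SD, weighting each group's variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd      # uncorrected standardised mean difference
    j = 1 - 3 / (4 * df - 1)               # approximate small-sample correction factor
    g = j * d
    # Large-sample variance of d, scaled by the correction factor.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    se_g = j * math.sqrt(var_d)
    return g, g - 1.96 * se_g, g + 1.96 * se_g

# Invented example: treatment mean 10, control mean 8, SD 2 and n = 20 in each group.
g, low, high = hedges_g(10, 2, 20, 8, 2, 20)
print(f"g = {g:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

In this invented example the uncorrected difference is exactly 1, so the corrected g is slightly smaller, and the interval excludes zero.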

Dichotomous data
We computed Mantel-Haenszel odds ratios (ORs) for the dichotomous outcome variables. Based on the assumption of proportional odds, ORs can be compared between variables with different distributions, including very rare and more frequent occurrences. Specifically, the odds of an event (e.g., children's reunification with their family) were calculated for each sample by dividing the number of children reunified by the number of children who were not reunified with their family. We then calculated an OR by dividing the odds of reunification for the FGDM group by the odds for the non-FGDM group of children. In addition, we calculated and reported 95% CIs for the ORs.
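For a single study, the odds and OR described above can be computed as in the following sketch. The function name `odds_ratio` and the counts are hypothetical. The confidence interval uses the standard log-odds (Woolf) approximation, which is an assumption on our part; the text does not state how study-level intervals were derived, and the sketch does not handle cells with zero counts.

```python
import math

def odds_ratio(events_t, non_events_t, events_c, non_events_c):
    """Return (OR, ci_low, ci_high) from a 2x2 table of counts.

    Hypothetical helper for illustration; all four counts must be non-zero.
    """
    odds_t = events_t / non_events_t   # e.g., reunified / not reunified, FGDM group
    odds_c = events_c / non_events_c   # same odds for the non-FGDM group
    or_value = odds_t / odds_c
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / events_t + 1 / non_events_t
                       + 1 / events_c + 1 / non_events_c)
    low = math.exp(math.log(or_value) - 1.96 * se_log)
    high = math.exp(math.log(or_value) + 1.96 * se_log)
    return or_value, low, high

# Invented counts: 30 of 100 FGDM children reunified versus 20 of 100 comparison children.
or_value, low, high = odds_ratio(30, 70, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

With these invented counts the interval spans 1, the line of no effect for an OR, so the difference would not be treated as statistically significant.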

| Unit of analysis issues
The unit of analysis for this review was children. There were no unit of analysis issues identified for the included studies.

| Dealing with missing data
Although studies with incomplete outcome data (e.g., missing means, SDs, sample sizes) were included in the review, they were excluded from the meta-analyses unless the review authors could calculate an effect size from the available information. When outcome data were missing from an article or report, we made reasonable attempts to retrieve these data from the original researchers. Evidence of attrition of study participants or data is described in the quality assessment of primary studies, reported in "Assessment of risk of bias in included studies" section.

| Assessment of heterogeneity
We assessed the consistency of results using the I² statistic (Higgins, 2002, 2003). Evidence of heterogeneity (p value from test of heterogeneity < 0.1 coupled with an I² value of 25% or greater) for any of the outcomes synthesised is highlighted in the narrative accompanying the reporting of that outcome.
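As an illustration of how the I² statistic is obtained, the sketch below computes Cochran's Q from inverse-variance weights and then I² = max(0, (Q - df)/Q) × 100. The function name `cochran_q_and_i2` and the input values are invented for the example. The p value for Q, which would come from a chi-square distribution with k - 1 degrees of freedom, is omitted here to keep the sketch free of external libraries.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, I2 in percent) for a set of study effects.

    Hypothetical helper; effects should be on an additive scale
    (e.g., log odds ratios or standardised mean differences).
    """
    weights = [1 / v for v in variances]            # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Invented example: two studies with very different effects and equal precision.
q, i2 = cochran_q_and_i2([0.0, 1.0], [0.1, 0.1])
print(f"Q = {q:.2f}, I2 = {i2:.0f}%")
```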

| Assessment of reporting biases
Reporting bias was counteracted to some extent by deploying a highly sensitive systematic search of bibliographic databases, and supplementing this with additional searching of grey literature sources, reference list searching, expert consultation and hand searching. Unpublished data from two separate studies were located through author correspondence, and are included in the review. Primary studies were reviewed for references to a study protocol which could be obtained to check for outcome measures being dropped, or added; just one study report referenced a protocol. Primary study authors' choice of outcomes to study and report were appraised. Only four reported on the continued maltreatment of children. The implications of this are discussed in Selective reporting (reporting bias). The use of a funnel plot, to help identify potential reporting bias in primary studies, was not possible given the small number of study findings synthesised under each outcome heading.

| Data synthesis
Meta-analyses were conducted using RevMan 5. None of the primary studies reported on comparisons between FGDM versions, so all syntheses were completed on the absolute effect of FGDM versus no FGDM.
Two studies reported findings from samples separated geographically (from separate child protection agencies, or different territories of the same agency); these data were synthesised as separate studies, because there was a degree of heterogeneity between them.
ORs were used to represent binary outcome data. Continuous data were converted into SMDs. All outcomes were presented with 95% CIs. Hedges' g was used to correct for small sample bias. Where findings for a particular outcome were reported with continuous data by some studies and with dichotomous data by others, the Campbell Collaboration online conversion calculator (Wilson, 2018) was used to convert studies to the majority format.
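One widely used way of moving between the two formats is the logit method, in which a log odds ratio is multiplied by √3/π to give a standardised mean difference, and the reverse for the other direction. The sketch below assumes this method for illustration only; the text does not state which of the calculator's conversion options was used, and the function names are hypothetical.

```python
import math

def log_or_to_smd(log_or, var_log_or):
    """Convert a log odds ratio and its variance to an SMD (logit method, assumed)."""
    factor = math.sqrt(3) / math.pi
    return log_or * factor, var_log_or * factor ** 2

def smd_to_log_or(d, var_d):
    """Convert an SMD and its variance back to a log odds ratio (logit method, assumed)."""
    factor = math.pi / math.sqrt(3)
    return d * factor, var_d * factor ** 2

# Invented example: an OR of 2.0 whose log has variance 0.04.
d, var_d = log_or_to_smd(math.log(2.0), 0.04)
print(f"SMD = {d:.3f}, variance = {var_d:.4f}")
```

Variances are converted with the square of the same factor, so a confidence interval computed after conversion stays consistent with the one computed before it.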
We assumed there would be unexplained sources of heterogeneity across studies; hence we used a random effects model of meta-analysis.
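A random effects model adds an estimate of between-study variance (τ²) to each study's own variance before weighting. The sketch below uses the DerSimonian and Laird estimator of τ², which is an assumption made for the example because the text does not name the estimator; the function name `random_effects_pool` and the inputs are also invented. For ORs, pooling would be carried out on the log scale and the result exponentiated.

```python
import math

def random_effects_pool(effects, variances):
    """Return (pooled, ci_low, ci_high, tau2) under a random effects model.

    Hypothetical helper using the DerSimonian-Laird estimate of tau2.
    """
    w = [1 / v for v in variances]                  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]     # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2

# Invented example: two equally precise studies with conflicting effects.
pooled, low, high, tau2 = random_effects_pool([0.0, 1.0], [0.1, 0.1])
print(f"pooled = {pooled:.2f}, 95% CI ({low:.2f}, {high:.2f}), tau2 = {tau2:.2f}")
```

When studies disagree, τ² grows and the pooled confidence interval widens, which is why highly heterogeneous findings tend to yield imprecise overall effects.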
Results for randomised experiments and quasi-experimental designs were reported separately. Meta-analysis was not possible for several of the outcomes reviewed as they were only reported by one or two primary studies. A narrative review is provided for these. Given the small number of studies overall, and the level of heterogeneity between them we did not perceive any opportunities for moderator, sensitivity or outlier analysis. The syntheses completed showed moderate-to-high levels of heterogeneity between studies for all outcomes. We deemed the presentation of an overall effect size to be inappropriate for some of the outcomes: when the number of studies synthesised was small, and findings were highly heterogeneous.

| Subgroup analysis and investigation of heterogeneity
There were no opportunities to complete a subgroup analysis according to method, FGDM version, population or follow-up periods. Where studies included findings from interval measures of an outcome, measures taken at the longest time-period from the intervention were used.

| Sensitivity analysis
Due to the small number of primary studies, and limited meta-analyses completed, there was no opportunity for a sensitivity analysis. Table 1 provides an overview of primary study characteristics. The included studies are described in terms of the setting, participants, interventions and outcome measures.

| Results of the search
The main bibliographic database search, completed in July 2016, returned 1,320 records. These records were combined with 41 additional records found through reference list searching, hand searching and correspondence with experts and known study authors. This original bibliographic search was re-run in August 2019, adding 215 search hits. A total of 1,576 studies were subjected to initial screening; 92 of these were selected for full text screening, and 15 of these (describing 13 studies) were found to meet the inclusion criteria for the review. Two additional studies were identified following the 2019 search, through correspondence with primary study authors.
Baumann (2006) and Sheets et al. (2009) reported findings from the same study. Baumann reported findings on the nature of children's placements; Sheets et al. reported some of these findings, but also additional findings related to service user satisfaction. Both study reports were needed to ensure all available findings were obtained. Edwards, Tinworth, Burford, and Pennell (2006) and Pennell (2010) reported findings from the same study of case records. Pennell reported findings for kinship care, expedition of case processing and family reunification; Edwards reported findings on continued maltreatment.

| Included studies
Hollinshead (2017) reported on continued maltreatment and family reunification from a randomised controlled trial (RCT), and Corwin et al. (2019) followed up with a further report from this study on caseworkers' perceptions of social support following intervention.
Only four of the included studies were conducted outside the United States: two in Canada (including Cunning & Bartlett, 2006), one in Sweden and one in the Netherlands (Dijkstra, 2018). Of the fifteen studies reviewed, just three were RCTs (including Dijkstra, 2018; Hollinshead, 2017). The other studies employed quasi-experimental designs, using parallel cohorts. Four of the quasi-experimental studies used prospective evaluations of nonrandomly assigned comparison groups (including Baumann et al., 2005; Sundell & Vinnerljung, 2004); the rest analysed pre-existing survey data, child protection case files or court data. The longest postintervention follow-up period was 3 years, used by Sundell and Vinnerljung. Two study reports, including Cunning and Bartlett (2006), presented findings for separate geographical areas separately.
Cunning and Bartlett also reported findings from a combination of the two regions, under one outcome heading. Each grouping was treated as a separate population in the data synthesis.

| Excluded studies
Eighty-five study reports were excluded during the final, full text, screening. Thirty-eight studies were excluded because the study design did not meet the minimum standards of methodological rigour outlined in the review protocol (Shlonsky et al., 2009); most commonly, this was because they had no comparison group. Several studies, presented as evaluations, used qualitative data. Twenty-four studies were excluded because the intervention was not FGDM. The remaining studies were excluded due to: wrong population (nine); insufficient data (five); data being intractably unavailable (two); wrong outcomes (one); and the study not having been completed (one). A list of excluded studies and reasons for exclusion is presented in Excluded studies.

| Risk of bias in included studies
The review authors judged there to be a moderate or high risk of bias in most categories in each of the studies reviewed, see Figure 1 for a summary of judgements on bias across the studies reviewed. Figure 2 provides an insight into the level of potential bias within each study.
Appendix A provides the rationale for each of these judgements.

| Allocation (selection bias)
Selection bias is comprised of sequence generation and allocation concealment. Studies were rated as high if they failed to provide sufficient information or used comparison groups that represent a population subset. Ten out of the eleven included studies (91%) were rated as having a high risk of selection bias. No studies were rated as low risk, with one study (9%) rated as unclear risk: this study used random assignment, but provided insufficient information to make a judgement on allocation concealment.

| Blinding (performance bias and detection bias)
This potential bias is counteracted by the blinding of study participants and personnel, so that they are unaware of their group assignment, and the blinding of outcome assessors. Participants and personnel, in ten of the studies, would have been aware of the type of deployment of FGDM. Only one study report (12%) described the blinding of outcome assessors.

| Incomplete outcome data (attrition bias)
Attrition bias refers to the biasing effect of study participants, or study participant data becoming unavailable during the study. This bias can be counteracted by keeping accurate records of participants who drop out of the study, and by using intention-to-treat analysis so that drop-outs do not have a biasing effect on final results. None of the study reports offered information on how families who dropped out of FGDM or comparison treatments were recorded or accounted for in the analysis of findings. For this reason, all of the studies were rated as having an unclear risk of attrition bias.

| Selective reporting (reporting bias)
None of the included study reports references a study protocol. We have no way of knowing if some outcome measures were dropped, or added, as the study progressed. Therefore, all of the included studies were judged to have, as a minimum, unclear risk of selective reporting bias.
Five studies were judged to have a high risk of incomplete reporting bias, as some findings were clearly missing or only partially described.

Study design bias
If study design choices did not appear to have affected findings for intervention and control groups differentially, study design bias was rated as low. This was the case in three studies (27%). Three studies (36%) were rated as high risk because, variously, study participants self-selected into study groups or social workers assigned participants to study groups, or the use of FGDM was not adequately confirmed, or study authors referred to qualitative findings as evidence of efficacy. Four studies (36%) were rated as unclear in this category; in these, little or no rationale was provided for study design or selection of comparison groups.

Baseline imbalance bias
Imbalance at baseline may influence study outcomes and the results of statistical tests. This was rated as high risk in seven of the eleven included studies (64%). The remaining four studies (36%) provided insufficient data, from which to make a judgement, and were rated as unclear.

Confounding variable bias
Confounding variables were judged to be a high-risk factor in four of the included studies (36%). In each case, practitioners were the potentially confounding factor. The remaining seven studies were rated as unclear due to insufficient data being provided. For a low risk of bias in this category, primary study authors would have needed to have offered an assessment of potentially confounding variables, and a description of how they were nullified or dealt with in data analysis.
FIGURE 1 Risk of bias graph: Review authors' judgements about each risk of bias item presented as percentages across all included studies
FIGURE 2 Risk of bias summary: Review authors' judgements about each risk of bias item for each included study

Differential diagnostic activity bias
Studies were rated as high risk in this category if different measures or collection methods were employed within the intervention and comparison groups. This was the case in one study.

Bias due to the use of insensitive instruments for outcome measurement
Four studies (36%) were rated as high risk of bias for insensitive instruments used to measure outcomes. This included issues regarding the quality and appropriateness of some outcome measures (for example, re-referrals as a measure of on-going abuse). Six studies (55%) under-described their measurement of outcomes so that judgement was difficult in this category. One study was deemed to have a low risk of bias, due to comprehensive reporting of appropriate diagnostic activity.

Researcher allegiance bias
Researcher allegiance was rated as high risk of bias in two studies (Cunning & Bartlett, 2006; Pennell & Burford, 2000): the authors argued for the benefits of FGDM within the article, but without supporting references to an appropriate evidence base. Clear information regarding the independence of researchers was provided in only one study; the remaining studies (73%) were rated as unclear, due to a lack of information about the independence of researchers from FGDM providers.

Funding source bias
A study which is funded by proponents of FGDM, or by an agency which has invested in FGDM, may be at risk of funding source bias. One of the included studies was conducted by an independent government department charged with the evaluation of social care practice; this study was rated as low risk in this category. The remaining ten studies were rated as having an unclear risk of funding source bias due to insufficient information, or due to funding being provided by the FGDM provider.

Contamination bias
Four of the eleven included studies (36%) were given a high-risk rating of contamination bias. A high-risk rating was given if the same practitioners delivered both interventions or if the social workers [...]
FIGURE 3 Flow chart of study selection process
FIGURE 4 (Analysis 1.1) Forest plot of comparison: 1 Traditional child protection case processing, outcome: [...]

| Synthesis of results
Meta-analysis and narrative review were applied to findings from fifteen studies. Sufficient data existed to warrant meta-analyses under the following outcome groupings: reunification of children with families or maintenance of in-home care; continued maltreatment; kinship placements; and expedition of case processing and case closure. A narrative review is also offered for findings under the following outcome groupings: placement stability; child well-being; service-user satisfaction; and referrals to support services.

| Reunification of children with families or maintenance of in-home care
Ten effect sizes, from nine quasi-experimental studies, were synthesised to examine effects on the reunification of children with their family, or the effect on maintaining in-home care; in short, the effect FGDM has on keeping families together. It can be seen from the forest plot (Figure 4) that the dominant finding from the synthesis of these study results is the lack of clarity. There is a high level of heterogeneity between the studies (I² = 92%, see Analysis 1.1); six study findings come with very wide CIs; and CIs for six out of the ten studies span the line of no effect. The overall effect, based on the combination of these studies is small but statistically significant: OR,

| Continued maltreatment
Meta-analysis of five quasi-experimental studies, which reported the number of children who continued to be maltreated following FGDM or traditional child protection decision-making procedures, is provided in Analysis 1.2. It can be seen that just one study recorded significantly fewer incidents of continued maltreatment. While [...] In addition to the analyses presented in Analysis 1.2 and Analysis 1.3, [...] and Cunning and Bartlett (2006) both report ratio data relating to continued maltreatment. Walker (2005) [...] An overall effect size, based on the 10 studies referred to here, is not offered because of the lack of conformity in data types and study designs, and because of the level of heterogeneity across the study data synthesised in Analyses 1.2 and 1.3. The data pertaining to FGDM and continued maltreatment could be summarised as inconclusive.

| Kinship placements
Five effect sizes, from nonrandomised studies, were synthesised to examine the effect of FGDM on the number of kinship placements. It can be seen from Analysis 1.4 that there was a high level of heterogeneity between the studies (I² = 74%); two study findings had very wide CIs; and CIs for four out of the five studies span the line of no effect. The overall positive effect based on the combination of these studies is negligible (OR, 1.31; CI, 0.94, 1.82) and primarily a reflection of Wang et al.'s (2012) finding (weighted at 96%).
Walker (2005) also reported findings of a positive effect on kinship placements. Walker reported ratio data, without enough information to compute an OR, for inclusion in Analysis 1.4. In Walker's study, the average number of times children were moved to a kinship placement for the FGDM group was 0.68 (n = 54) and for the non-FGDM group it was 0.95 (n = 30).

| Expedition of case processing and case closure
Because SDs for Weisz, Korpas, and Wingrove (2006) and [...] were unavailable, and an SD for Walker (2005) was approximated using Cochrane guidance (Higgins & Green, 2009), the provision of a study heterogeneity statistic, or overall effect size, was not possible. Table 2 provides an overview of findings from all six (quasi-experimental) studies on case processing speed or case closure. It can be seen that study findings for this measure are wide-ranging and inconsistent. Table 3 summarises study findings pertaining to placement stability.

| Placement stability
Berzin's (2008) [...] Dijkstra (2018) found that perceptions of empowerment, 12 months after a care plan had been agreed, did not differ between FGDM and non-FGDM groups of parents. FGDM and a propensity score matched (Rosenbaum & Rubin, 1983) [...] was not statistically different between children who experienced FGDM meetings and those who did not.

6 | DISCUSSION

| Summary of main results
A high level of heterogeneity between primary studies, and a high risk of bias across primary studies, are the foremost findings of this review. Any discussion of overall effect sizes is overshadowed by these two key review findings.
The primary outcomes of interest, as outlined in the review protocol (Shlonsky et al., 2009), were FGDM effects on child maltreatment, family permanence and placement stability. The synthesis of study findings provided here, in relation to these outcomes, is inconclusive.
While a meta-analysis of ten quasi-experimental study findings provides a small overall effect size on the reunification of children with their families, we suggest that this should not be held as evidence of FGDM efficacy, for several reasons. The wide-ranging CIs within some of the studies, and the wide range of findings across studies, suggest limited reliability for these findings. When considered alongside a high risk of bias across the studies, these shortfalls detract greatly from the importance of an overall small effect size.
Evidence of the effect FGDM has on continued maltreatment is also inconclusive. Four out of five nonrandomised studies found that FGDM reduced the likelihood of further maltreatment, but a meta-analysis of these was not statistically significant. Three RCTs (including four study samples) were also pooled using meta-analysis. Evidence of the effect FGDM has on family group type permanency goals, service user satisfaction, child well-being and on engagement with support services is also inconclusive. Just one or two studies reported on each of these outcomes, and overall effect sizes were not calculable.
[...] reporting bias were the most significant detractors from the internal validity of the studies reviewed; these were rated as high in a large majority of the studies. One study was judged to have a high risk of bias in eight out of the fourteen categories assessed. The average number of high bias ratings per study was 5.9. We would suggest that the range and extent of potential bias in this body of evidence is cause for caution in judging the efficacy or harm of FGDM interventions.

| Overall completeness and applicability of evidence
Also, in relation to the quality of evidence, we should acknowledge that the body of evidence is small, both in terms of eligible studies and because of the limited overlap in outcome measures used across the dataset. Only one of the outcomes of interest, reunification, was reported by a majority of the nine studies. From another perspective, the body of evidence reviewed here is substantial: it includes data from over 93,000 study participants. This compares favourably to the majority of reviews published by the Cochrane and Campbell collaborations. However, these participant data are predominantly gathered from large retrospective cohort studies, using secondary data.

| Potential biases in the review process
We could not identify any potential biases in the current review process.

| Agreements and disagreements with other studies or reviews
The current review findings disagree with narrative reviews by Crampton (2006) [...]
The methodological rigour across this body of evidence must be described as low. The risk of bias among primary studies is high.
The range of outcomes reported offers limited opportunity for meta-analyses. The small meta-analyses, completed here, brought together quite heterogeneous findings. In these circumstances, the current review authors would emphasise that there is insufficient evidence to support a judgement on the efficacy of FGDM, for the prevention of abuse and neglect of children. Tukey (1986, p. 74) stated that "the combination of some data, and an aching desire for an answer, does not ensure that a reasonable answer can be extracted from a given body of data". While we have been able to combine data from separate studies, on a number of outcomes of interest, we believe that it would be misleading to suggest that these meta-analyses provide answers to questions of FGDM efficacy.
Considering that FGDM is based on sound theoretical underpinnings, humanistic (Horwitz & Marshall, 2015) and systems theory (Holland & Rivett, 2006); that FGDM aligns with social work values and aspirations such as partnership in practice (Lohrbach, 2003) and strengths-based intervention (Connolly, 2005); and that FGDM is an explicit recognition of families' rights (Edwards & Sagatun-Edwards, 2007), the current review authors concur that the theoretical underpinning for FGDM is logical. Like the authors referenced here, and many others besides, we can see how FGDM has emerged as a logical step in the development of child protection practice.
However, the findings of this review give us pause to consider: why do we believe outcomes are improved with FGDM?
We would point towards commentary that highlights how little we know about what works in child protection work: "It is a sad fact that scientific knowledge of truly effective interventions in child protection is relatively sparse" (Sundell & Vinnerljung, 2004, p. 282).
In this situation, it is not inconceivable that policy makers and practitioners have accepted the best evidence they have to hand.
Policy makers charged with the allocation of resources for child protection should therefore consider the commissioning of rigorous evaluations of FGDM and non-FGDM methods of decision-making.
The prevailing sentiment, that FGDM is preferable to other approaches to decision-making, should be set aside pending appropriate evaluation.
Drawing on commentary of primary study authors, we can suggest several potential reasons for the equivocal performance of FGDM models in comparison to traditional practitioner-led decisionmaking models. These insights may inform the development of practice and its evaluation in this field.
First, let us consider that the success of the decisions, and action plans, put forward by families may be dependent upon the resources available to support these decisions and action plans (as suggested by [...]). While we might assume that a lack of services, such as counselling, respite or specialist assessments, will affect FGDM children and non-FGDM children equally, we could also conceive that a family which has successfully used the FGDM model is proffered more autonomy to make their own plan happen. Reduced practitioner focus on FGDM plan implementation would have a negative and confounding effect on FGDM outcomes. Second, there is the possibility that the support offered by family, extended family and the community during the FGDM process is not fully realised. Sundell and Vinnerljung (2004) question whether FGDM can make a lasting difference when child welfare authorities attempt to mobilise informal networks of children at risk. [...]; Marsh and Crow (1998); [...]; Shore, Wirth, Cahn, Yancey, and Gunderson (2002); and Sundell and Haeggman (1999) all report some level of qualitative feedback, or survey data, from FGDM participants indicating that promised family supports did not materialise in the manner expected.
Third, there is a question mark over the readiness of social work departments, and individual social workers, to embrace FGDM's deference to family decisions, and the family plan.
For example, private family time is not always facilitated, for example in Riverside County, California (see [...]). Vesneski (2009) report fear of speaking up; Adams and Chandler (2004) [...] planning, which places practitioners on the periphery; and asks the family, within which abuse or neglect has been perpetrated, to divine the best way forward? We would argue that there is.
While we have called for more rigorous evaluation of FGDM, and a process of FGDM development in response to more rigorous evaluation, we would hope that the findings reported here do not contribute to a side-lining of FGDM. Service users prefer FGDM (Sheets et al., 2009). Practitioners who engage with FGDM are also positive about it, as they have found it reduces conflict between practitioners and families (Wick, 2014). "A child protection system that uses these models (FGDM and similar) and, where possible, draws upon family strengths as a part of a spectrum of responses to different situations that arise during the life of a child's case, will serve the child, the family, and the community in a more nuanced and effective way" (Edwards & Sagatun-Edwards, 2007, p. 20). In concurrence with Edwards and Sagatun-Edwards, we believe it is likely that, going forward, policy makers will adopt criteria for the allocation of FGDM services to appropriate child protection cases. While FGDM is unlikely to be appropriate as a blanket response to all cases of neglect and maltreatment, in any jurisdiction, its potential as a strengths-based family intervention may be fully realised through further development and evaluation.
Finally, let us consider the possibility that FGDM cannot have a large impact on outcomes for children. Not because there is anything in particular wrong with it, but because improving outcomes for children at risk of abuse and neglect is very difficult to achieve. Child abuse and neglect correlates strongly with poverty, deprivation, and displacement (Aber, Bennett, Conley, & Li, 1997; Myers, 2002). Child abuse and neglect is at least partly subject to intergenerational transmission (Lo, Chan, & Ip, 2017). Any intervention which provides us with even marginally better outcomes, in the face of society-wide seemingly intractable challenges such as these, is to be embraced: "FGDM may not be a strong enough intervention to effectively improve child welfare outcomes or may be just one step in improving these larger outcomes" (Berzin, p. 1456).

7.2 | Implications for research
[...] explain that due to both political and practical reasons, an RCT was not an option in their study. [...] The independent monitoring of service delivery will also counteract the confounding effect of cross-pollination between trial arms.
Readers with experience of working in child protection services will be aware that an initiative designed to improve practice, launched in one part of the service, is likely to be discussed and drawn upon throughout the service, even though it has not been fully im[...]
Following the conduct of this review, we believe we are in a position to add to the discussion on two potentially powerful confounding variables that various commentators have highlighted: namely, the confounding effect of families' action plans not being implemented, and the confounding effect of increased reporting of abuse, due to increased involvement of extended family.
The degree to which families' action plans are implemented is vitally important. Berzin (2006) …

Researchers should also be aware of the potentially confounding effect of increased reporting of abuse in the FGDM arm, due to deeper involvement of extended family in the cases. Sundell and Vinnerljung (2004) found that significantly more FGDM children than non-FGDM children were re-referred to protection services during a 3-year follow-up period. Sundell and Vinnerljung acknowledged the possibility that FGDM might have led to increases in referrals, given that family members would be more aware and more likely to report abuse, but clarified that few children in their study were re-referred by extended family members. The point remains, however, that re-referrals may be an indication of more diligent monitoring of child welfare, as opposed to a robust indicator of the success of any given intervention.

CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.

DIFFERENCES BETWEEN PROTOCOL AND REVIEW
There were three differences between the a priori guidance set out in the protocol (Shlonsky et al., 2009) and the conduct of this review, as follows.

Assessment of bias categories
The protocol outlined five categories of bias. The assessment of research bias has advanced significantly since the protocol was written in 2009; accordingly, the five categories of bias described in the protocol were sub-divided and additional categories were added. Fifteen categories of bias were used in the review.

Outcomes reviewed
We extended the range of secondary outcomes of interest. We reviewed data pertaining to families' engagement with services, and families' perceptions of support. These data were available in four primary study reports. The omission of these outcomes from the review protocol was deemed an oversight. These data made no difference to the overall conclusion of the review.

Study design
High risk. Data are provided for three groups which did not receive an FGDM meeting: families who were not deemed appropriate for referral to FGDM; families who refused FGDM; and families for whom the child removal petition had been withdrawn. The control group is therefore likely to be inherently different to the intervention group. The intervention group comprised families who had an FGDM meeting (data are reported separately for those who built a plan through FGDM, and those who did not; these data are amalgamated for the current synthesis). If FGDM did not result in a plan for the family then the integrity of the intervention is in doubt.

The authors acknowledge that their matched comparison group was not viable. The comparison data extracted for this review relate to FGDM-referred cases only. A comparison is possible as only some of these cases actually had a conference. A key problem with the rigour of the study remains, even after employing this selective data extraction: we do not know why some cases went on to have an FGDM conference and some did not.

Confounding variable
Unclear risk. Insufficient information.

Differential diagnostic activity
Low risk. There is no information to suggest that differential diagnostic activity occurred. Authors describe a data verification process, in which 20% of extracted data were checked against source case files.

Insensitive instrument used to measure
Unclear risk. Insufficient information.

Researcher allegiance
High risk. There appears to be an emphasis on comparisons which show FGDM to have worked, alongside selective outcome reporting. For example, see Figure 4 (p. 21) of the study report, which emphasises a favourable outcome.

Reason for exclusion
Outcome was "average number of points of concern"; these were not exclusively indicators of maltreatment or neglect.

Sequence generation
Studies are to be rated as low risk if they adopted a typical method of random group assignment (for example, using computer-generated random number lists).
Studies using comparison groups which may represent a population sub-set are to be rated as high risk in this bias category.
An unclear risk is to be assigned where primary study authors failed to provide sufficient information, or if they have used a comparison group which is likely to represent the study population with equivalence to the intervention group.

Allocation sequence concealment
Randomised trials which concealed the sequence by which study participants were to be allocated, between FGDM and control intervention, from those in charge of study participant allocation are to be rated as low risk.
Nonrandomised studies are to be rated as having a high risk in this category.
Randomised trials which did not describe a means of allocation sequence concealment are to be rated with an unclear risk.

Blinding of participants and personnel
If practitioners were not aware of the research study (as is likely with retrospective parallel cohort studies) then the risk should be recorded as low.
If practitioners and participants, involved in delivering and receiving FGDM or the control intervention, were aware that the interventions were under study then the risk is to be recorded as high.
If it is not clear whether practitioners and participants were aware or not then the risk should be recorded as unclear.

Blinding of outcome assessment
If a description of how outcome assessors were blinded is provided then the risk of bias should be recorded as low.
Risk of bias should be recorded as high if those charged with outcome data collection were privy to participant allocation between FGDM and the control intervention.
If no information regarding the blinding of outcome assessors is available then the risk of bias should be recorded as unclear.

Incomplete outcome data
Risk of bias should be recorded as low if all data which plausibly should be reported are reported, and all study participants are adequately accounted for in the reporting of findings, including data pertaining to participant withdrawals or refusals. In particular, any group differences in participant withdrawals or refusals should be described.
If there is a discrepancy between study participant numbers and reported outcome data then risk of bias should be recorded as high.
If data are missing due to participant withdrawal or refusal to take part, the risk of bias should be recorded as unclear. If outcome data are only partially reported, for example if a measure of significance is reported without an effect size, or an average is reported without an indicator of variance, then risk of bias should be recorded as unclear.

Selective outcome reporting
If sufficient information is provided about the outcome measures deployed, and reported study findings correspond to the outcome measures used, then the risk of bias should be recorded as low.
Risk of bias should be recorded as high if only a subset of the original outcomes measured and analysed in a study are fully reported, or if there is any evidence of selective reporting of data on subgroups.
If primary study authors use finally treated rather than intention-to-treat analyses, if they choose to analyse continuously measured variables categorically, or if the parameters of categorical variables have been chosen insensitively, then risk of bias should be recorded as unclear.

Study design bias
Within the boundaries of the general study design (e.g., a retrospective parallel cohort study will not feature the randomisation of study participants), if study design choices do not appear to have affected findings for intervention and control groups differentially, bias risk should be recorded as low.
Where study design choices, for example having participating families self-selecting to intervention and control groups, are likely to bring about group differences or confounding factors, bias risk should be recorded as high.
Where study design choices, such as the choice of time point for data collection, have clearly affected findings for intervention and control groups differentially, bias risk should be recorded as unclear.

Baseline imbalance
Where baseline differences have been comprehensively assessed, reported and found to be insignificant, the risk of bias should be reported as low.
Indicators of sampling bias, such as the recruitment of an FGDM group of children with less complex needs than a comparison group, should attract a high risk rating. Indicators of significant group differences, such as differences in the severity of abuse experienced by children in each group at baseline, should also attract a rating of high risk.
Where baseline differences have not been assessed or reported adequately, the risk should be recorded as unclear.

Differential diagnostic activity
Record a low risk where diagnostic activity is adequately described with no apparent differences in how it was performed with the FGDM and comparison groups.
Record a high risk of bias if different measures, timeframes or data collection methods are employed with the intervention and comparison groups.
Record an unclear risk if there is an insufficient description of how data were collected.

Insensitive instrument used for measurement
Outcomes of interest such as family permanence, placement stability, and prevention of child maltreatment should be measured sensitively for a low risk rating.
Examples of insensitive instruments will include the use of rating scales which are unlikely to capture the full range of data available. Where this is the case, bias risk should be recorded as high.
Where insufficient information is provided about the measurement tools used, an unclear risk should be recorded.

…, and steps to ensure researcher independence are not described, it may be appropriate to record a risk of unclear.
Where the funding source bears no plausible connection to the promotion of FGDM or control interventions, this risk should be recorded as low.

Contamination bias
The key principles of FGDM could conceivably be assimilated into traditional child protection service delivery. In general terms, it is acknowledged that FGDM has influenced the conduct of child protection work across the globe.
Where it is clear that the implementation of FGDM has been kept separate from traditional child protection services then a low risk should be recorded.
Explicit examples of the use of FGDM in control interventions will attract a high risk rating, as would the use of the same practitioners to deliver both interventions.
Where there is insufficient information provided to make a judgement on this risk, but it appears likely that the same practitioners were used for both FGDM and control interventions, then an unclear risk should be recorded.

Goldbeck

S1 ab("randomised controlled trial" OR "randomized controlled trial" OR "randomised controlled study" OR "randomized controlled study") 432 a
S2 ab((control* OR prospectiv*) N/10 (study or trial)) 5,694 a
S3 ab(random*) 12,963 a
S4 ab("clinical trial") 296 a
S5 ab((singl* or doubl* or trebl* or tripl*) N/10 (blind* or mask*)) 157 a
S6 S1 OR S2 OR (S3 AND S4) OR S5 5,885 a
S7 ab("family group" or "family decision*" or "family conferenc*" or "family meeting" or "family unity" or "family team" or "family centred" or "family centered" or FGC or FGDM) 346 a
S8 ab(famil* or parent* or caregiver* or guardian*) 55,621 a
S9 ab("group conference*" or "group decision*" or "team conference*" or "team decision*" or "case meeting" or "case planning" or "planning meeting" or "consensus-based decision-making" or "consensus based decision making" or "consensus based decisionmaking") 349 a
S10 ab(abuse* or neglect* or maltreat*) 14,974 a