Parenting for Lifelong Health for Young Children: a randomized controlled trial of a parenting program in South Africa to prevent harsh parenting and child conduct problems

Background Parenting programs suitable for delivery at scale in low‐resource contexts are urgently needed. We conducted a randomized trial of Parenting for Lifelong Health (PLH) for Young Children, a low‐cost 12‐session program designed to increase positive parenting and reduce harsh parenting and conduct problems in children aged 2–9. Methods Two hundred and ninety‐six caregivers, whose children showed clinical levels of conduct problems (Eyberg Child Behavior Inventory Problem Score, >15), were randomly assigned using a 1:1 ratio to intervention or control groups. At t 0, and at 4–5 months (t 1) and 17 months (t 2) after randomization, research assistants blind to group assignment assessed (through caregiver self‐report and structured observation) 11 primary outcomes: positive parenting, harsh parenting, and child behavior; four secondary outcomes: parenting stress, caregiver depression, poor monitoring/supervision, and social support. Trial registration: ClinicalTrials.gov (NCT02165371); Pan African Clinical Trial Registry (PACTR201402000755243); Violence Prevention Trials Register (http://www.preventviolence.info/Trials?ID=24). Results Caregivers attended on average 8.4 sessions. After adjustment for 30 comparisons, strongest results were as follows: at t 1, frequency of self‐reported positive parenting strategies (10% higher in the intervention group, p = .003), observed positive parenting (39% higher in the intervention group, p = .003), and observed positive child behavior (11% higher in the intervention group, p = .003); at t 2, both observed positive parenting and observed positive child behavior were higher in the intervention group (24%, p = .003; and 17%, p = .003, respectively). 
Results with p‐values < .05 prior to adjustment were as follows: At t 1, the intervention group self‐reported 11% fewer child problem behaviors, 20% fewer problems with implementing positive parenting strategies, and less physical and psychological discipline (28% and 14% less, respectively). There were indications that caregivers reported 20% less depression but 7% more parenting stress at t 1. Group differences were nonsignificant for observed negative child behavior, and caregiver‐reported child behavior, poor monitoring or supervision, and caregiver social support. Conclusions PLH for Young Children shows promise for increasing positive parenting and reducing harsh parenting.


Introduction
Parenting programs have been identified as a key strategy for preventing violence against children. They are thus critical to achieving United Nations SDG 16.2, to end "all forms of violence against children" (WHO, 2016). Delivered as early interventions, they are effective in reducing child conduct problems and youth risk behaviors (Piquero, Farrington, Welsh, Tremblay, & Jennings, 2009).
Effectiveness of parenting programs is well established (e.g., Chen & Chan, 2015; Knerr, Gardner, & Cluver, 2013), but questions remain about the best targeting and delivery strategies for achieving SDGs. Globally, violence against children is more prevalent in low- and middle-income countries (LMIC; Hillis, Mercy, Amobi, & Kress, 2016), as are conditions that increase parenting difficulties, including poverty and related stressors such as community violence. Yet, evidence comes chiefly from high-income countries (HIC; Chen & Chan, 2015; Knerr et al., 2013; Leijten, Melendez-Torres, Knerr, & Gardner, 2016), and many evidence-based parenting programs are costly and culturally Western; these factors, particularly cost, may make them inappropriate for low-resource LMIC contexts, especially for scale-up (Mikton, 2012).
With these issues in mind, we developed PLH for Young Children, a low-cost parenting program for caregivers of children aged 2-9 (Lachman, Sherr, et al., 2016). Program development involved integrating evidence and content from HIC (e.g., Hutchings, 2013) with findings from a formative evaluation with South African caregivers and service providers. Several principles guided development: evidence for effective components based on social learning principles (Hutchings, Gardner, & Lane, 2004) and the need to train parent group facilitators to work collaboratively with caregivers (Eames et al., 2009; Furlong et al., 2013). Mindfulness-based stress reduction exercises were included to address caregiver-identified needs (Lachman, Sherr, et al., 2016). The program also included traditional Southern African stories, songs, and experiential activities to increase its cultural acceptability. For affordability, the program was designed to be delivered by lay community members. Materials were kept low-cost, easily adaptable, and suitable for low-literacy contexts.
The program was tested in a pilot RCT, which suggested that although the program had promise, revisions might strengthen its impact and feasibility (Lachman et al., 2017). Content on positive reinforcement and discipline was subsequently refined, and additional training was provided to facilitators to strengthen competency in the collaborative process and in understanding social learning theory.
The revised program was the subject of this larger RCT, with the objective of exploring whether a program designed for the conditions of LMIC could be delivered with fidelity, be acceptable to caregivers, and be effective in increasing positive parenting and decreasing harsh parenting, thereby reducing child conduct problems. We aimed to target families at elevated risk for harsh parenting by screening for the presence of parental concern about child conduct problems (Piquero et al., 2009). We designed the trial with scale-up in mind: The program was tested under conditions likely to prevail in South Africa, and within a local "real-world" service, an NGO. We assessed outcomes immediately postintervention and one year later, and used observational assessments of caregiver-child interaction to address potential bias in self-report measures.

Setting
The study was conducted between February 2014 and March 2016 in two historically black African peri-urban settlements, among the most deprived in Cape Town, with high levels of HIV and community and family violence.

The program
Facilitators were paraprofessional community members with high school level education who were hired and trained during the first pilot study to conduct the program (Lachman et al., 2017).
First, facilitators visited each family at home to explore caregivers' goals for their children and discuss any questions they had. Drawing on principles common to many evidence-based parenting programs, the first half of the program focused on positive relationship building through dedicated one-on-one time and positive reinforcement of desirable behavior. Subsequent sessions taught limit-setting through instruction giving, household rules, and daily routines; and nonviolent discipline strategies using redirect, ignore, timeout, and consequences for decreasing undesirable behavior. Caregivers practiced new skills in role-play during each of the 12 three-hour sessions and at home with their children. They reported to the group on their home practice, with facilitators underlining the principles of effective parenting through modeling praise and leading group problem-solving to resolve challenges. For the full program manual, see http://www.who.int/violence_injury_prevention/violence/child/plh/en/.

Participants
Through targeted sampling and referrals from local agencies, 380 child-caregiver dyads were recruited and screened for trial eligibility. Inclusion criteria for adults included: age 18+ years; primary caregiver of child aged 2-9 years, regardless of status as biological parent; coresiding with child 4+ nights per week; and reporting 15+ problem behaviors on the ECBI problem scale. Of 330 eligible, consenting parents, 310 completed the baseline survey (Figure 1) and 296 were subsequently randomized to intervention or control arm in a 1:1 ratio. Stratified randomization ensured a balanced design with respect to child age (2- to 5-year-olds and 6- to 9-year-olds) and sex, within each community.
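As an illustration of 1:1 stratified allocation, the following is a minimal sketch only. The trial's actual allocation procedure is not described beyond the strata, so the shuffling scheme, the field names (`community`, `age_band`, `sex`), and the demo participants are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_assign(participants, seed=0):
    """Allocate participants 1:1 to arms within each stratum.

    participants: list of dicts with hypothetical keys
    'id', 'community', 'age_band', 'sex'.
    Returns a dict mapping id -> 'intervention' or 'control'.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in participants:
        strata[(p["community"], p["age_band"], p["sex"])].append(p["id"])

    allocation = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        n_intervention = len(ids) // 2
        # For an odd-sized stratum, a coin flip decides which arm gets the extra dyad.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_intervention += 1
        for position, pid in enumerate(ids):
            allocation[pid] = "intervention" if position < n_intervention else "control"
    return allocation


# Hypothetical demo: 2 communities x 2 age bands x 2 sexes, 4 dyads per stratum.
demo = [
    {"id": f"{c}-{a}-{s}-{k}", "community": c, "age_band": a, "sex": s}
    for c in ("A", "B")
    for a in ("2-5", "6-9")
    for s in ("F", "M")
    for k in range(4)
]
alloc = stratified_assign(demo, seed=42)
```

Because allocation is balanced inside every stratum, the two arms end up with the same mix of communities, child age bands, and child sexes.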
Two participants were referred by schools and 18 by child welfare organizations. The majority of participants (n = 360) were approached through researchers going door-to-door to every eighth home in the communities, after initial attempts to recruit via agencies did not yield sufficient participants. Caregivers were invited to identify one child aged 2-9 on whom to focus for eligibility screening.
All caregivers gave informed consent to participate.

Measures
Parenting and child behavior were the primary outcomes, assessed from multiple perspectives. For self-reported parenting, we used Setting Limits and Supporting Positive Behavior subscales of the Parenting Young Children Scale (PARYC; McEachern et al., 2011), grouped together as "Positive Parenting." Nonviolent discipline, and psychological and physical punishment, were assessed using the ICAST-Parent Report, adapted for trial use by asking only about the past month. Child behavior in the past month was assessed using the ECBI (Eyberg & Ross, 1978) intensity and problem scales. Caregivers and children also participated in a structured observational task: Research assistants asked caregivers to play with their children for ten minutes and then to ask the child to return the toys to the research assistant. This was video-recorded and later coded using a simplified version of the Dyadic Parent-Child Interaction Coding System (Robinson & Eyberg, 1981). Caregiver behaviors were coded as either positive (e.g., smiles) or negative (e.g., criticizing), as were child behaviors: positive (e.g., affection) or negative (e.g., being rude to caregiver). Inter-rater reliability was assessed by trained research assistants on a subsample of 40% of the videos, achieving κ above .7 for all codes.
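For readers unfamiliar with chance-corrected agreement, the sketch below computes Cohen's kappa for two raters who each assign one categorical code per segment. It is illustrative only: the labels and the two rating lists are invented, and the trial's coding scheme may have been scored differently (for example, as frequency counts per code).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of segments on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for ten video segments; the raters disagree on one.
coder_1 = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "pos", "neg", "pos"]
coder_2 = ["pos", "pos", "neg", "pos", "neg", "pos", "pos", "pos", "neg", "pos"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this invented example the raters agree on 90% of segments, but because 54% agreement would be expected by chance, kappa is lower than the raw agreement rate.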
Demographic variables assessed at baseline included caregiver age, sex, level of education, marital status, employment status, and household income source; child age, sex, school attendance, and relationship to caregiver; household poverty (Hunger Scale; Labadarios et al., 2005); caregiver's own history of maltreatment as a child (ICAST -Retrospective; Dunne et al., 2009); HIV status of caregiver and child (self-report); and child's orphan status.
Implementation fidelity was assessed using facilitator-completed postsession checklists of intervention activities and by videotaping group sessions. Videos were used to verify facilitator checklists, and in supervision to assist facilitator skill and fidelity. Caregiver attendance was recorded. Since the program allowed for home-visit consultations if a caregiver was unable to attend a session, caregivers were counted as attending the session if a home visit was successfully completed. Program acceptability was assessed using a 40-item questionnaire adapted from the Incredible Years Parent Satisfaction Questionnaire (http://www.incredibleyears.com/download/Evaluations/Final_Parent_Satisfaction_Questionnaire112013.pdf).
All measures were translated into isiXhosa (the local language) by consensus forward translation and checked by back translation. Randomization was conducted after data collection by an off-site statistician with no other contact with the trial. Facilitators informed participants of their allocation to intervention or control. The program began 4 weeks after randomization (t 0) in both communities. The immediate post-test (t 1) began 17.5 weeks after randomization in the first community and 20 weeks postrandomization (to accommodate summer holidays) in the second. The 1-year follow-up (t 2) data were collected 70 and 71 weeks postrandomization in the first and second communities, respectively.
Data were collected in participants' homes. Researchers were accompanied by community guides, and before researchers entered the home, community guides first requested caregivers to keep their allocation confidential. Care was thus taken to keep researchers as blind to allocation as possible, although inadvertent disclosure remained possible.

Data analysis
The analysis plan was developed in advance of analysis. All analyses were conducted in R, version 3.4.3 (R Core Team, 2017). Internal consistency of each measure was assessed using reliability coefficients Cronbach's alpha, omega, and greatest lower bound. Outcome measures were summarized by arm and time point (mean and standard deviations; and median and first and third quartiles).
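Of the three reliability coefficients named above, Cronbach's alpha is the simplest to compute by hand. The sketch below shows the standard formula in Python rather than the R code the authors used; the response matrix is invented for illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*item_scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


# Hypothetical responses: four respondents, two items.
responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(responses)
```

Alpha rises toward 1 as the items covary more strongly relative to their individual variances, which is why skewed, zero-inflated items can depress it.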
Each outcome was assessed through several composite scores, derived by summing either Likert scale assessment of the intensity of a behavior or binary indicators of the presence of a trait. The underlying distributions of the composite scores were modeled assuming either Gaussian (for sums of many individual items with symmetric empirical distributions), Negative Binomial (for over-dispersed count outcomes), or Poisson (for count outcomes) distributions. A log link was used for all models, thus providing multiplicative differences (interpretable as percentage differences) between groups.
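To make the log-link interpretation concrete, the sketch below converts a coefficient on the log scale into a percentage difference. The coefficient and standard error are hypothetical and chosen only to show the arithmetic; they are not taken from the trial's tables.

```python
import math


def percent_difference(beta):
    """Percentage difference implied by a coefficient from a log-link model."""
    return (math.exp(beta) - 1.0) * 100.0


def percent_interval(beta, se, z=1.96):
    """Approximate 95% interval, computed on the log scale then back-transformed."""
    return percent_difference(beta - z * se), percent_difference(beta + z * se)


# Hypothetical values: a coefficient of log(1.39) corresponds to a 39% higher outcome.
example_beta = math.log(1.39)
example_pct = percent_difference(example_beta)
example_low, example_high = percent_interval(example_beta, se=0.08)
```

Because the transformation is multiplicative, the back-transformed interval is asymmetric around the point estimate.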
The study design imposed within-subject and within-group correlation. Multilevel generalized linear models were used to compare the intervention and control arms with respect to changes in behavioral outcomes over time. The models included a time effect (comparison of outcomes at immediate post-test and 1-year follow-up, respectively, to outcomes at baseline) and allowed for a modification of the time effect due to the intervention (through the inclusion of interaction terms), while capturing both that the same child-caregiver dyads were assessed at t 0, t 1, and t 2, and the group-based nature of the intervention through the inclusion of dyad-specific and group-specific random effects. The models also included adjustment for the randomization stratifiers (child gender, child age, and community) and followed an intent-to-treat approach.
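One way to write down a model of this kind is sketched below. The notation is introduced here for illustration and is not the authors' own: Y is the outcome for dyad i in parenting group g at time t, T^(1) and T^(2) are indicators for the post-test and follow-up assessments, A is the intervention indicator, and x holds the randomization stratifiers. The inclusion of a main effect for arm and the normality of the random effects are assumptions.

```latex
\log \mathbb{E}\left[ Y_{igt} \mid u_i, v_g \right]
  = \beta_0 + \beta_1 T^{(1)}_t + \beta_2 T^{(2)}_t + \beta_3 A_i
  + \beta_4 A_i T^{(1)}_t + \beta_5 A_i T^{(2)}_t
  + \boldsymbol{\gamma}^{\top} \mathbf{x}_i + u_i + v_g,
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad v_g \sim \mathcal{N}(0, \sigma_v^2)
```

Under this formulation, the interaction coefficients \beta_4 and \beta_5 carry the intervention effect on change from baseline at the post-test and follow-up, respectively.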
The size of intervention effects at t 1 and t 2, compared with t 0, was estimated by the exponents of the coefficients of the interaction terms (due to the use of the log links), which measured the proportional difference in the change from baseline in the outcomes at t 1 and t 2 for the intervention group compared with the control group. Observed p-values are reported throughout the paper for all tests carried out, as a means of assessing strength of association rather than using a threshold value to determine statistical significance. Lower p-values are indicative of stronger associations and larger ones of weaker or no meaningful association. Additionally, should the focus be on hypothesis testing, Holm's method was used to calculate adjusted p-values that control the family-wise error rate, taking into account the focus on two effect sizes for each of 15 outcomes (i.e., 30 comparisons).
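Holm's step-down adjustment can be sketched in a few lines. This is a generic Python illustration, not the authors' R code, and the p-values in the example are invented.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p-values for three comparisons.
example_adjusted = holm_adjust([0.01, 0.04, 0.03])
```

With 30 comparisons, the smallest unadjusted p-value is multiplied by 30, so an unadjusted p of .0001 would become .003 after adjustment.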
Post hoc power calculations were based on effect sizes as measured by the beta coefficients for the interaction terms in the generalized linear mixed models, assuming a variability in effect sizes of .55 (calculated as average of standard error of beta times the square root of sample size). The randomized sample sizes of 148 per arm allowed for detection of a minimum proportional difference between the intervention and control group with respect to changes from baseline of 22% with 90% power if a significance level of 5% was chosen, taking into account the (30) multiple comparisons. The detectable proportional difference becomes 27% at a minimum sample size of 104 per arm, accounting for the outcome with the most missing data. Smaller effects may be identifiable if they are associated with smaller variability.
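The reported detectable differences can be approximately reproduced with a normal-approximation calculation, sketched below. The exact procedure is not given in the text, so several choices here are assumptions: a two-sided test, dividing the 5% level by the 30 comparisons (a Bonferroni-style threshold), and treating the standard error as .55 divided by the square root of the per-arm sample size.

```python
import math
from statistics import NormalDist


def min_detectable_percent(n_per_arm, variability=0.55, alpha=0.05,
                           comparisons=30, power=0.90):
    """Smallest detectable proportional difference (%) on the log-link scale."""
    normal = NormalDist()
    # Two-sided critical value at the per-comparison significance level (assumed).
    z_alpha = normal.inv_cdf(1.0 - (alpha / comparisons) / 2.0)
    z_power = normal.inv_cdf(power)
    # Standard error of the coefficient implied by the stated variability (assumed).
    standard_error = variability / math.sqrt(n_per_arm)
    beta = (z_alpha + z_power) * standard_error
    return (math.exp(beta) - 1.0) * 100.0


detectable_full = min_detectable_percent(148)     # randomized sample per arm
detectable_reduced = min_detectable_percent(104)  # outcome with most missing data
```

Under these assumptions the function returns roughly 22% for 148 per arm and roughly 27% for 104 per arm, in line with the figures stated above.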

Sample characteristics
Most (240, 81.1%) caregivers were the child's biological mother. One male caregiver was recruited, allocated to the intervention group, and did not attend but completed all assessments. The groups were similar in terms of most characteristics (see Table 1), with the apparent exceptions that the intervention arm had more HIV-positive caregivers and more who reported IPV, while the control group had slightly more employed caregivers. Household vulnerability is illustrated by 26 children (8.8%) having lost at least one caregiver; 76 (25.7%) caregivers reporting being HIV-positive, 93 (31.4%) reporting risky alcohol use, 89 (30.1%) reporting past month IPV; and 140 (47.3%) reporting physical, 98 (33.1%) emotional, and 19 (6.4%) sexual abuse as children. Most caregivers (253, 85.5%) were unemployed.
All scales (see Appendix S1) had alpha, omega, or greatest lower bound of .7 or greater, except for the ICAST physical and nonviolent discipline subscales (probably because of skewness and zero-inflation; Trizano-Hermosilla & Alvarado, 2016).
For caregiver self-report, the follow-up rate was 97.0% at t 1 and 91.9% at t 2. Follow-up rates for observations were lower (90.2% at t 1; 71.3% at t 2; see Figure 1). Those lost to follow-up were more likely to report drug use (13.3% vs. 5.6%), and physical abuse as a child (58.3% vs. 51.8%), and slightly less likely to report use of nonviolent discipline of their own child (mean 5.9 vs. 6.4). They were observed to show more positive (mean 16.9 vs. 13.6) and negative (mean 3.7 vs. 2.9) interactions with their child; their children showed more negative behaviors (mean 58.6 vs. 48.3). These were small differences but may suggest a slightly more negative profile for parents who were lost to follow-up.

Program implementation and acceptability
Facilitators delivered 96.8% of the manualized activities. All parenting skill components were delivered; activities that were not covered were less central, such as "energizer" exercises. Most (110, 74.3%) caregivers attended at least one group session, and average overall attendance (group sessions plus home visits) was 8.4 sessions (70%), figures which are within the range for other parenting programs (Chacko et al., 2016). The 84 caregivers (56.8%) who attended the last session (#12) reported very high overall program satisfaction (M = 94.9%; SD = 8.2). Table 2 presents data at all three time points and Table 3 the results of GLMMs for intervention and control group differences. Both groups improved over time, but there were a number of differences between groups.

Program effects
At t 1, the strongest differences (based on both the original and the adjusted p-values) between the intervention and control groups were observed with respect to the frequency of self-reported positive parenting strategies (10% higher in the intervention group), observed positive parenting (39% higher), and observed positive child behavior (11% higher). Intervention impact on observed positive parenting and positive child behavior endured at t 2, with higher frequencies of 24% and 17%, respectively. Additionally, based on the unadjusted p-values, at t 1 there were indications that the intervention group self-reported fewer child problem behaviors (11% fewer), had fewer problems with implementation of positive parenting strategies (20% fewer), and self-reported less physical and psychological discipline (28% and 14% less, respectively). Likewise, there were indications that caregivers reported less depression (20% less) but more parenting stress at post-test (7% more). Differences at either t 1 or t 2 in caregiver-reported child behavior, observed negative child behavior, poor monitoring or supervision, or caregiver social support were much smaller or negligible, as confirmed by smaller effect sizes and larger p-values.

Discussion
To be suitable for scale-up, a program must demonstrate effectiveness, be delivered with fidelity, and be acceptable to caregivers (Gottfredson et al., 2015). PLH for Young Children was specifically designed to be suitable for, and was tested under, the conditions that prevail in LMIC.
Effects on primary outcomes included increases in observed positive parent and child behavior, and, at t 1, fewer child problem behaviors, and a trend toward less physical and psychological discipline. Effects that endured to, or emerged at, t 2 were particularly among observed behaviors and therefore less subject to self-report biases. Although this must be balanced against the fact that there were also large changes in the control group and so groups were not different in terms of self-reported harsh discipline or child conduct problems at t 2, the program demonstrates potential for increasing positive parenting. If this effect is strengthened, enduring reductions in child conduct problems may follow. Among secondary outcomes, t 1 data also revealed a possible, if small, effect on caregiver depression, although this did not maintain to t 2, a pattern found in some other trials (Barlow, Smailagic, Huband, Roloff, & Bennett, 2012). There was also a slight trend toward an increase in parenting stress at t 1 which did not maintain at t 2. It may be that initially caregivers found it a little stressful to remember to use new skills instead of the presumably well-practiced harsh parenting, or that the training gave them greater insight into the importance of parenting, but that over time they became accustomed to new skills and perspectives; it is also possible that the measure was not stable in the South African context.
Differences between the intervention and control groups were, in general, small, and there were no differences on several variables. Something external to the trial may have caused changes in both groups, but there may be a number of other reasons for this pattern of results. While fidelity to content was maintained, fidelity to process might need improvement: Therapist skill in collaborative processes plays a key role in program effectiveness (Eames et al., 2009), but is harder to learn than content. It is encouraging that paraprofessionals (with only high school education) in a highly deprived community in Africa can successfully deliver a complex group-based program. Future studies should explore under what conditions paraprofessionals can maintain fidelity to process. For instance, we consider weekly video supervision to be essential, to monitor fidelity and provide feedback and ongoing training (Axford et al., 2017); in future, these session videotapes could be coded to assess facilitator collaborative process skill learning.
Caregiver engagement is another area that may need attention. While much attention is paid to engaging families in high-income contexts (e.g., Axford et al., 2017), conditions in LMIC are different. In this trial, to mimic conditions that would prevail in program delivery in South Africa, we provided food and a small transport reimbursement, but not child care. Anecdotally, facilitators reported that caregiver alcohol abuse and winter weather inhibited attendance. Despite offering the program on Saturdays, in this context of limited, precarious jobs, we struggled to retain caregivers who gained employment. Future studies should explore reasons for nonengagement and ways to address them that are suitable for LMIC. For instance, embedding the program into existing systems with incentives for participation (such as cash transfers conditioned on attendance), in places of employment, providing other services to parents or providing more material support for attendance (e.g., transport and child care), may enhance attendance and thus effectiveness. We are exploring these possibilities in trials in the Philippines and Thailand, as well as in a factorial experiment in Eastern Europe.
Caregivers faced numerous adversities. It may be that under these conditions, a 12-session group-based parenting program is too short to maintain higher levels of positive parenting and establish reductions in child conduct problems without additional support for other challenges. Future studies should investigate means of enhancing the program, for instance by adding material for dealing with IPV.
The program was designed to be as low-cost as possible. While some costs for any parenting program are unavoidable (e.g., paying facilitators, venue hire), we estimate that, as part of routine service delivery, delivery costs may be as low as USD17 per family; training and supervising 20 facilitators is costed at USD20,000. Costs, however, vary with factors such as local rates of pay, the number of families per group, and the number of times that trained facilitators deliver the program.
There are limitations to the trial. One possible reason for the small differences between intervention and control groups is that most measures used have not been validated in South Africa, although the internal consistency findings argue against this as problematic. Furthermore, there may be cultural variations in the way measures are understood: For instance, cultural understandings of parenting may have made the ECBI less sensitive to change in this context and thus unable to detect actual changes in child behavior; similarly, cultural expressions of depression may have meant that the BDI was not sufficiently sensitive. In addition, in the dense living environments of informal settlements, it is possible that there was contamination between intervention and control groups. Future trials should explore whether there is informal dissemination of program content and, if so, consider a cluster trial design. Further, there may have been a testing effect (Shadish, Cook, & Campbell, 2002): Repeated questioning about positive parenting techniques may have either caused a change in parenting itself or elicited stronger social desirability over time; a Solomon four-group design would be required to rule this out. The changes in the control group suggest that a testing effect, or some other variable external to the study, may have influenced parenting or child behavior, or both.

Conclusion
There are many strengths to this trial: It was carried out in extremely resource-poor areas; local paraprofessionals were trained to deliver the program; the recruitment target was exceeded, ensuring sufficient trial power; participants were followed for a year after the program ended with high follow-up rates; and observational assessments were used to supplement caregiver self-report. This was a stringent test of PLH for Young Children and shows that it holds promise as an intervention to support caregivers to learn nonviolent, positive parenting, and, with strengthening, potential for addressing child conduct problems. What remains is to examine how to strengthen that promise, given the high need for such programs and demand from policy-makers (WHO, 2016).

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article: Appendix S1. Internal consistency of measures.