Physiological measurement of emotion from infancy to preschool: A systematic review and meta‐analysis

Abstract
Introduction: Emotion regulation, the ability to regulate emotional responses to environmental stimuli, develops in the first years of life and plays an important role in the development of personality, social competence, and behavior. Substantial literature suggests a relationship between emotion regulation and cardiac physiology; specifically, heart rate changes in response to positive or negative emotion‐eliciting stimuli.
Method: This systematic review and meta‐analysis provides an in‐depth examination of research that has measured physiological responding during emotion‐evoking tasks in children from birth to 4 years of age.
Results: The review had three main findings. First, meta‐regressions showed an age‐related decrease in baseline and task‐related heart rate (HR) and increases in baseline and task‐related respiratory sinus arrhythmia (RSA). Second, meta‐analyses suggest task‐related increases in HR and decreases in RSA and heart rate variability (HRV), regardless of emotional valence of the task. Third, associations between physiological responding and observed behavioral regulation are not consistently present in children aged 4 and younger. The review also provides a summary of the various methodologies used to measure physiological reactions to emotion‐evoking tasks, including the number and placement of sensors, the baseline and emotion‐evoking tasks used, methods for extracting RSA, and the percentage of and reasons for data loss in each study.
Conclusion: Characterizing the physiological reactivity of typically developing children is important to understanding the role emotional regulation plays in typical and atypical development.


| INTRODUCTION
Emotion regulation, the ability to regulate emotional reactions to environmental cues, develops during the first years of life (Calkins, 1994; Eisenberg et al., 1995, 1996; Kopp, 1982; Thompson, 1994). The behavioral and cognitive constructs of emotion regulation have been extensively studied in the developmental psychology literature, and this work suggests that personality, social competence, and problematic behavior have their origins in (or are influenced by) early emotional regulation (Calkins, 1994; Calkins & Keane, 2004; Cicchetti et al., 1991; Cole et al., 1994; Stifter et al., 1999). Yet, many behaviors of interest may be difficult to assess in young children who do not have the ability to communicate verbally. As such, physiological measurement is necessary to better understand age-related changes and individual differences in response to environmental challenges (Campos, 1976; Davidson, 2001; Lacey et al., 1963). One such index of physiological arousal is heart rate.
Measuring heart rate variability, the increase or decrease in time intervals between successive heart beats, has become an important component of psychophysiological research with the introduction of more accessible child and adult equipment and methodological guidelines (Malik, 1996).
Autonomic nervous system reactivity is assessed as the difference between resting (baseline) heart rate (or a calculated metric, such as heart rate variability) and heart rate during a physical or emotional challenge (e.g., still-face paradigm; Critchley et al., 2005; Jones-Mason et al., 2018). Early clinical uses of heart rate variability included identification of fetal distress (Hon, 1958; Lee & Hon, 1958) and contributions of the central nervous system to sudden cardiac death (Wolf, 1967). Lacey and Lacey (1958) were the first to describe how measurement of heart rate was sensitive to changes in one's environment, with deceleration associated with acceptance and acceleration associated with rejection of the environment (Lacey et al., 1963). From these early insights, we garnered an understanding that the autonomic nervous system (ANS) maintains homeostasis by facilitating responding to our internal and external environment (e.g., surprise; Bernston, Cacioppo, & Quigley, 1993; Mendes, 2009; Calkins & Marcovitch, 2010; Porges, 1985, 1992, 2007, 2011). These modifications are brought about by the sensory and motor neurons of the ANS, which connect the central nervous system to the internal organs and the endocrine system. The ANS itself is comprised of two systems, the sympathetic nervous system, which is responsible for our flight or fight response (to mobilize energy, accelerate heart rate, and slow digestion), and the parasympathetic nervous system, which is responsible for our rest and digest response (decelerate heart rate and increase blood flow to gastrointestinal organs to support digestion; Alkon et al., 2014; Sapolsky, 2004; Selye, 1956).
The tenth cranial nerve, or vagus, is believed to be responsible for maintaining homeostasis via bidirectional messages between the internal organs (including the heart and lungs) and the brain (Porges, 2011; Cacioppo & Bernston, 2011). As such, vagal tone, a measure of parasympathetic activity, is often used as an indicator of self-regulation (Porges, 1985, 2007). Because vagal tone cannot be measured directly, various indirect indices are used. Respiratory sinus arrhythmia (RSA), as illustrated in Figure 1, is a measure of changes in heart rate due to respiration (heart rate increases during inhalation and decreases during exhalation; Zisner & Beauchaine, 2016). Decreased RSA represents parasympathetic nervous system withdrawal (resulting in heart rate increase), and increased RSA represents parasympathetic nervous system activation (resulting in heart rate decrease; Moore & Calkins, 2004). In healthy children and adults, heart rate variability (HRV) tends to occur within the respiratory frequency of 0.15-0.4 Hz at rest (i.e., high-frequency HRV; Wallis et al., 2005), although it can extend to frequencies between 0.15 Hz and up to 1.0 Hz for infants or adults when physically active (Bernston et al., 1997). To account for respiration changes across development, research that examines HRV or RSA in very young children often employs the respiration bandwidth filter of 0.24-1.04 Hz, which approximates 15-60 breaths per minute (Porges, 1985b; Shader et al., 2018).
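The correspondence between a respiration band in Hz and breaths per minute is a unit conversion (cycles per second multiplied by 60). A minimal Python sketch of that arithmetic follows; the function and constant names are illustrative assumptions and are not taken from MXedit or any other software cited in the review.

```python
# Illustrative helpers: all names here are hypothetical, not from any cited tool.
INFANT_BAND_HZ = (0.24, 1.04)    # respiration band reported for very young children
ADULT_HF_BAND_HZ = (0.15, 0.40)  # high-frequency band at rest reported in the text


def hz_to_breaths_per_minute(freq_hz: float) -> float:
    """Convert a respiration frequency in Hz (cycles per second) to breaths per minute."""
    return freq_hz * 60.0


def band_to_breaths_per_minute(band_hz):
    """Convert a (low, high) band in Hz to a (low, high) pair in breaths per minute."""
    low_hz, high_hz = band_hz
    return hz_to_breaths_per_minute(low_hz), hz_to_breaths_per_minute(high_hz)


if __name__ == "__main__":
    low, high = band_to_breaths_per_minute(INFANT_BAND_HZ)
    print(f"{low:.1f}-{high:.1f} breaths per minute")
```

Under this conversion the 0.24-1.04 Hz band spans 14.4 to 62.4 breaths per minute, which the text rounds to 15-60.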
A large body of work has suggested that HRV may serve as a physiological marker of emotion regulation in children (Beauchaine, 2015; Fox, 1989; Fox et al., 2000; Harper et al., 1977; Propper & Moore, 2006). A review of physiological measurement in healthy individuals has suggested that heart rate (HR) shows distinctive patterns of responding to different emotions (Kreibig, 2010). That is, HR increases during tasks designed to elicit negative (including anger, anxiety, fear, and sadness) and positive (including happiness, joy, and surprise) emotions, but decreases during tasks associated with passivity, including noncrying sadness, affection, contentment, and visual anticipatory pleasure. The studies reviewed in Kreibig (2010) included "healthy individuals," but placed no limits on age. Because research shows broad individual differences in how young children respond to emotion-evoking stimuli (Aureli et al., 2015; Buss et al., 2005; Calkins et al., 2007; Dale et al., 2011; Lewis et al., 2004; Quas et al., 2000), an examination of physiological reactivity in children aged 4 years and under is warranted. The purpose of this review is to provide an in-depth examination of research that has measured physiological responses during emotion-evoking tasks in children from birth to 4 years of age. Specifically, we aim to (a) describe patterns of ANS activity across different baseline tasks, (b) describe patterns of ANS activity across different emotion-evoking tasks, (c) describe relationships between behavioral and physiological responses (where available), and (d) conduct meta-analyses to evaluate the presence of predictable patterns of reactions to emotion-evoking tasks in children age 4 years or younger.
FIGURE 1 An oversimplified demonstration of the effect of respiration on heart rate. Heart rate accelerates during inspiration and decelerates during exhalation

| Search strategy
Search terms and strategy were refined in collaboration with a University of Alberta health sciences librarian and included combinations of search terms for emotion, physiology, and child. Our complete search strategies can be found in Appendix S1. The search results were imported into Covidence (covidence.org) for review, resulting in 2,598 articles following duplicate removal. Using the same search terms and databases, a second search covering 7 March 2019 to 11 February 2020 was completed to identify any additional articles that met inclusion criteria, so that the systematic review and meta-analysis were up to date prior to publication. Following duplicate removal, 121 additional articles were identified for potential inclusion.

| Screening for inclusion and exclusion criteria
To be included in the review, a paper had to (1) use an emotion-evoking task; (2) measure heart rate during baseline and emotion-evoking tasks; and (3) include a sample of typically developing children aged four years or less. A paper was excluded if (1) physiological measures were collected during exercise, surgery, medical treatment, sleep, or intervention; or if it (2) was a case study/case series or (3) was a review article, commentary, or conference abstract; or (4) did not include a sample of typically developing children.
Titles and abstracts of 2,719 articles were independently screened using the inclusion and exclusion criteria in Covidence by two authors (LRS and SR) to identify relevant studies that merited a full-text review. The reviewers had 97% agreement on article inclusion/exclusion and a third reviewer (VA) resolved disagreements (n = 66). The first author (LRS) completed a full-text review of the 362 articles that passed the initial screen, with 65 articles being selected for full-text extraction. The reasons for exclusion at the full-text screen are listed in Figure 2.

| Data extraction
Two primary reviewers developed a standardized data extraction form to collect relevant information, including the year of publication, sample size, participant age and sex, baseline task, emotion task, method of heart rate collection and analyses, physiological results, behavioral coding analyses, relationship between behavioral and heart rate measures, and information on missing data/data loss.
The development of the structured data extraction form was an iterative process that allowed for flexibility and comprehensiveness in data extraction (Colquhoun et al., 2014).

| Statistical considerations
To simplify the results, we focused on HR, HRV, and RSA, and all heart period (HP) values were converted to HR using the calculation HR (bpm) = 60,000/HP (HP in msec; Fisher & Ritter, 1998). The effect of the emotion-evoking task on reactivity was determined by a difference score (i.e., baseline values of HR, HRV, or RSA were subtracted from emotion-evoking task values of HR, HRV, or RSA).
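The conversion and difference score described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the example values are hypothetical and are not drawn from any included study.

```python
def heart_period_to_heart_rate(hp_msec: float) -> float:
    """Convert heart period in milliseconds to heart rate in beats per minute.

    Uses HR (bpm) = 60,000 / HP (msec), the conversion stated in the text.
    """
    if hp_msec <= 0:
        raise ValueError("heart period must be a positive number of milliseconds")
    return 60_000.0 / hp_msec


def reactivity(baseline_value: float, task_value: float) -> float:
    """Difference score: task value minus baseline value (HR, HRV, or RSA)."""
    return task_value - baseline_value


if __name__ == "__main__":
    # Hypothetical values: mean heart period of 500 msec at rest, 400 msec during a task.
    baseline_hr = heart_period_to_heart_rate(500.0)
    task_hr = heart_period_to_heart_rate(400.0)
    print(baseline_hr, task_hr, reactivity(baseline_hr, task_hr))
```

With this sign convention a positive score means the measure rose from baseline to task, and a negative score means it fell.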
Meta-regressions on physiological parameters were completed in Stata (Stata Statistical Software, Release 15; StataCorp LP, College Station, TX) using the metareg command (Palmer & Sterne, 2016), comparing age at assessment and physiological measurement (HR and RSA) during the baseline and emotion-evoking tasks. Weighting of each study was computed as the standard error (calculated during the meta-analyses, described below), with the results expressed as regression coefficients and 95% confidence intervals (CI). Meta-analyses on physiological parameters were completed in Stata using the metan command (Palmer & Sterne, 2016). Separate meta-analyses were conducted for HR, RSA, and HRV, with differences between emotion-evoking tasks explored using the subgrouping command. Cohen's d effect sizes (calculated as $d = (M_1 - M_2)/s_{\text{pooled}}$, where $s_{\text{pooled}} = \sqrt{(s_1^2 + s_2^2)/2}$) and standard errors were computed for each study (where data were available) and used in the meta-analyses, with 0.2-0.49 = small effect, 0.5-0.79 = medium effect, and ≥0.8 = large effect (Cohen, 1988). Heterogeneity was examined using confidence intervals (CI), the I² statistic, and forest plots.
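As a worked illustration of the effect size calculation, the following Python sketch computes Cohen's d with the pooled standard deviation given above and applies the Cohen (1988) thresholds. The function names are hypothetical, the assignment of M1 to the task condition and M2 to baseline is an assumption (the text does not specify the order), and the "below small" label for |d| < 0.2 is added here only for completeness.

```python
import math


def pooled_sd(s1: float, s2: float) -> float:
    """Pooled SD as the root mean of the two variances: sqrt((s1^2 + s2^2) / 2)."""
    return math.sqrt((s1 ** 2 + s2 ** 2) / 2.0)


def cohens_d(m1: float, m2: float, s1: float, s2: float) -> float:
    """Cohen's d = (M1 - M2) / s_pooled.

    Assumption: M1 is the task mean and M2 the baseline mean.
    """
    return (m1 - m2) / pooled_sd(s1, s2)


def effect_label(d: float) -> str:
    """Label |d| using the thresholds quoted in the text (Cohen, 1988)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # label not defined in the text; added for completeness


if __name__ == "__main__":
    # Hypothetical HR means (bpm) and SDs, not taken from any included study.
    d = cohens_d(m1=143.0, m2=140.0, s1=10.0, s2=10.0)
    print(round(d, 2), effect_label(d))
```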
The I² statistic, which ranges from 0% to 100%, is a measure of the variability in effect estimates due to heterogeneity between studies rather than chance (e.g., sampling error). Heterogeneity values are considered low at <25%, modest at 25%-50%, and high at >50%. Preliminary analyses suggested our meta-analyses had I² statistics > 50%; thus, we conducted random effects meta-analyses. Funnel plot, trim and fill analyses, and Egger's tests for small study effects were completed using the metafunnel, metatrim, and metabias commands in Stata (Palmer & Sterne, 2016) to investigate publication bias and heterogeneity through visual and statistical examination of the data (Egger et al., 1997).
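To make the heterogeneity and pooling steps concrete, the sketch below computes Cochran's Q, I², and a random-effects pooled estimate in Python. It assumes a DerSimonian-Laird (method of moments) estimator for the between-study variance; the review reports using Stata's metan command but does not state which estimator was used, so this is an illustration of the general technique rather than a reproduction of the authors' analysis. All function names are hypothetical.

```python
import math


def random_effects_pool(effects, std_errors):
    """Pool effect sizes with a DerSimonian-Laird random-effects model (assumed estimator)."""
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching effects and standard errors")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and I^2 (share of variability beyond chance, as a percentage).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-study variance tau^2 (method of moments), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se_pooled, "q": q, "i2": i2, "tau2": tau2}


def heterogeneity_label(i2_percent: float) -> str:
    """Thresholds quoted in the text: low < 25%, modest 25%-50%, high > 50%."""
    if i2_percent < 25.0:
        return "low"
    if i2_percent <= 50.0:
        return "modest"
    return "high"


if __name__ == "__main__":
    # Hypothetical Cohen's d values and standard errors from two studies.
    result = random_effects_pool([0.1, 0.9], [0.1, 0.1])
    print(result, heterogeneity_label(result["i2"]))
```

When tau² is estimated as zero the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled estimate for homogeneous studies.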
Overall, 53 of the 64 articles were included in the meta-analyses. The remaining 11 articles provided insufficient data to calculate an effect size for the baseline and/or emotion-evoking task. Of the 53 studies included in the meta-analyses, data from 21 studies were included twice. The rationale for duplication was to explore task, age, and measurement effects broadly. Data from the same study were included more than once if they compared baseline to emotion-evoking task (a) at different time points (e.g., ages; n = 9; Zeegers et al., 2017), (b) by characteristic (e.g., sex; n = 1; Eiden et al., 2018), or (c) across different tasks (n = 6; e.g., Calkins et al., 1998a), or if they (d) provided both HR and RSA data (n = 7). Note that one study (Calkins & Keane, 2004) provided data across different ages, tasks, and measurements.

| Ethical statement
Ethics approval was not required for this study.

| Data sharing
We have shared the tables generated from our review as .doc files for use by other authors (see Appendices S3-S6).

| RESULTS
The systematic review and meta-analysis examining the relationship between physiological indices of heart rate during emotion-evoking tasks in children aged four years and under resulted in the inclusion of 64 articles. The Results section is organized as follows: (a) a descriptive overview of the included articles, with location and sample size, as well as age, ethnicity, and socioeconomic status of participants; (b) an overview of data collection and analyses, including electrode number and placement, RSA calculations used (if applicable), neutral events between tasks, and data loss; (c) descriptions of the baseline tasks; (d) descriptions of emotion-evoking tasks; (e) meta-regressions on HR and RSA baseline and emotion-evoking tasks by age; (f) meta-analyses on HR, RSA, and HRV; and (g) associations between physiological measurements and behavioral coding. Due to the large number of included articles, citations are presented in their respective tables and referenced in the text by category, unless specifically described in the text.
FIGURE 2 Systematic review strategy using the PRISMA method (Moher et al., 2009)

| Overview of included articles
No language or publication date limits were placed during the search, yet all included articles were published in English; the earliest article meeting inclusion criteria was published in 1975 and the most recent in 2019. The articles originated from the United States (n = 46), Europe (n = 9), Canada (n = 6), Israel (n = 2), and the Netherlands (n = 2), with 53 studies using a cross-sectional (single time point) design and the remaining 11 studies using a longitudinal (multiple time points) design. Sample sizes ranged from 12 (Ham & Tronick, 2006) to 278 (Zeytinoglu et al., 2019). Descriptive details of the included studies are presented in Appendix S2.
Briefly, of the 64 studies, 19 included children under 6 months of age, 33 included children between 6 and 12 months of age, 16 included children between 13 and 24 months of age, 10 included children between 2 years and 3 years 11 months, and 5 included children between 4 years and 4 years 11 months. The studies consisted of primarily Caucasian participants from middle-class backgrounds, with 13 studies not reporting on ethnicity descriptors and 24 studies not reporting on socioeconomic status.

| Number and placement of electrodes
When considering the number of electrodes used, 3% of studies used 4 electrodes, 49% of studies used 3 electrodes, 23% used 2 electrodes, and 25% did not report on the number of electrodes attached to the child. Of the studies that reported placement of sensors, 94% reported that they were attached to differing areas on the chest.

| RSA calculation
Of the 46 studies that reported the method of RSA or HRV calculation, 30 (65.2%) used Porges' method (i.e., respiration bandwidth of 0.24-1.04 Hz) or MXedit software (which incorporates Porges' method into its calculation of RSA; Porges, 1985a). Of the remaining 16 studies, 8 (17.4%) mentioned the use of respiration bandwidth in their calculations of RSA, 5 of which used the "infant bandwidth" of 0.24-1.04 Hz and 3 of which used different bandwidths in their calculations (0.30-0.75 or 0.24-0.40

| Neutral events between tasks
Of the 64 studies that analyzed the effect of emotion-evoking tasks on physiological reactivity in children aged four and under, 53% (n = 34) did not mention the use of a "pause," "break," "intertrial interval," or "return to resting baseline" in their methodological descriptions. Of the remaining studies, 25% (n = 16) employed the face-to-face/still-face paradigm, in which cardiac measures during face-to-face play could be compared to those collected during reunion (i.e., a recovery period), although no mention of "pause," "break," "inter-trial interval," or "return to resting baseline" was found in their methodological descriptions. Fourteen studies (22%) did include a "pause," "break," "inter-trial interval," or "return to resting baseline" in their methodological descriptions. Of these, 11 mentioned including a short break, but did not provide nor use these data in their analyses; one included a "postrecovery phase," but did not provide or use the data in their analyses (Fracasso et al., 1994); one included an "inter-trial interval" to allow a return to baseline before the next task (but did not provide the data; Campos et al., 1975); and one included a 30-s interval between tasks, with data provided and used in analyses (Provost & Gouin-Decarie, 1979).

| Data loss and reasons
Of the 64 studies, 12.5% (n = 8) did not report the percentage of data loss nor the reasons for data loss. An additional 15.6% of studies (n = 10) did report percentage of data loss experienced, but did not provide reasons why data loss occurred. Of the studies that reported the amount of data loss (n = 56), 8.9% reported less than 10% data loss, 44.6% reported data loss between 11% and 20%, and 46.4% reported data loss greater than 20%. Reasons provided for loss of data included equipment failure, refusal to wear electrodes, artifacts in the data, child distress/refusal to continue, and human error.

| Baseline (resting) tasks
Baseline tasks varied in duration from 5 s (Anderson et al., 1999; Bohlin & Hagekull, 1993; Skarin, 1977) to 420 s and fell within five categories, differentiated for the purpose of this review by the nature of the activities and described in Table 1, plus a category described as "baseline period" (n = 5) that ranged from 60 to 180 s.
The first category of baseline tasks involved sitting quietly without toys (n = 12) either alone or with the mother and ranged from 15 to 300 s. The second category involved a sedentary task (n = 5) in which the child played with toys alone, watched an examiner play with a toy, or read a book with their mother or the examiner for 60-420 s. The third category involved play with the mother (n = 15) and was often the first play episode of still-face or stranger approach or another type of play and ranged from 120 to 210 s. The fourth category included the period immediately before (n = 5) the emotion-evoking task or an inter-trial interval and ranged from 5 to 15 s. Finally, the fifth category consisted of watching a video (n = 22). Videos included "Spot"-a dog that explores his world; "Baby Mugs"-a parade of baby faces, drooling, giggling, yawning, etc.; "TikTak #15"-a series of short clips geared toward toddlers; "Sesame Street," "Barney the Dinosaur," "Baby Einstein," or other short unnamed videos, ranging from 45 to 300 s. Watching a video was separated from sedentary tasks as it involved screen time, rather than solitary play with toys or with a second person, and thus may have a different unknown effect on heart rate.
Because children may have varied (or nonprobed) reactions to probed emotions (e.g., laughing at a scary stimulus), the results are summarized based on the task used (e.g., arm restraint) rather than the probed emotion (e.g., distress). The emotion-evoking tasks are described in Table 2, the methods used to collect physiological data are presented in Table 3, and the results of the physiological tasks are presented in Table 4.

TABLE 2 Emotion-evoking tasks

Emotion: Distress. Task: Arm Restraint Caregiver (Calkins & Fox, 1992). Description: Mother stood behind child and gently grasped child forearms and firmly hold them to their side while an attractive toy was placed directly in front of child; Recovery period followed a second trial with child playing with toy or comforted by parent.

Description: Mothers instructed to gently hold child's arms to their sides of the child and then to release the arms while maintaining a still-face with no verbal interactions. Duration: Both trials 90 s. Study: Stone and Porter (2013).

Emotion: Distress. Task: Arm Restraint-Modified 3. Description: During toy removal task, mother engaged her child in play with an interesting toy; mother then held toy out of child's reach, retaining eye contact but silent with still-face. Mother next gently restrained her child's arms against his/her sides while maintaining a still-face and silence. Duration: Toy play 30 s; toy removal and arm restraint 120 s.

Task: Arm Restraint and Toy Play. Description: Child was first presented and encouraged to play with an attractive toy; experimenter stood behind child, placed hands on the child's forearms and moved them to child's sides and held them while maintaining a neutral expression; After first trial, child was allowed to play with the toy again followed by a second arm restraint. The child was again allowed to play with the toy after arm restraint.

Emotion: Distress. Task: Strange Situation Modified. Description: Play with parent, followed by a brief separation from mother, and then a reunion. Duration: Play 600 s; Separation 180 s; Reunion 300 s.

Task: Teddy Bear Picnic. Description: Two costumed characters (the "Birthday Lady" and the "Teddy Bear"): Birthday Lady encouraged children to sit around picnic mat and play with plastic food and parents return to sofa in room. The Teddy Bear entered, pausing in doorway until each child had seen him. Birthday Lady invited Teddy Bear to sit down near picnic blanket, and he offered each child a piece of plastic birthday cake. Birthday Lady and Bear then danced while singing "Round and Round the Garden" before offering each child, with the help of their parents, the opportunity to dance with Bear. Birthday Lady instructed families to let children play in any way they would like and then she left the room.

Emotion: Fear. Task: Strange Situation-Ainsworth. Description: Child seated in high chair and completed seven episodes: mother and child together, stranger enters room with mother and child, mother leaves stranger and child alone, stranger leaves and mother and child together, mother leaves and child is alone, stranger re-enters with child, and stranger leaves and mother returns with child.

Description: Mother leaves room and unfamiliar experimenter enters and placed robot 1.5 m away from child; Experimenter makes robot approach child, stopping 15 cm from child, while making movements with its arms and emitting noise. The robot then walks backward and stops at back of room for 10 s before moving forward again; This was repeated three times.

Emotion: Frustration. Task: Frustrating Puzzle Task. Description: Child given a wood toy with many holes with string laced through the holes (middle of string was glued to inside of toy, making it impossible to untangle completely). Experimenter asked child to untangle toy while he/she worked on paperwork in other room. The experimenter left the room and upon return, experimenter presented second unglued puzzle to child and allowed child to completely unlace string and solve puzzle.

Emotion: Frustration. Task: Green Circles. Description: Experimenter repeatedly asks child to draw circles with green marker. Experimenter criticizes child's circles but does not say how to do better. Experimenter continues to prompt, "I need the perfect green circle" for the duration of the task. Study: Blankson et al. (2012).

Emotion: Frustration. Task: High Chair. Description: Experimenter placed child in high chair and told child to wait for a special toy; mother sat nearby with magazine and responded normally to child if child spoke to her but did not remove child from chair. Duration: 300 s. Study: Calkins and Johnson (1998b).

Emotion: Guilt. Task: Mishap Guilt Paradigm. Description: Child presented with tower, which experimenter says is her favorite toy and had made it herself; She told child that she would share it as long as they were very careful. Because tower is rigged, it fell apart as soon as child began to handle it; Experimenter then says "Oh my" with regret and sits still in front of child with her face covered with her hands. She asked, "What happened?", "Who did it?", and "Did you do it?"; Child is told that it was not their fault and there was a problem with tower. She gives partially built tower to child and asks child to help her make it; Experimenter tells child damage was not their fault and assumed responsibility for it.

Emotion: Positive. Task: Absurd Event. Description: Research assistant showed parents two ordinary events (narration of playing with a ball/drinking from a cup and read a book) and two absurd events (ball worn as a clown nose and continuously poked while saying "beep" and book/cup worn like hat and continuously raised and lowered while saying "zoop"). Each absurd event was presented twice, once with parents holding neutral affect and once with positive affect (i.e., smiling and laughing).

Task: Emotion-Evoking Task. Description: Mothers instructed to turn toward experimenter while experimenter enacted script directed toward them in angry, excited, or neutral tone of voice; Same script used for each emotion; Mothers were instructed not to respond in any way. Duration: Episodes each 60 s. Study: Moore (2009).

Emotion: Positive, Fear. Task: Audio-ID speech. Description: Mothers expressed either comfort, surprise, or fear as they said "Hey, honey, come over here" to their children; Conditions were constructed such that samples expressing that emotion were played in random order. Duration: Episode each 60 s with 30 s inter-trial pause. Study: Santesso et al. (2007).

Emotion: Positive, Fear. Task: Peek-a-boo then scary mask. Description: Child and mother played peek-a-boo; After child was positively engaged, mother called child name and appeared from behind screen wearing a full-face mask; She then returned behind screen and repeated procedure; On second trial, stranger wore mask, after which, mask was removed and stranger approached child. Duration: NA. Study: Vaughn and Sroufe (1979).

Emotion: Positive, Fear, Frustration. Task: Strange Situation and Toy Box. Description: Six episodes: (1) exploration-child explores with mom in room, (2) play with mom-mom shows child how pull toy works, (3) frustration-mom puts toys in box and restrains child on his back, (4) reaction to stranger-experimenter approaches child, (5) isolation-child is left alone in room, and (6)


| Meta-analyses on physiological responses to emotion-evoking tasks
| Heart Rate (HR)
A total of 24 studies measuring HR were included in a meta-analysis. There was large heterogeneity among the included studies (I² = 92.1%); thus, we adopted a random effects model to pool the relevant data and explored subgrouping analyses to determine any differential effects of the type of emotion-evoking task on HR. As shown in Figure 5, five of the eight tasks (positive and negative stimuli, toy block/removal, face-to-face/still-face, stranger situation, and guilt paradigm) produced significant effects (all ps < 0.005), all resulting in an increase in HR compared to baseline. In contrast, three tasks (absurd event, classical music, and sad videos) did not have a significant effect on HR (all ps > 0.38), with no clear pattern of increasing or decreasing HR relative to baseline. Funnel plot analyses on Cohen's d ES for HR demonstrated asymmetry, suggesting that bias was present (Figure 6). The trimmed set of data systematically removed each "outlier" one at

| Respiratory Sinus Arrhythmia (RSA)
A total of 32 studies measuring RSA were included in a meta-analysis. There was a significant effect of emotion-evoking task, suggesting that tasks produced a decrease in RSA compared to baseline (Cohen's d ES = 0.39, 95% CI = 0.30-0.48, z = 8.42, p < .001, Figure 7). There was large heterogeneity among the included studies (I² = 94.2%); thus, we adopted a random effects model to pool the relevant data and explored subgrouping analyses to determine any differential effects of the type of emotion-evoking task on RSA. As shown in Figure 7, five of the seven tasks (arm restraint, puppet play, face-to-face/still-face, stranger situation, and toy block/removal) produced significant effects (all ps < 0.05), all resulting in a decrease in RSA compared to baseline. In contrast, two tasks ("sad videos" and "positive and negative tasks") did not have a significant effect on RSA (all ps > 0.13).

| Heart Rate Variability (HRV)
Two separate meta-analyses were completed for HRV data. The first included four studies that measured HRV in msec or sec (Figure 9a), and the second included two studies that measured HRV in msec²/Hz (Figure 9b).
The first meta-analysis produced a significant effect of emotion-evoking task, suggesting that tasks produced a decrease in HRV compared to baseline (Cohen's d ES = 0.21, 95% CI = 0.01-0.40, z = 2.09, p = .037, Figure 9a). There was large heterogeneity among the included studies (I² = 92.9%); thus, we adopted a random effects model to pool the relevant data and explored subgrouping analyses to determine any differential effects of the type of emotion-evoking task on HRV. As shown in Figure 9a, two of the three tasks (arm restraint and stranger situation) produced significant effects (all ps < 0.017), with both resulting in a decrease in HRV compared to baseline. In contrast, one task (sad videos) did not have a significant effect on HRV (p = .37). The second meta-analysis produced a significant effect of emotion-evoking task, suggesting that the task produced a decrease in HRV compared to baseline (Cohen's d ES = 0.15, 95% CI = 0.07-0.22, z = 3.87, p < .001, Figure 9b). There was no heterogeneity between the two studies (I² = 0%). Due to the small number of studies included, funnel plot and Egger test were not performed.
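The pooled effect sizes, confidence intervals, and z statistics reported in this section are linked by the usual normal-approximation relationship: the standard error is roughly the CI width divided by 2 × 1.96, and z is the estimate divided by that standard error. The Python sketch below shows this as a rough consistency check; the function names are hypothetical, and because the published intervals are rounded to two decimals the recovered z values only approximate the reported ones.

```python
def se_from_ci(ci_low: float, ci_high: float, z_crit: float = 1.96) -> float:
    """Approximate standard error from a symmetric 95% confidence interval."""
    if ci_high <= ci_low:
        raise ValueError("ci_high must be greater than ci_low")
    return (ci_high - ci_low) / (2.0 * z_crit)


def z_from_ci(estimate: float, ci_low: float, ci_high: float, z_crit: float = 1.96) -> float:
    """Approximate z statistic for an estimate given its 95% confidence interval."""
    return estimate / se_from_ci(ci_low, ci_high, z_crit)


if __name__ == "__main__":
    # Pooled estimates and 95% CIs as reported in the text.
    reported = {
        "RSA": (0.39, 0.30, 0.48),
        "HRV (msec or sec)": (0.21, 0.01, 0.40),
        "HRV (msec^2/Hz)": (0.15, 0.07, 0.22),
    }
    for label, (d, low, high) in reported.items():
        print(label, round(z_from_ci(d, low, high), 2))
```

For the RSA result this gives a z of roughly 8.5, close to the reported 8.42; the small gap is what one would expect from rounding in the published interval.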

| Relationships between physiological measurement and behavioral coding
In addition to collecting electrocardiograph (ECG) data, most studies also coded observed affective behavior in response to the emotion-evoking tasks, as summarized in Table 3. Behavioral coding included verbal responses (e.g., laughing, signs of distress), physical reactions (e.g., pushing away, covering face), body tension/posture, facial affect (e.g., smiling, frowning), gaze (e.g., look at object, look to mother), motor reactivity, regulatory behaviors, duration/frequency of crying, signs of distress or fear, and touch.
HR was positively associated with:
1. wariness during stranger tasks (Anderson et al., 1999; Bohlin & Hagekull, 1993)
2. gaze aversion during stranger tasks (Waters et al., 1975)
3. negative affect during stranger tasks (Buss et al., 2005; Campos et al., 1975; Ham & Tronick, 2009)
4. vocal distress during Teddy Bear picnic (Hay et al., 2017)
5. gaze to parent during an absurd task in 6-month-olds (Mireault et al., 2018)
6. social engagement during play (Ham & Tronick, 2009)
7. protest behavior during a still-face paradigm (Ham & Tronick, 2009)
8. negative affect during reunion (Moore & Calkins, 2004)
HR was inversely related to:
1. fear during arm restraint (Cho & Buss, 2017)
2. negative affect in a still-face paradigm (Moore, 2009)
HRV suppression was inversely related to:
1. regulation during clips of positive and negative stimuli (Eiden et al., 2018)
The relationships between physiology and observed behaviors described in the 64 studies are presented in Table 4.
FIGURE 6 Funnel plot of studies examining heart rate (HR) reactivity during an emotion-evoking task. Note: A paper comparing children at different ages or across different tasks may appear more than once
FIGURE 7 Meta-analyses on studies examining respiratory sinus arrhythmia (RSA) reactivity during an emotion-evoking task. Note: A paper comparing children at different ages or across different tasks may appear more than once. Abbreviations: ES = Cohen's d effect size; * paper used high-functioning heart rate variability (HR-HRV) to calculate RSA

| DISCUSSION
The present review summarized the results of research that has examined physiological measurements of emotional regulation in children aged 4 years or younger. Three measures of cardiac activity used in developmental psychobiological research with young children were reported here: heart rate (HR), heart rate variability (HRV), and respiratory sinus arrhythmia (RSA). The review had three main findings. First, meta-regressions exploring the relationship between age and physiological measurement during baseline and emotion-evoking tasks showed a significant effect of age on both HR and RSA, at baseline and during the task. Second, the three meta-analyses on the impact of emotion-evoking tasks on physiological measurement yielded significant Cohen's d effect sizes, with an increase in HR from baseline and a decrease in HRV and RSA from baseline. Third, physiological measurement was related to observed behaviors (e.g., facial affect and gaze) during emotion-evoking tasks. A good understanding of the physiological and age-related changes of typically developing children during both baseline and emotion-evoking tasks is vital to understanding the role of emotional regulation in typical and atypical development.

| Review of the findings
Physiological measurement of responses to an emotion-evoking task typically begins with a baseline (resting) period. This period is intended to reflect the individual's innate capacity to regulate their emotions (Appelhans & Luecken, 2006), such that higher baseline heart rate variability (higher PNS activity) reflects more regulatory capacity (Propper & Moore, 2006) and is associated with positive psychosocial outcomes (Scarpa et al., 2010). An overview of the baseline tasks reviewed here indicated variability, both in terms of duration, from 5 (Anderson et al., 1999; Bohlin & Hagekull, 1993; Skarin, 1977) to 420 s, and task (including sitting quietly, playing alone or with mother, time immediately before or between trials, and watching short videos, and "baseline period"  Kreibig (2010), who noted that studies examining fear, anger, or anxiety (as probed by stranger situation, still-face, arm restraint, or toy removal/block in the studies reviewed here) reliably produce an increase in HR or a decrease in RSA and HRV from baseline in children 6 months of age and older. Of the studies that did not have an effect, or produced a discordant result (decrease in HR, increase in RSA or HRV from baseline), either the participants were under 6 months of age (Anderson et al., 1999; Bazhenova et al., 2007; Mireault et al., 2018), or the task had an element of passivity and may not have been sufficiently salient to produce reactivity (e.g., classical music, infant-directed speech, puppet play, a narrated comic strip, or watching a sad video; Calkins et al., 2007; Calkins & Keane, 2004; Cho & Buss, 2017; Liew et al., 2011; Noten et al., 2019a; Schmidt et al., 2003; Wagner et al., 2018b). Passivity was identified by Kreibig (2010) as having differential impacts on reactivity.

FIGURE 9 Meta-analyses on studies examining respiratory heart rate variability (HRV) during an emotion-evoking task for (a) HRV calculated in milliseconds or seconds and (b) HRV calculated in milliseconds/Hz. Note: A paper comparing children at different ages or across different tasks may appear more than once. Abbreviations: ES = Cohen's d effect size

Of importance, there were instances in which studies had no effect (confidence interval of the effect size crossed zero) but could not be explained by the above reasons. Each of these studies was explored to determine whether methodological differences could explain their discrepant results. Although Perry et al. (2016) did not find a difference between baseline and task in 10-month-old infants when assessing arm restraint, this may be explained by the role of the mother. According to the methods, the mother first played with toys with her child, followed by a game of peek-a-boo, and then engaged in arm restraint. It is possible that infants interpreted this as another game due to the preceding play, giving a different tone to the "negative" arm restraint task. Two studies that used "stranger situation" also found discordant results. An examination of their methods suggested that the participant pool of Brooker et al. (2013) consisted of 6-month-old twin pairs, who have different developmental trajectories due to variations in gestational age. Zeegers et al. (2017) identified that they did not use age-adjusted respiration rates in their analyses. Finally, although Fracasso et al. (1994) did not describe the specific emotion-evoking task used, they reported using "a variety of stimuli designed to elicit both positive and negative emotions" (p. 279). Because the emotion-evoking tasks were not described, an evaluation based on methodological differences cannot be provided, beyond considering that combining positive and negative values potentially washed out any differential responses (Kreibig, 2010).
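The "no effect" criterion used here (a confidence interval that crosses zero) can be illustrated with a short sketch. It assumes the independent-groups form of Cohen's d with a pooled standard deviation and the usual large-sample approximation for its variance; the review does not state which variant was used, and baseline-versus-task comparisons within the same children may call for a paired formula instead. The function names and the input values in the usage note are hypothetical and are not taken from any reviewed study.

```python
import math


def cohens_d_with_ci(mean_task, sd_task, n_task,
                     mean_base, sd_base, n_base, z=1.96):
    """Return (d, lower, upper) for task vs. baseline.

    Assumption: independent-groups Cohen's d with a pooled SD and a
    large-sample normal approximation for the confidence interval.
    """
    pooled_var = (((n_task - 1) * sd_task ** 2 + (n_base - 1) * sd_base ** 2)
                  / (n_task + n_base - 2))
    d = (mean_task - mean_base) / math.sqrt(pooled_var)
    # Large-sample approximation of the sampling variance of d
    var_d = ((n_task + n_base) / (n_task * n_base)
             + d ** 2 / (2 * (n_task + n_base)))
    se = math.sqrt(var_d)
    return d, d - z * se, d + z * se


def crosses_zero(lower, upper):
    """True when the interval includes zero (read here as 'no effect')."""
    return lower <= 0.0 <= upper
```

For example, with hypothetical task and baseline HR means of 140 and 130 bpm (SD 10, n = 20 each), `cohens_d_with_ci(140, 10, 20, 130, 10, 20)` gives d = 1.0 with an interval that excludes zero, whereas means of 131 and 130 bpm give an interval that crosses zero.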
In addition to describing age-related changes in HR and RSA, we also detailed relationships between physiological measurements and observed behavioral responses. For example, increases in heart rate were associated with increased wariness, gaze aversion, and negative affect during stranger situation and decreases in RSA were associated with increased negative affect in the still-face paradigm.
Exploring relationships between external (behavioral) and internal (physiological) reactivity may provide important insights into overall emotional regulation. Calkins and Dedmon (2000) noted that physiological and behavioral regulation may be interdependent components, with changes to emotional reactivity apparent on both physiological and behavioral levels. If observed behavior and physiological reactivity are indeed interdependent, one would expect to find a significant association between behavioral and physiological reactivity for a majority (if not all) of the studies in this review that included behavioral coding. Yet several did not report any such significant relationship. Identifying congruence and incongruence between behavioral and internal reactivity is important, especially when one considers atypical populations. For example, heart rate and facial affect were compared in 4- to 6-year-old typically developing children and children with autism spectrum disorder (ASD) who were presented with a fear-inducing robot. Both the typically developing children and children with ASD showed task-related heart rate changes (and did not differ from one another), but only the typically developing children showed changes to facial affect reflecting a fearful response (no affect changes were seen in the children with ASD; Zantinge et al., 2019). This example is particularly illustrative of the importance of describing associations between internal and external reactivity and how they may differ in childhood disorders.

| Physiological reactivity
Emotional regulation is an adaptive skill that children develop during their early years (Calkins, 1994; Eisenberg et al., 1995, 1996; Kopp, 1982; Thompson, 1994) and undergirds components of personality, social competence, and externalizing and internalizing behavior (Calkins, 1994; Calkins & Keane, 2004; Cicchetti et al., 1991; Cole et al., 1994; Stifter et al., 1999). If this process is interrupted, emotion dysregulation may contribute to the development of psychopathology (Calkins & Dedmon, 2000; Calkins & Fox, 1992; Keenan, 2000; Shaw, Keenan, Vondra, Delliquadri, & Giovannelli, 1997). The polyvagal theory is one framework for understanding how the PNS fosters adaptive engagement with the environment (Porges, 2007), and authors of several studies in this review explored this framework within their research. The direction and magnitude of PNS activity can help differentiate how stimuli are interpreted (termed neuroception). Small-to-medium increases in RSA, for example, connote that the individual interprets the environment as socially engaging and safe (i.e., rest and digest), whereas small-to-medium decreases in RSA (as identified by the studies in this review that used RSA as their physiological measure) connote that the individual interprets the environment as threatening, thus preparing the body for action (i.e., fight or flight; Appelhans & Luecken, 2006; Calkins & Keane, 2004; Hastings et al., 2014; Porges, 2007; Porges et al., 1996). These states are mutually exclusive: an individual who does not feel safe is more likely to be in a physiologically aroused state, affording less time for social engagement, which can have cascading impacts on development. As such, understanding how typically developing children respond to various environmental challenges, such as those modeled in the emotion-evoking tasks described in this review, will allow us to better understand atypical development.
This is important because atypical emotional responses and dysregulation may have implications for functional outcomes including language and social skills (Carpenter & Tomasello, 2000; Mundy & Sigman, 2006; Woods & Wetherby, 2003).

FIGURE 10 Funnel plot of studies examining heart rate variability reactivity (from panel (a) of Figure 9) during an emotion-evoking task. Note: A paper comparing children at different ages or across different tasks may appear more than once

Important to understanding emotion reactivity is delineating whether emotions are associated with differential reactivity. Feldman-Barrett (2006) contends that emotions do not have unique autonomic signatures, but rather may denote differences between positive and negative states. The ANS is activated in response to actual or expected behavior, and because behavior is not emotion-specific, Feldman-Barrett (2006) contends that emotion-specific ANS patterns are improbable. Furthermore, ANS differences between emotions are viewed as dimensional differentiation, with mediators purported to explain the heterogeneity of findings in meta-analyses (Feldman-Barrett, 2006). Cacioppo et al. (1997) contend that there is a degree of differentiation between emotions, with differences in valence-specific tasks (e.g., negative versus positive) producing more consistent results than emotion-specific (e.g., joy, fear) tasks. In contrast, Stemmler has posited that different emotions have inherently different goals (e.g., fear to escape a situation, happiness to engage in a situation) and thus should have different ANS responses (Stemmler, 2004, 2009). Our findings examining physiological measurement in children aged 4 and under support the positions of Feldman-Barrett (2006) and Cacioppo et al. (1997).
When looking at the pattern of HR, RSA, and HRV data in this review, there appears to be support for Stemmler's (2004, 2009) supposition of valence-specific task differences, but as addressed above, these dif-

| Considerations for data collection and analyses
When assessing emotions in young children, it is important to determine whether the task is producing the desired response (i.e., does a "negative" task produce negative emotional responses in the child?).
For example, in the Laboratory Temperament Assessment Battery (Lab-TAB; Goldsmith & Rothbart, 1996), children are presented with four masks: an evil queen (from Snow White), a glow-in-the-dark vampire, an old man, and a gas mask. These masks are designed to elicit a fear response, and the scoring of this task reflects this, coding for both fear and sadness. Although some children may show these affective responses when presented with these masks, others may react by smiling or laughing. In addition, there could be a group of nonresponders (for observed affect) who show a cardiac response and another group of nonresponders (for observed affect) who do not. This inter-individual variability could wash out general associations.
Individual variability may also be impacted by psychomotor activity. It may be helpful for researchers to note the effects of movement, as well as postural differences and supports required for infants, toddlers, and preschoolers, as a part of their data collection and analysis. The studies reviewed here made little mention of the impact of movement, other than editing or removing movement artifacts in their data, nor of the relative contributions of postural support. Data that require editing due to movement or postural instability (i.e., sections of the data that are edited or removed from analyses) may be confounded by psychomotor activity and thus may under- or over-estimate the impact of emotion-evoking stimuli (Bush et al., 2011).
Another consideration for physiological measurement when examining age-related changes is body size. Body size was not mentioned in the studies, nor was it included as a factor in analyses of HR, RSA, or HRV. The allometric law for mammals states that as body size increases, heart rate decreases (Meijler, 1985). Because basal cardiac activity reflects the ability of an individual to utilize their ANS to appropriately respond to environmental challenges (Porges, 2007), future research will need to consider body size in addition to age when assessing cardiac responses in early childhood, as rates of obesity are increasing in children (World Health Organization, 2018), and when assessing children with developmental disabilities, including autism spectrum disorder, in whom weight-related challenges may be particularly common (Levy et al., 2019).
Finally, when studying various emotion-evoking tasks and comparing reactivity across tasks, such as puppets, toy play, toy removal, and others included in Lab-TAB (Goldsmith & Rothbart, 1996), it may be important to include a neutral event between tasks to ensure reactivity in one task does not bleed into or influence reactivity in the following task. This is especially important when examining the differential roles of SNS and PNS activity, in which a period of recovery from stress is warranted (Suurland et al., 2017). In light of this, unexpected responses and variability should be reported for both positive and negative tasks, and a neutral event should be included between emotion-evoking tasks to reduce potential carry-over effects that may influence variability, as well as at the end to allow for SNS and PNS comparison and recovery.
With the findings of the review and the above considerations in mind, we recommend the following for future physiological studies of emotion reactivity in children. First, there should be a baseline period of a minimum of 30 s and preferably two minutes prior to the onset of an emotion task, to allow for data loss due to artifacts, movement, possible electrode removal by the child, and distress.
Second, the emotion-evoking task should be a minimum of 30 s (again, longer durations are preferred to allow for data loss). Third, any study that presents more than one emotion-evoking task should include a recovery period between tasks to allow for a return to baseline and reduce the likelihood of reactivity in one task bleeding into reactivity related to subsequent tasks. Fourth, a recovery period should be included at the end of the emotion task protocol to allow for PNS and SNS recovery and comparison. Fifth, if RSA (or other HRV) is calculated, age-corrected respiration frequencies should be employed and clearly stated in the methods section.
Sixth, behavioral coding (e.g., affect, gaze, vocalizations) should be included (where appropriate to the research question) to allow for an examination of the association between physiological and behavioral indices of reactivity. Seventh, body measurements (height, weight, BMI) should be collected and included in analyses to control for the impact of body size on physiological reactivity. Eighth, the activity of the child during the baseline and emotion tasks should be measured and included in analyses to control for movement.
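The timing-related recommendations (first through fourth) lend themselves to a simple automated check of a planned protocol. The following is a minimal sketch, not a tool used in the review: the `Epoch` type, the `check_protocol` function, and the three epoch labels are hypothetical names introduced for illustration, and the sketch assumes the protocol opens with the baseline period. The 30 s thresholds are the minimums recommended above.

```python
from dataclasses import dataclass
from typing import List

MIN_BASELINE_S = 30  # recommended minimum baseline duration (first recommendation)
MIN_TASK_S = 30      # recommended minimum task duration (second recommendation)


@dataclass
class Epoch:
    kind: str          # hypothetical labels: "baseline", "task", or "recovery"
    duration_s: float  # planned duration in seconds


def check_protocol(epochs: List[Epoch]) -> List[str]:
    """Return a list of departures from the timing recommendations."""
    problems = []
    # Assumption: the baseline is the first epoch of the protocol
    if not epochs or epochs[0].kind != "baseline":
        problems.append("protocol should begin with a baseline period")
    for i, epoch in enumerate(epochs):
        if epoch.kind == "baseline" and epoch.duration_s < MIN_BASELINE_S:
            problems.append(f"epoch {i}: baseline shorter than {MIN_BASELINE_S} s")
        if epoch.kind == "task":
            if epoch.duration_s < MIN_TASK_S:
                problems.append(f"epoch {i}: task shorter than {MIN_TASK_S} s")
            # Third recommendation: a recovery period between consecutive tasks
            if i + 1 < len(epochs) and epochs[i + 1].kind == "task":
                problems.append(f"epoch {i}: no recovery period before the next task")
    # Fourth recommendation: a recovery period at the end of the protocol
    if epochs and epochs[-1].kind != "recovery":
        problems.append("protocol should end with a recovery period")
    return problems
```

A protocol of baseline (120 s), task (60 s), recovery (60 s), task (60 s), and recovery (60 s) returns an empty list, whereas a 10 s baseline followed by two back-to-back tasks is flagged on several counts. The remaining recommendations (age-corrected respiration frequencies, behavioral coding, body measurements, and activity monitoring) concern analysis choices and are not covered by this check.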

| CONCLUSION
This methodological review examined physiological responses to emotion-evoking tasks in children aged 4 years or younger. Although we aimed to be inclusive by searching four databases without language or publication date restrictions, we cannot rule out that articles not indexed in our chosen databases were missed. Furthermore, in our aim to be inclusive of all available data, the results from some studies were included more than once in the analyses. We recognize that this is a potential source of bias and a limitation of the analyses.
This review summarized the results of research that has examined physiological measurement of emotional regulation in children aged 4 years or younger. Overall, heart rate showed an age-related decrease for both baseline and emotion-evoking tasks, whereas RSA showed an age-related increase for both baseline and emotion-evoking tasks, regardless of task valence. A review of the existing literature on assessing internal responses to emotion in childhood was both necessary and timely, as emotion regulation has implications for later functional outcomes (Carpenter & Tomasello, 2000; Mundy & Sigman, 2006; Weiss et al., 2014; Woods & Wetherby, 2003) and is a focus of much recent research attention, including in children at risk for mental health disorders (e.g., Eiden et al., 2018) and neurodevelopmental disorders (Sheinkopf et al., 2019). Understanding the potential differences that may arise due to methodological and analytical differences can inform future studies as researchers continue to investigate emotional regulation and reactivity differences in typically developing children and children with an atypical presentation.

ACKNOWLEDGEMENTS
The authors would like to thank the Canadian Institutes of Health Research (CIHR), Brain Canada, and the Azrieli Foundation for funding our research.

CONFLICT OF INTEREST
The authors have no conflict of interest to disclose.

AUTHOR CONTRIBUTIONS
LRS, SR, and VA reviewed the literature and approved articles to be included in the review. JAB, AK, IMS, and LZ provided guidance to the research question, search terms, search strategy, technical advice on statistical analyses, and provided constructive feedback on the first draft. LRS did the statistical analyses and wrote the first draft of the manuscript. All authors approved the final draft prior to submission.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/brb3.1989.