Capturing touch in parent-infant interaction: A comparison of methods

Naturally occurring high levels of caregiver touch promote offspring development in many animal species. Yet, caregiver touch remains a relatively understudied topic in human development, possibly because of the challenges of measuring this means of interaction. While parental reports (e.g., questionnaires, diaries) are easy to collect, they may be subject to biases and memory limitations. In contrast, touch observed in a short session of parent-child interaction in the lab may not be representative of touch interaction in daily life. In the present study, we compared parent reports (one-off questionnaires and a diary) and observation-based methods in a sample of German 6- to 13-month-olds and their primary caregivers (n = 71). In an attempt to characterize touching behaviors across


| INTRODUCTION
Touch is often referred to as the earliest sense to develop (e.g., Fulkerson, 2014) and is an important means of contact between an infant and their caregiver (Hertenstein, 2002). Studies suggest that tactile stimulation provided by the caregiver is crucial for the offspring's well-being, both in rats (e.g., Parent et al., 2017; Suchecki et al., 1993) and in monkeys (e.g., Harlow & Zimmermann, 1959; Simpson et al., 2019). An important insight from this animal research is that there is significant individual variation in parental touching behavior. This variation has consequences for the offspring's development, affecting domains such as behavioral fearfulness (Caldji et al., 1998), immune system response (Parent et al., 2017), exploratory behavior (Guardini et al., 2016), and even susceptibility to drug use (Francis & Kuhar, 2008).
Considering the apparent significance of touch in the first months of life, it is striking how little research there is on the specific effects of touch on human infant development and the mechanisms through which it operates. One explanation lies in the practical and ethical challenges associated with studying human caregiver touch. One clear difference between measuring caregiver touch in nonhuman animals and in humans is that in the former case, researchers are able to observe the participants continuously. Touching behaviors in animals are easily identifiable (for example, licking/grooming and arched-back nursing, LG-ABN, in rats; Caldji et al., 1998) and can be quantified over long periods of time. This yields representative estimates of caregiver tactile stimulation, which can be used to accurately identify caregivers who engage in low or high levels of contact. In addition, much of the evidence for the important role of touch in development comes from experiments that employed cross-fostering, a practice in which rat offspring of low-LG-ABN mothers are artificially assigned to be fostered by high-LG-ABN mothers, and vice versa (Francis et al., 1999). While cross-fostering is an elegant study design allowing robust inferences about the impact of maternal touch-related behaviors in infancy, for obvious reasons such studies are not possible with humans. When aiming to examine correlates of parental touch-related behaviors, especially patterns occurring over longer periods of time, researchers studying human development have far more limited options.
Several studies with human participants have examined populations in which caregiver touch has been documented to be minimal, such as infants in institutionalized care (Maclean, 2003) and infants of depressed mothers (Field, 2001). In studies employing this approach, caregiver touch was assumed to be reduced but was not actually quantified. Other studies with human infants have attempted to manipulate caregiver touch experimentally, an approach most notably exemplified by studies of touch-based interventions, namely Kangaroo Care (Cong et al., 2011; Feldman et al., 2002) and baby massage (Field et al., 2006; Gitau et al., 2002). However, these investigations almost exclusively feature babies born prematurely, as the authors were particularly interested in helping these babies from a medical perspective. It is therefore hard to generalize their results beyond the specific atypical populations and the rather extreme tactile experiences investigated.
While studies with infants of depressed mothers, infants born prematurely, and infants in institutionalized care provide invaluable insights into the role that tactile deprivation and tactile enrichment play in early development, particularly in atypical populations, it is important to understand whether naturally occurring variations in everyday caregiver touch are consequential for development in the general infant population, as has been found in animal work (Gliga et al., 2019). A variety of methods have been used to capture the amount and nature of touch in parent-child interaction. These methods differ in whether they are subjective, like parent-report measures, including questionnaires (Koukounari et al., 2015) and diaries (Barr et al., 1988; Lam et al., 2010), or objective, such as measures coded from recordings of parent-child interactions (Feldman, Singer, & Zagoory, 2010; Reece et al., 2016). They also vary in how often the measures are taken (e.g., once, in one-off questionnaires, or repeatedly, in diaries) and in the length of recorded observation. The methods most commonly employed for this purpose are discussed below.

| Parent-Infant Caregiving Touch Scale (PICTS)
To the best of our knowledge, the Parent-Infant Caregiving Touch Scale (PICTS; Koukounari et al., 2015), which measures the self-reported frequency of specific touch-related caregiving behaviors, is the only parental questionnaire currently used to assess caregiver touch given to infants. It is a short, 12-item scale designed to capture commonly occurring parental behaviors. Four items refer to stroking different body parts, and the rest concern other forms of touch and communication: picking up, cuddling, rocking, kissing, holding, talking to, watching, and leaving the baby to lie down. Parents indicate how often they engage in each behavior on a 5-point Likert scale ranging from Never (1) to A Lot (5).
Although this questionnaire is simple, it has good psychometric properties. Koukounari et al. (2015) found its internal reliability at 5 and 9 weeks to be very good. Interestingly, PICTS scores were not related to other measures of caregiving quality such as maternal sensitivity (as rated from parent-child interactions). While the authors took this to mean that touch has a distinct function in parent-child interaction, the lack of correlation could also raise questions about the validity of the scale. As a self-report measure, it could be subject to "faking good," or performing for the researcher (Field, 2019), with parents reporting inflated levels of caregiving behaviors. Nevertheless, stroking, operationalized as the "Stroking" factor of the PICTS (composed of the four items asking about stroking the baby's arms/legs, back, face, and tummy), has been reported to buffer developmental outcomes in children whose mothers experienced pregnancy-specific anxiety (PSA): high levels of stroking in infancy significantly reduced the effects of PSA on internalizing and externalizing scores at 3.5 years (Pickles et al., 2017). Moreover, a recent study found that parental stroking moderated 9-month-olds' heart rate response to gentle stroking: the more stroking the parent reported on the PICTS, the larger the immediate decelerating effect of stroking on the baby's heart rate (Aguirre et al., 2019). The mechanisms behind these effects are likely similar to the stress-buffering effects of licking and grooming in rodents (Suchecki et al., 1993), but much more research is needed before we fully understand these phenomena in human infants.
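Internal reliability of a summed scale such as the PICTS is commonly reported as Cronbach's α, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below computes it from a respondents-by-items matrix using sample variances; it is an illustration of the formula under that assumption, not necessarily the procedure used by Koukounari et al. (2015).

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (one row per respondent).

    Uses sample variances (n - 1 denominator) for the items and the total scores.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Two items that rank respondents identically give perfect consistency.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Values close to 1 indicate that items covary strongly relative to their individual variances; uncorrelated items push α toward 0.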

| One-off parent-report questionnaires
The Social Touch Questionnaire (STQ; Wilhelm et al., 2001) was originally designed to measure attitudes and affects toward social touch, with a focus on capturing potential anxiety and embarrassment associated with it. The STQ consists of statements about experiences of touch with both close, familiar people (e.g., As a child, I was often cuddled by family members) and strangers (e.g., I would rather avoid shaking hands with strangers). Participants indicate how characteristic or true each statement is of them on a 0-4 scale (from "not at all" to "extremely"). Higher STQ scores reflect more anxiety and embarrassment and less positive experiences with social touch. Previous work (Aguirre et al., 2019) found an association between infant physiological reactions to touch and parental attitudes toward touch. This raises the possibility that parental attitudes may reliably predict parents' use of touch, including in parent-infant interaction.
Although previous work suggests that both the PICTS and the STQ may be valid measures of parental touch, to date no study has validated them against objective measures of caregiver touch.

| Diaries
Another approach to measuring caregiver touch through parental self-report is the use of diaries, in either paper or electronic (online) form. Such diaries commonly ask parents to record caregiving (e.g., holding) and/or infant (e.g., crying) behaviors over a period of several days (Barr et al., 1988; Lam et al., 2010). One advantage of diaries is thus that they provide a record of the behaviors of interest over a certain period of time, typically around a week, potentially yielding estimates that are more representative of everyday behavior patterns across a variety of contexts than estimates collected at a single time point, while also being sensitive to day-to-day differences in caregiving behaviors. In addition, diaries differ from one-off questionnaires like the PICTS in that they typically ask about the duration of behaviors in minutes or hours. Some have claimed that diaries provide accurate gauges of the frequency and duration of behaviors of interest while being relatively easy to use for both parents and researchers (Lam et al., 2010), but those claims have not been supported by validation against independent measures.
However, diaries have also been reported to be onerous for participants: some indicate that they do not have time to complete them, and others simply do not follow through with their participation, yielding response rates that often do not permit conclusive analyses (Nicholl, 2010). It also remains unclear how accurate the event duration estimates obtained from diaries are. These concerns are most likely the reason why very few studies on caregiver touch to date have used diaries. One exception is a recent study (Moore et al., 2017) on associations between caregiver touch in infancy and epigenetic signatures at 4-5 years of age, focusing on genes associated with social bonding and postnatal plasticity, which found no statistically significant correlations between postnatal contact and candidate genes. Given these concerns, such studies can be hard to interpret, and learning more about the psychometric properties of diaries would help shed light on results such as those reported by Moore et al. (2017).

| Observing parent-child interaction
The most common way of measuring caregiver tactile contact with the baby is through some form of parent-child interaction (PCI) setup, in which the dyad's behavior is filmed and later video-coded for events of interest. The straightforwardness of this method makes it very attractive, as researchers can directly observe the caregiver behaviors they are interested in without relying on the accuracy of parental self-report. PCI-derived measures also offer flexibility with regard to the behaviors of interest, allowing researchers to choose the coding scheme that best reflects their research questions.
Most commonly, touch in caregiver-infant interactions is measured in a free-play setting, including face-to-face setups in which infants are seated in a car seat with mothers sitting opposite them (Feldman, Singer, et al., 2010; Moreno et al., 2006; Stack & Muir, 1992) or interactions on the floor, where parents are free to position the infant however they please (Feldman et al., 2003;). The instructions given to parents are usually aimed at evoking naturalistic interactions, with phrasings such as "Play freely" (Feldman, Singer, et al., 2010), "Play with your baby as you normally would" (Moreno et al., 2006), or "Play like you would normally do at home" . The interactions are typically coded over periods ranging from 3 min (Feldman, Singer, et al., 2010) to 6 min (Moreno et al., 2006).
Various approaches to quantifying touch events have been adopted, some focusing on duration (Moreno et al., 2006) and others on the number of touch instances (Reece et al., 2016). Multiple coding schemes have been employed, some focusing on low-level, descriptive touch properties such as "static," "tickle," or "pat" (e.g., Stack et al., 1996), and others targeting higher-level touch features, with coding categories like "affectionate touch," "stimulatory touch," or "proprioceptive touch" (Feldman, Gordon, et al., 2010; Feldman, Singer, et al., 2010). Studies investigating general caregiving qualities sometimes include touching behaviors in their coding schemes, collapsed together with other behaviors into broader categories like "maternal engagement" (e.g., Krol et al., 2019). However, some authors have pointed out that coding schemes used in studies of maternal sensitivity and attachment largely omit touch or do not observe it in depth (Botero et al., 2019). Even approaches that aim to capture low-level properties of touch tend to merge touching behaviors that may have different functions and mechanisms. For example, Stack et al. (1996) included stroking and caressing in the same category as rubbing and massaging, even though the former have been shown to involve distinct neurobiological mechanisms associated with a special type of nerve fiber, C-tactile (CT) afferents (McGlone et al., 2014). Only relatively recently have stroking and caressing begun to be treated as a separate category in coding schemes (e.g., Stack et al., 2014, as cited in Mercuri, 2019). Moreover, although PCIs are a relatively objective measure, being observed in a lab, or even at home, is a fairly artificial situation for caregivers, likely affecting their behavior in non-negligible ways.
The vast majority of PCI-based protocols focus on playful interactions, which may not be representative of a large proportion of everyday parent-infant contact.

| The present study
Very few studies have used more than one measure of caregiver touch, and the large diversity of methods employed across studies makes the findings hard to interpret and generalize. It is possible that existing measures intended to capture equivalent touching behaviors actually tap into different aspects of caregiver touch. Existing measures also rely either on the accuracy of parental self-report or on the representativeness of a short period of child-focused interaction. The aim of the present study was to examine, for the first time, whether different approaches to measuring caregiver touch (one-off questionnaires, diaries, and objective observations) are related, in order to establish the extent to which they measure similar or different aspects of caregiver touch.
A further innovation concerns the way we measured touch in parent-child interaction. It is likely that a large proportion of touching behaviors (or their absence) between parent and infant occur in nonplayful situations, such as preparing a meal or having a conversation with another adult. Touch in these situations may differ both qualitatively and quantitatively from parental touching behaviors during playful, infant-focused situations, the classical setting in which touch is observed. Individual variation may also be greater in these situations, with some parents preferring to keep closer contact with the child than others. Similarly, self-report measures may primarily capture behaviors in situations in which parents are focusing their attention on the infant and are therefore more conscious of whether and how they use touch. We therefore included in the study protocol both a free-play session (PCI-FP) and a questions condition (PCI-Q), in which the parent had a conversation with the experimenter (answering questions from a questionnaire). We assumed that the latter condition is representative of a large proportion of everyday interactions between parents and children and could therefore capture important variation in caregiver behavior.
Thus, in this study, touch was captured with an adapted version of the PICTS, the Social Touch Questionnaire, a custom Touch Diary, and PCI-derived touch measures. We examined whether putatively equivalent measures from the questionnaire and diary approaches are associated with one another and with behaviors observed in the lab. In particular, we investigated the general structure of the data by examining whether the measures map onto one or more common factors. One possibility was a clear distinction between the self-report measures and the play-focused observational measure, consistent with the former being subject to "faking good" or the latter not being representative of touch in everyday life. Another goal was to take a more in-depth look at the spectrum of touching behaviors observable in the lab, focusing on the comparison between free play and a nonplay, task-focused situation.
Touch behaviors decrease during the first 6 months of life ) and may decrease further as children become mobile and other means of interaction are used more frequently. Few studies to date have investigated caregiver touch beyond the sixth month of life. We therefore included a broad age range (6- to 13-month-olds) to investigate the developmental dynamics of parental touch with children who are less reliant on being carried, which may make individual differences in parental behavior easier to observe. Measures that are more susceptible to "faking good" are likely to capture less developmental change in touch behavior.

| Participants
The study was conducted at the Pampers Baby Care Research & Development Centre (Schwalbach am Taunus, Germany). The participants were an opportunity sample recruited from a pool of families living in the Taunus and Frankfurt am Main area who had expressed interest in research taking place at the facility. They were originally recruited into two age groups: 6- to 8-month-olds (n = 39, M = 7;21, 21 males and 18 females) and 11- to 13-month-olds (n = 32, M = 12;10, 17 males and 15 females). The data presented here were originally collected as part of a larger study with clear age-related hypotheses regarding relationships between measures of touch and infant arousal and cognitive development (see Procedure section for more detail). Because the current work is not based on these hypotheses, we pooled the participants into one group of seventy-one infants aged 6 to 13 months to increase statistical power, and used age as a continuous variable in the analyses. The sample size compares favorably with those of previous studies employing video-coded measures of caregiver touch (e.g., Feldman, Gordon, et al., 2010; Feldman, Singer, et al., 2010: n = 53; Jean & Stack, 2009: n = 40). Sixty-nine of the primary caregivers identified as female, and the remaining two identified as male. Inclusion criteria were as follows: infant gestational age at birth >37 weeks, no diagnosed developmental disorders, and caregiver fluency in German. The study was conducted according to the guidelines laid down in the Declaration of Helsinki, with written informed consent obtained from a parent or guardian for each child before any assessment or data collection. All procedures involving human subjects in this study were approved by the Research Ethics Committee at the Department of Psychological Sciences, Birkbeck, University of London.

| Parent-Infant Caregiving Touch Scale-adapted version
An adapted version of the Parent-Infant Caregiving Touch Scale (Koukounari et al., 2015) was used as the first self-report measure of caregiver touch. The questionnaire was translated into German, and in addition to the original items (described in the Introduction), two extra items were added: I sleep in the same bed with my baby and I carry my baby in a sling. We added these items because they tap into an interesting dimension of proximity and are likely to capture parental touch in nonplayful, non-infant-focused contexts.
The original version of the PICTS has a three-factor structure, comprising Stroking, Affective Communication, and Holding. We treated the three factors as subscales (Ahmadzadeh et al., 2019) and added a fourth subscale (Proximity) comprising the two extra items. Each subscale score was computed as the sum of the scores of the items loading onto the respective factor. We also computed a total score (PICTS Total), composed of all items in the questionnaire, as a general measure of touching behaviors. The item I leave my baby to lie down loads positively onto the Affective Communication factor but negatively onto the Holding factor (Koukounari et al., 2015). Thus, this item was reverse-scored for both the Holding subscale and the total PICTS score. For the total score and the subscale scores, higher scores indicate that the parent engages more often in touch-related aspects of caregiving.
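The scoring rules above (subscale sums, a fourth Proximity subscale, and reverse-scoring of the cross-loading item for Holding and the total) can be sketched as follows. The item keys and the exact assignment of items other than the four stroking items are hypothetical, chosen only for illustration; the total is assumed here to include the two added items, as "all items in the questionnaire" suggests.

```python
from typing import Dict, List

LIKERT_MIN, LIKERT_MAX = 1, 5  # "Never" (1) to "A Lot" (5)

# Hypothetical item keys and item-to-subscale assignment, for illustration only.
SUBSCALES: Dict[str, List[str]] = {
    "Stroking": ["stroke_arms_legs", "stroke_back", "stroke_face", "stroke_tummy"],
    "Affective Communication": ["talk", "watch", "kiss", "leave_to_lie_down"],
    "Holding": ["pick_up", "cuddle", "rock", "hold", "leave_to_lie_down"],
    "Proximity": ["co_sleep", "sling"],  # the two added items
}
# Cross-loading item, reverse-scored for Holding and for the total score.
REVERSED_ITEM = "leave_to_lie_down"


def reverse(score: int) -> int:
    """Mirror a 1-5 Likert response (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return LIKERT_MIN + LIKERT_MAX - score


def score_picts(responses: Dict[str, int]) -> Dict[str, int]:
    """Return subscale sums plus a total in which every item is counted once."""
    for item, value in responses.items():
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError(f"{item}: response {value} outside 1-5")
    scores: Dict[str, int] = {}
    for subscale, items in SUBSCALES.items():
        flip = subscale == "Holding"
        scores[subscale] = sum(
            reverse(responses[i]) if (flip and i == REVERSED_ITEM) else responses[i]
            for i in items
        )
    all_items = {i for items in SUBSCALES.values() for i in items}
    scores["Total"] = sum(
        reverse(responses[i]) if i == REVERSED_ITEM else responses[i] for i in all_items
    )
    return scores
```

Because the reversed item appears unreversed in Affective Communication but reversed in Holding and the total, a parent who often leaves the baby to lie down scores higher on the former and lower on the latter two.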

| Social touch questionnaire
Our version of the STQ was translated into German, and three items were removed because we deemed them either not applicable to our participants (I'd feel uncomfortable if a professor touched me on the shoulder in public) or related to romantic, intimate affection (I like being caressed in intimate situations and I feel disgusted when I see public displays of intimate affection). The adapted STQ consisted of the remaining seventeen original items. Higher scores indicate more anxiety and embarrassment and less positive experiences with social touch.
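For higher totals to mean more anxiety and less positive experience, positively worded statements (such as the cuddling item) must count in the opposite direction. The sketch below assumes such items are reverse-keyed on the 0-4 scale (4 − x); which of the seventeen items are reverse-keyed is not listed here, so the item keys and the reverse-keyed set are hypothetical.

```python
from typing import Dict, Set

STQ_MIN, STQ_MAX = 0, 4  # "not at all" (0) to "extremely" (4)


def score_stq(responses: Dict[str, int], reverse_keyed: Set[str]) -> int:
    """Sum STQ responses so that higher totals mean more touch-related anxiety.

    Items in `reverse_keyed` (assumed positively worded) are mirrored as 4 - x
    before summing.
    """
    total = 0
    for item, value in responses.items():
        if not STQ_MIN <= value <= STQ_MAX:
            raise ValueError(f"{item}: response {value} outside 0-4")
        total += (STQ_MAX - value) if item in reverse_keyed else value
    return total


# Hypothetical two-item example: a parent who was often cuddled (4 -> 0)
# and would rather avoid handshakes with strangers (3).
example = {"cuddled_as_child": 4, "avoid_handshakes_with_strangers": 3}
print(score_stq(example, reverse_keyed={"cuddled_as_child"}))
```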

| Touch diary
A second self-report measure of caregiving behaviors used in our study was a custom online Touch Diary, based on diaries used in previous studies (Barr et al., 1988; Lam et al., 2010). In the diary, primary caregivers were asked to estimate, for each hour of the day (24 h), the number of minutes they spent holding (note that the original German word used, "kuscheln," is closer in meaning to "cuddling"), stroking, and talking to their infant, every day for seven consecutive days. To indicate the number of minutes, they used slider scales ranging from 0 min to 60 min with a 1-minute resolution. The diary was hosted on the online platform SurveyMonkey, which formats questionnaires in a smartphone-friendly way. Parents received separate emails with links to the diary on seven consecutive days and were encouraged to fill them out on their smartphones. The instructions emphasized that while their answers should aim to reflect their actual behaviors, it was understood that their estimates could only be approximate. They could open the diary for a given day multiple times and were asked to fill it out whenever convenient.
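How the hourly slider values were summarized is not specified here; one plausible scheme, assumed in this sketch, is to sum the 24 hourly entries into a daily total per behavior and average those totals across completed days, returning no score when fewer than six days were completed (the threshold used for inclusion in the analyses). Function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional

HOURS_PER_DAY = 24
MAX_MINUTES_PER_HOUR = 60  # slider range was 0-60 min at 1-min resolution


def daily_total(hourly_minutes: List[int]) -> int:
    """Sum one day's 24 hourly slider values for a single behavior."""
    if len(hourly_minutes) != HOURS_PER_DAY:
        raise ValueError("expected one value per hour (24)")
    if any(not 0 <= m <= MAX_MINUTES_PER_HOUR for m in hourly_minutes):
        raise ValueError("hourly values must lie between 0 and 60 min")
    return sum(hourly_minutes)


def mean_daily_minutes(days: List[Optional[List[int]]], min_days: int = 6) -> Optional[float]:
    """Average daily minutes over completed days, or None if too few were completed.

    `days` holds one entry per diary day, with None for a day left blank.
    """
    completed = [daily_total(d) for d in days if d is not None]
    if len(completed) < min_days:
        return None
    return mean(completed)


# A parent reporting 10 min of stroking every hour on six of seven days.
stroking = [[10] * 24] * 6 + [None]
print(mean_daily_minutes(stroking))
```

Averaging over completed days, rather than summing across the week, keeps six-day and seven-day completers on the same scale.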

| Parent-child interaction (PCI)
Interactions between parents and their children were filmed and later coded for parental touch patterns. Parent-child interaction (PCI) was observed in two situations: 10 min of free play (PCI-FP) and 10 min of the parent answering questions (PCI-Q) from the Infant Behaviour Questionnaire-Very Short Version (IBQ-R; Putnam et al., 2014). The IBQ-R is a questionnaire designed to assess infant temperament, with questions about infant behavior during the 7 days preceding the assessment (example items: When tired, how often did your baby show distress? and During a peekaboo game, how often did the baby laugh?). It is worth noting that although the PCI-Q condition was designed to capture parental behavior in non-infant-focused interactions, the topic of the conversation with the experimenter was still the child. This could have primed caregivers to pay more attention to their caregiving behaviors.
The moment the experimenter left the room was considered the beginning of PCI-FP, while for PCI-Q the beginning was the moment the experimenter began to ask the questions. In PCI-Q, if the caregiver answered all the IBQ-R questions before 10 min had passed (which happened very rarely), the experimenter continued with small talk about the child. In the more common case in which not all questions were answered within those 10 min, the experimenter stopped asking questions once 10 min had passed, and the caregiver was asked to fill out the missing items at the end of the visit, when they were given the PICTS and the STQ.
The PCI videos were later coded offline using a custom coding scheme based on criteria we adapted and modified from Stack et al. (2014, as cited in Mercuri, 2019). The categories were as follows: stroke/caress (CT-targeted touch); kiss/pat (light, brief touch); hold/hug/cradle (constant pressure applied on a large part of the body; warmth); massage (deep pressure); touch with objects (incl. wiping mouth, fixing clothes; brief stroke); moving limbs/body (proprioceptive); tickle (unpredictable); games/routines played on body (predictable); static (constant pressure applied on a small part of the body); rocking (predictable and proprioceptive).
Our aim was to capture the full spectrum of tactile behaviors occurring during PCIs while focusing on low-level touch properties (e.g., kissing, holding). We found such properties easier to identify and label than putative higher-level touch properties (e.g., affectionate touch, stimulatory touch) used in the coding schemes of some studies (e.g., Feldman, Gordon, et al., 2010; Feldman, Singer, et al., 2010). Moreover, given evidence that stroking/caressing engages neurobiological mechanisms distinct from those of other types of tactile stimulation (McGlone et al., 2014), it was important to us to code this behavior separately.
Videos were coded frame by frame at 30 frames per second using Datavyu software (Datavyu Team, 2014), which is widely used for coding infant data (e.g., Crespo-Llado et al., 2018; Della Longa et al., 2020). For both conditions, PCI-FP and PCI-Q, 5 min of interaction were coded, from the start of the third minute to the end of the seventh minute of the interaction. The categories were not mutually exclusive, meaning that multiple types of touch (e.g., "hold/hug/cradle" and "kiss") could occur at the same time. The total duration of each touch category was calculated by adding up the durations of its touch events. The total duration of overall touch, that is, any time the infant was being touched at all during the 5 coded minutes, was also computed for both PCI conditions. Inter-rater reliability was calculated on 20% of the interactions using a two-way mixed, consistency, single-measures intraclass correlation (ICC; Hallgren, 2012; McGraw & Wong, 1996). The first author was the primary coder, whose data were used in the analyses. Although she was naturally not naïve to the hypotheses of the study, at the time of coding the PCI-FP and PCI-Q videos were not linked to the questionnaire and diary scores. The secondary coder did not have access to these scores at all. For the total duration of touch, which was the only coding-based measure used in the correlational and PCA analyses, the ICC was 0.92, indicating excellent reliability (Cicchetti, 1994). For the specific touch categories, the ICCs ranged from excellent (0.99 for hold/hug/cradle and rocking, 0.97 for games/routines played on body, 0.94 for stroke/caress, 0.88 for moving limbs/body) through good (0.62 for static) and fair (0.59 for touching with objects) to poor (0.35 for tickle and 0.01 for massage) (Cicchetti, 1994).
Although the last two categories, tickling and massage, need to be interpreted with caution, the remaining ICC values are in the acceptable range and comparable with those in other studies using this approach (e.g., Reece et al., 2016).
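Because touch categories can co-occur, overall touch is not the sum of the per-category durations but the length of the union of all touch intervals. The sketch below assumes, hypothetically, that coded events are exported as (category, onset, offset) tuples in seconds (this is not Datavyu's export schema), clips them to the coded window (120-420 s, our reading of "third to seventh minute"), and computes per-category and overall durations. It also computes the two-way mixed, consistency, single-measures ICC (McGraw & Wong's ICC(C,1), equivalent to Shrout & Fleiss's ICC(3,1)) and labels it with Cicchetti's (1994) cut-offs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, float, float]  # (category, onset s, offset s); hypothetical format
WINDOW = (2 * 60.0, 7 * 60.0)     # start of minute 3 to end of minute 7


def clip_to_window(events: List[Event], window=WINDOW) -> List[Event]:
    start, end = window
    clipped = []
    for cat, on, off in events:
        lo, hi = max(on, start), min(off, end)
        if hi > lo:  # keep only the part of the event inside the window
            clipped.append((cat, lo, hi))
    return clipped


def duration_by_category(events: List[Event]) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for cat, on, off in events:
        totals[cat] += off - on
    return dict(totals)


def overall_touch(events: List[Event]) -> float:
    """Length of the union of all intervals: co-occurring touch counts once."""
    total, cur_start, cur_end = 0.0, None, None
    for on, off in sorted((on, off) for _, on, off in events):
        if cur_end is None or on > cur_end:  # gap: close the current run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = on, off
        else:  # overlap: extend the current run
            cur_end = max(cur_end, off)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def icc_consistency_single(ratings: List[List[float]]) -> float:
    """ICC(C,1) from a targets x raters matrix: (MSR - MSE) / (MSR + (k-1) MSE)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def cicchetti_label(icc: float) -> str:
    """Cicchetti (1994): <.40 poor, .40-.59 fair, .60-.74 good, >=.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"
```

The consistency form ignores systematic offsets between coders (one coder consistently recording slightly longer events does not lower it), which suits the correlational use of the total touch duration.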

| Procedure
The data presented here were collected as part of a larger study investigating the relationships between caregiver touch and infant developmental outcomes. Other measures were also taken, such as salivary cortisol and oxytocin, heart rate, and infant performance in tabletop and eye-tracking tasks measuring exploratory behavior and attention. Although these measures and tasks are not part of the current study, we describe the whole visit to provide context for interpreting the touch measures reported throughout this manuscript.
Infants and their caregivers were brought into the laboratory, and caregivers provided informed consent before the start of the study. The caregivers were made aware that their behavior would be filmed throughout the visit (unless they withdrew their consent), but were not told until the end of the visit that we were specifically interested in touching behaviors. After a short period allowing participants to familiarize themselves with the setting, the caregiver took saliva samples from the infant using Salivette® Cotton Swabs (Sarstedt, Rommelsdorf, Germany), and a heart rate recording device (Polar H7 heart rate band, worn on a strap) was placed on the baby's chest. Next, the baby was shown a two-minute animation during which heart rate was recorded, after which the experimenter turned on three video cameras and the parent was informed that from then on, everything happening in the room would be video-recorded until the experimenter said otherwise. The parent was then asked to change the baby's diaper and, once this was done, the Parent-Child Interactions, Free Play (PCI-FP) and Questions (PCI-Q), began.
Both interactions took place in the same room, one after the other. To maximize opportunities for caregiver touch, the room contained no toys, only a blanket, a beanbag, and two cushions (see Figure 1).
For PCI-FP, parents were instructed to play with their children as they normally would at home, without any toys, and if possible to remain close to the area marked out by the blanket so that the cameras could capture the interaction. The experimenter was not present in the room but observed the interaction through a one-way mirror from an adjacent room, of which parents were made aware. After 10 min of free play, the experimenter returned to the main room, sat down on the blanket, and asked questions from the IBQ-R for another 10 min; this constituted the PCI-Q part of the procedure. Afterward, the baby was again shown the same animation, saliva samples were collected, and the baby then took part in the tabletop and eye-tracking tasks. At the end of the visit, the parent filled in the Parent-Infant Caregiving Touch Scale and the Social Touch Questionnaire. Parents were also given instructions for the Touch Diary and were informed that completing all seven days of the diary would qualify them for a draw to win a 50-euro Amazon voucher. The link to each daily Touch Diary entry was sent to the parents every day for seven consecutive days, the first at midnight on the day following the lab visit and each subsequent one 24 h later.
The entire parent-infant dyad visit at the lab lasted on average between 1.5 and 2 h.

| Analytical approach
We start by characterizing the range of normal variation in the behaviors of interest across measures, as well as their associations with infant age, and we compare PCI-Q with PCI-FP. We then investigate the extent to which the measures of caregiver touch agree with each other, focusing on associations between putatively equivalent measures (e.g., stroking in the Touch Diary and the PICTS, holding in the Touch Diary and the PICTS, and Total touch in the PICTS, PCI-FP, and PCI-Q). Next, we perform a principal component analysis on all collected measures of caregiving behaviors. Finally, we qualitatively compare the practical aspects of using a questionnaire, a diary, and measures derived from parent-child interaction.

| Touch diary characteristics
Forty-two caregivers (out of seventy-one) completed all seven days of the Touch Diary, a completion rate of 59%. An additional four parents completed six out of seven days, and their scores were also included in the analyses, resulting in a final completion rate of 65%. This completion rate is comparable to that in other studies employing this approach (e.g., Nicholl, 2010).
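The two completion rates, and the share of dyads without usable diary data, follow directly from the counts above. A minimal Python sketch of that arithmetic (the helper name `completion_rate` is ours, for illustration only):

```python
def completion_rate(completed: int, total: int) -> float:
    """Percentage of caregivers who returned usable diary data."""
    return 100 * completed / total

TOTAL_DYADS = 71      # caregivers asked to keep the Touch Diary
FULL_DIARIES = 42     # completed all seven days
SIX_DAY_DIARIES = 4   # completed six of seven days; also analysed

full_rate = completion_rate(FULL_DIARIES, TOTAL_DYADS)
final_rate = completion_rate(FULL_DIARIES + SIX_DAY_DIARIES, TOTAL_DYADS)
missing_rate = 100 - final_rate  # dyads without usable diary data

print(f"{full_rate:.0f}% {final_rate:.0f}% {missing_rate:.0f}%")
```

The remaining 35% corresponds to the proportion of missing diary values that is handled by imputation in the PCA.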
Descriptive statistics can be found in Table 1. While the amount of time spent talking is normally distributed in our sample, this is not the case for stroking and cuddling (see the Shapiro-Wilk tests in Table 1). We did not find significant associations between infant age and talking (r s = −0.01, n = 46, p = 0.98), stroking (r s = −0.15, n = 46, p = 0.31), or holding (r s = −0.25, n = 46, p = 0.10).
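Spearman's r_s, reported here for the associations with infant age, is Pearson's correlation computed on ranks, with tied values sharing their average rank. The plain-Python sketch below illustrates this; the function names are ours, the age and stroking values are hypothetical toy data rather than data from this study, and p-values are omitted (a library routine such as `scipy.stats.spearmanr` returns both the coefficient and a p-value).

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: infant age (months) and diary stroking (minutes).
age = [6, 7, 8, 9, 10, 11, 12, 13]
stroking = [30, 25, 28, 20, 22, 15, 18, 10]
print(round(spearman(age, stroking), 2))
```

Ranking first makes the coefficient insensitive to the skewed, non-normal distributions noted above for stroking and cuddling.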

| Parent-infant caregiving touch scale characteristics
Sixty-eight parents provided PICTS scores; data from three parents were missing because the parent did not complete the questionnaire (2 participants) or because of experimenter error (1 participant). The Cronbach's α value for the total score in our sample was 0.71, which can be considered appropriate (A. Field et al., 2012). The mean overall PICTS score was 54 (N = 68, minimum = 39, maximum = 65, SD = 5). Table 1 shows descriptive statistics for the total PICTS score and its subscales. Note that the Holding, Affective Communication, and Proximity subscales are not normally distributed.
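The α values reported for the PICTS (0.71) and the STQ (0.75) summarize internal consistency as α = k/(k − 1) · (1 − sum of item variances / variance of the total score), for k items. Because item-level responses are not reproduced here, the minimal sketch below uses hypothetical scores; the function name `cronbach_alpha` is ours.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of k item-score columns.

    items: k lists, each holding one item's scores for the same n
    respondents (k >= 2). Sample variances are used throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical responses: 3 items answered by 4 respondents.
example_items = [
    [2, 3, 3, 4],
    [3, 3, 4, 5],
    [1, 2, 2, 4],
]
print(round(cronbach_alpha(example_items), 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why values around 0.7 are usually read as acceptable for short scales.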

| Social touch questionnaire characteristics
All but one participant provided STQ scores. The STQ score is normally distributed in our sample. The Cronbach's α value was 0.75, indicating appropriate reliability (Field et al., 2012). We did not find an association between the STQ score and infant age (r s = 0.17, n = 70, p = 0.17). More detailed descriptive statistics can be seen in Table 1.

| Parent-child interaction characteristics
Table 2 presents the characteristics of the different touch categories coded from the Parent-Child Interaction-Free Play (PCI-FP) and the Parent-Child Interaction-Questions (PCI-Q): descriptive statistics on touching behaviors in a playful (PCI-FP) and a functional (PCI-Q) context, and Wilcoxon signed-rank tests comparing the median durations of touching behaviors between these two contexts. Histograms depicting the duration distributions of some of the touch categories during PCI-FP and PCI-Q can be found in Data S1. Our findings suggest that caregiver touch during a free-play, infant-focused situation differs both quantitatively and qualitatively from caregiver touch in a situation where the caregiver's attention is not focused on the infant.

Table 1. Descriptive statistics for the Touch Diary, the PICTS and its subscales, and the STQ
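The Wilcoxon signed-rank test compares each dyad's duration of a touch category in PCI-FP with the same dyad's duration in PCI-Q. Below is a rough plain-Python sketch using the large-sample normal approximation without a tie correction for the variance; the function name and the durations are hypothetical, and a real analysis would more likely rely on a library routine such as `scipy.stats.wilcoxon`.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.

    Returns (W_plus, z, two_sided_p). Zero differences are dropped and
    tied absolute differences receive average ranks; the variance has no
    tie correction, so this is only a rough sketch for moderate n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4                                # mean of W+ under H0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # SD of W+ under H0
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from standard normal
    return w_plus, z, p

# Hypothetical seconds of holding for eight dyads (not study data).
free_play = [120, 95, 140, 80, 150, 110, 100, 130]
questions = [100, 90, 110, 85, 120, 70, 85, 100]
print(wilcoxon_signed_rank(free_play, questions))
```

Working on paired differences, rather than comparing the two contexts as independent groups, respects the fact that every dyad contributes to both PCI-FP and PCI-Q.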
We found a significant negative correlation between PCI-FP Total Touch and infant age (r s = −0.40, n = 71, p = 0.001). PCI-Q Total Touch showed a trend toward a negative correlation with infant age (r s = −0.28, n = 68, p = 0.02), which did not reach the Bonferroni-corrected significance level of 0.005. There was also a trend toward a negative correlation between stroking and age during free play (r s = −0.33, n = 67, p = 0.006), but not during the nonplayful interaction (r s = 0.09, n = 69, p = 0.481).
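A Bonferroni correction divides the family-wise α by the number of tests m. The sketch below assumes a family-wise α of 0.05 and m = 10 tests, which reproduces the 0.005 threshold quoted above; the text does not list the tests, so m is our assumption. It checks the p-values reported in this paragraph against that threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Assumption: alpha = 0.05 and 10 tests, giving the 0.005 threshold.
threshold = bonferroni_threshold(0.05, 10)

# p-values reported in this section for correlations with infant age.
reported = {
    "PCI-FP total touch": 0.001,
    "PCI-Q total touch": 0.02,
    "PCI-FP stroking": 0.006,
    "PCI-Q stroking": 0.481,
}
significant = {name: p < threshold for name, p in reported.items()}
print(significant)
```

Only PCI-FP Total Touch passes, consistent with the text describing the PCI-Q and free-play stroking correlations as trends.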

| Associations between measures of equivalent behaviors
To investigate the consistency between measures intended to capture equivalent behaviors, and the relationships between self-reported and observed measures of caregiver touch, we calculated Pearson correlations or, where variables did not meet the normality criterion (see Table 1), Spearman correlations. We focused on stroking in the Touch Diary and the PICTS, holding in the Touch Diary and the PICTS, and Total touch in the PICTS, PCI-FP, and PCI-Q. The full correlation table can be found in Table S5. The significance level was Bonferroni-corrected for multiple comparisons, resulting in a threshold of p = 0.004.
We found that stroking reported in the Touch Diary was positively correlated with the Stroking factor of the PICTS (r s = 0.45, n = 44, p = 0.002). No other relationships between variables intended to reflect the same behaviors reached statistical significance. However, the total PICTS score was correlated with total touch in PCI-FP (r s = 0.39, n = 68, p = 0.001).
These results indicate some consistency between the self-reported measures of parental stroking and, for the first time, confirm the external validity of the PICTS scale, showing that its scores map onto caregiver behavior as observed in the lab.

| Dimensional data structure
We conducted an exploratory principal component analysis (PCA) to understand the dimensional structure underlying our collection of measures, specifically, whether a common underlying factor would emerge from all of them. Variables violating the "no significant outliers" assumption of PCA were excluded from the analyses, leaving the Stroking, Holding, Affective Communication, and Proximity subscales of the PICTS, the STQ, the total duration of touch during PCI-FP and PCI-Q, and the diary measures of stroking and talking. To handle missing data (a high proportion, 35%, in the diary measures), we used the missMDA R package to perform multiple imputation with the iterative PCA method (Josse & Husson, 2016). This method of handling missing data has been found to be optimal for performing PCA (Dray & Josse, 2015). Data from both age groups were pooled to meet the sampling adequacy criterion of PCA (5–10 cases per variable). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis (KMO = 0.67), which is above the acceptable limit of 0.5 (Field, Miles, & Field, 2012, p. 770). Bartlett's test of sphericity, χ²(36) = 176.523, p < 0.001, indicated that correlations between items were sufficiently large for PCA. Inspection of the scree plot suggested that two components should be retained; together, these two components explained 53.41% of the variance. Figure 2 shows a visualization of the unrotated PCA results with the uncertainties generated by the multiple imputation, and the factor loadings can be found in Table 3.
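The adequacy checks and the imputation step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the analysis pipeline used here. `kmo` and `bartlett_sphericity` implement the standard formulas: overall KMO from squared correlations and squared partial correlations, and Bartlett's χ² = −[(n − 1) − (2p + 5)/6] · ln|R| with p(p − 1)/2 degrees of freedom. `iterative_pca_impute` is a simplified, single (not multiple), unregularized version of iterative PCA imputation, not the missMDA implementation (whose `imputePCA` and `MIPCA` functions also scale and regularize). All helper names are ours, and the data are simulated.

```python
import numpy as np

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(r)
    # Partial correlations come from the scaled inverse correlation matrix.
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off = ~np.eye(r.shape[0], dtype=bool)  # off-diagonal entries only
    r2 = np.sum(r[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(data):
    """Bartlett's chi-square statistic and degrees of freedom."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    return chi2, p * (p - 1) // 2

def iterative_pca_impute(data, n_components=2, n_iter=500, tol=1e-10):
    """Fill NaNs by alternating a rank-k PCA fit with re-imputation.

    Simplified, unregularized, single-imputation variant (hypothetical
    helper, not missMDA).
    """
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    # Start from the column means of the observed values.
    x[missing] = np.take(np.nanmean(x, axis=0), np.where(missing)[1])
    for _ in range(n_iter):
        mu = x.mean(axis=0)
        u, s, vt = np.linalg.svd(x - mu, full_matrices=False)
        recon = mu + (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
        change = np.sum((x[missing] - recon[missing]) ** 2)
        x[missing] = recon[missing]  # observed cells are never overwritten
        if change < tol:
            break
    return x

# Simulated data: 71 dyads x 9 variables driven by two latent dimensions;
# the last two columns stand in for the diary measures and are ~35% missing.
rng = np.random.default_rng(0)
latent = rng.normal(size=(71, 2))
sim = latent @ rng.normal(size=(2, 9)) + rng.normal(scale=0.8, size=(71, 9))
sim_missing = sim.copy()
sim_missing[rng.random(71) < 0.35, 7:] = np.nan

filled = iterative_pca_impute(sim_missing)
kmo_value = kmo(filled)
chi2, df = bartlett_sphericity(filled)
eigvals = np.linalg.eigvalsh(np.corrcoef(filled, rowvar=False))[::-1]
explained = eigvals[:2].sum() / eigvals.sum()  # variance share, first 2 PCs
print(f"KMO={kmo_value:.2f}, chi2({df})={chi2:.1f}, explained={explained:.1%}")
```

With p = 9 retained variables, Bartlett's test has 9 × 8 / 2 = 36 degrees of freedom, matching the χ²(36) reported above.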
The items that load on the two dimensions suggest that Dimension 1 is more associated with self-reported measures of caregiving behaviors, while Dimension 2 represents directly observed caregiver touch. This is compatible with the idea that Dimension 1 is more associated with what has been called "faking good" (Field, 2019), while Dimension 2 provides a more accurate account of caregiver behavior patterns. PCI-FP loads positively onto both dimensions, which would indicate that, although it is a direct observation, it also reflects caregivers potentially being hyper-conscious of their behavior and wanting to perform. However, it may also be the case that, when reporting their touching behaviors, caregivers are better at recalling interactions in which they were focused on the infant, so that self-report measures may be biased toward touch during play rather than capturing touch throughout daily activities.
One interesting feature of the results is the negative loading of the Diary measures of talking and stroking onto Dimension 2, which speaks in favor of a playful vs. nonplayful interpretation of the two dimensions. In this interpretation, Dimension 1 reflects caregiver behavior in free-play, infant-focused interactions, while Dimension 2 is associated with touch in everyday situations, in moments when the caregiver's attention is not focused on the baby. In such situations, talking to and stroking the baby would not necessarily occur. However, given the amount of missing data and the uncertainties associated with the Touch Diary measures (see Figure 2), caution is needed when drawing conclusions based on these measures. The STQ, a measure of anxiety and discomfort associated with social touch, loads negatively onto both dimensions, although its loading onto Dimension 1 is larger. This indicates that parental attitudes and affects toward social touch are, as predicted, associated with caregiving behaviors, but more so in the case of self-reported touch and touch occurring in playful, infant-focused interactions.

Figure 2. Variable representation in the PCA two-dimensional space (unrotated), with visualized uncertainties associated with the imputation of missing values. The ellipses and clouds of dots represent the overlapped outcomes of the PCA after 1000 imputation simulations.

| DISCUSSION
Researchers wanting to investigate the relationship between touch and infant development face the difficult challenge of choosing the right measure(s) to capture the dimensions of touch they are interested in. By employing three different types of caregiver touch measures in one study (one-off questionnaires, a diary, and objective measures captured during parent-child interaction), we were able not only to describe the natural variation in various aspects of caregiving behaviors, but also to show how those measures relate to each other.
We observed substantial variation across the behaviors of interest, with several variables being normally distributed. This observation was particularly informative with regard to the PICTS questionnaire measure (Koukounari et al., 2015), which could have been subject to a ceiling effect if parents consistently reported high levels of touching behaviors in an effort to come across as good caregivers. We did, however, find the total score and the Stroking subscale score to be normally distributed in our sample.
An important feature of our study was that we observed touch both during play, as most previous studies have done, and in a situation in which parents may not be particularly focused on, or aware of, their typical caregiving behaviors. The latter condition differed from the former in two important ways: the interaction was not infant-focused, and it was ambiguous whether the parents' behavior was being measured. As predicted, we observed quantitative and qualitative differences in parental touching behaviors between these contexts. Parents generally touched their children less when talking to the experimenter than during free play. It is important to note, though, that while the total duration of touch was shorter in PCI-Q, its spread was larger, suggesting that this measure captures more variance. The nature of touch also varied, with more holding during PCI-Q but more playful touch (tickling, kissing) during PCI-FP. Interestingly, no differences were found in the time spent stroking the child. In general, parents used relatively little stroking during the parent-child interactions. This finding is consistent with reports from other studies using observed measures of touch (Mantis et al., 2019). Despite the documented benefits of this type of tactile stimulation (Pickles et al., 2017; Van Puyvelde et al., 2019), and the increased focus on investigating its mechanisms in early development (Gliga et al., 2019), stroking may occur relatively rarely, or mostly in specific contexts (e.g., feeding or rest). Stroking may therefore be better captured by parental self-reports, which reflect touch across daily activities. Indeed, we found a good degree of agreement between the PICTS Stroking subscale and stroking reported in the diary. With regard to infant age, we observed that the older the infants were, the shorter the observed total duration of caregiver touch during parent-child interactions.
A previous longitudinal study with infants aged 1, 3, and 5.5 months found a similar effect. This observation comes as little surprise, considering that much caregiver-infant physical contact serves to move or secure the position of an infant whose motor skills do not yet allow them to do so themselves (Little et al., 2019), and that gross motor skills develop rapidly around the time infants turn one year old (Adolph & Robinson, 2015). More interestingly, the self-report measures of caregiver touch in our study did not show such associations with infant age. While this finding could indicate that self-reported touch is biased toward "faking good," the fact that questionnaire scores were correlated with parental behavior observed in the lab speaks against this interpretation. It is more likely that self-reported estimates of touch provide information on beliefs that are fairly stable across infant development. This is a first indication that self-report measures also capture individual differences in parental tendencies and attitudes associated with their caregiving practices.
When comparing touch estimates across measures, we found the total PICTS scores to be moderately correlated with the duration of touch in parent-infant interaction, demonstrating that PICTS scores map onto real-life caregiver behavior. Considering that the PICTS is a relatively short, uncomplicated questionnaire filled out by the parent at a single time point, this finding further confirms the usefulness of this psychometric tool. Our analysis of the dimensional structure of the data also showed that touch during free play was more positively related to parental self-reported touch than touch in the functional context was, with the latter forming an independent dimension. Taken together, this evidence suggests that self-report and free-play-based measures may not capture the entire spectrum of caregiver touching behaviors. In particular, our findings suggest that the PICTS is biased toward reporting on touch during periods in which the parent is focused on interacting with the child.
An ideal measure would describe parental touch across a variety of contexts, yielding full-scale estimates similar to those in animal studies. In theory, diaries have the potential to fulfill this criterion, considering the time span they cover and their straightforward descriptiveness. However, in our study we found little added value of the diary measure. The dimensional structure of our data revealed that the diary-based estimates were closer to the questionnaire-based estimates than to the touch observed during parent-child interaction. Moreover, our diary measure was associated with a large proportion of missing data. Even though we tried to make it easy to use, with a slider scale and a smartphone-friendly design, filling it out daily was likely still a cumbersome task for parents of infants.
One of the objectives of our study was to compare the practical aspects of existing caregiver touch measures. Table 4 provides a brief overview of our insights into the psychometric aspects of each measure, the time costs for both the parent and the researcher, and the amount of missing data associated with each measure. This overview is largely based on our subjective observations, and these features might look different in other samples or with slightly modified methods; our aim was to draw attention to the advantages and disadvantages we experienced.
In conclusion, we find moderate to low agreement between measures of caregiver touch in infancy. A brief questionnaire, the PICTS, seems to capture touch during particular daily activities, when the caregiver's attention is directed to the child, but may provide a more veridical estimate of particular types of touch, such as stroking. Given the key role attributed to this type of touch in the developmental literature, this may explain why the PICTS is associated with various developmental variables (e.g., Pickles et al., 2017). For a broader depiction of caregiver touching behaviors, researchers should ideally record parent-child interaction in a variety of contexts. This may also be true for capturing other types of interaction, not only touch. Just as it is now possible to record verbal interaction continuously during the day to validate lab-based or questionnaire measures (Canault et al., 2016), in the future smart suits (Zhu et al., 2015) may automatically register physical contact. Efforts to automate the video coding process (e.g., Chen et al., 2016) could decrease the researchers' workload and make it feasible to extend the period of time during which touch is directly observed and characterized.