Preschool Children Rarely Seek Empirical Data That Could Help Them Complete a Task When Observation and Testimony Conflict

Children (N = 278, 34–71 months, 54% girls) were told which of two figurines turned on a music box and also observed empirical evidence either confirming or conflicting with that testimony. Children were then asked to sort novel figurines according to whether they could make the music box work or not. To see whether children would explore which figurine turned on the music box, especially when the observed and testimonial evidence conflicted, children were given access to the music box during their sorting. However, children rarely explored. Indeed, they struggled to disregard the misleading testimony both when sorting the figurines and when asked about a future attempt. In contrast, children who explored the effectiveness of the figurines dismissed the misleading testimony.

Children learn about the world in a variety of ways. They can learn by paying attention to what other people do (Hoehl et al., 2019). They can learn from testimony directed toward themselves or toward other people (for reviews: Harris, Koenig, Corriveau, & Jaswal, 2018; Mills, 2013; Sobel & Kushnir, 2013; Tong, Wang, & Danovitch, 2020). And they can gather evidence through exploration (Bonawitz et al., 2011; Schulz & Bonawitz, 2007; Yu, Landrum, Bonawitz, & Shafto, 2018), experimentation (Cook, Goodman, & Schulz, 2011; Köksal-Tuncer & Sodian, 2018), and question-asking (Callanan & Oakes, 1992; Kurkul & Corriveau, 2017). Children's ability to learn from these diverse sources of information is one reason they are able to learn so much so quickly. Each of these sources can provide children with unique insights about the world. For example, by listening to other people, children can avoid costly mistakes and learn about unobservable scientific and religious phenomena they could not discover on their own (Harris & Koenig, 2006). By tracking statistical regularities, young children can quickly build up and revise their understanding of causal structures without relying on other people's testimony (Bridgers, Buchsbaum, Seiver, Griffiths, & Gopnik, 2016).
In addition to providing distinctive insights about the world, the testimony children receive and their firsthand experiences can also propel or challenge children's learning, depending on whether these two sources provide consistent or inconsistent data about the same phenomena. When testimony and firsthand experience provide consistent data, children's learning is strengthened because multiple sources confirm a given piece of information. However, when firsthand exploration and testimony conflict, children have to decide how to integrate the two sources: whether either source should be considered more true or reliable than the other, whether both sources could possibly be true, or whether additional information is needed to resolve the conflict. Prior research has demonstrated that by 4 years of age, children can resolve conflicts between different sources of information (observation vs. testimony) based on the relative merits of each source, for example, the strength of the observed evidence (probabilistic vs. deterministic) and the prior accuracy of an informant (Bridgers et al., 2016). This ability to weigh conflicting data appropriately allows children to resolve the tension between conflicting sources quickly when there is a clear discrepancy in their reliability. But how do children react when different information sources are equally compelling and resolving that tension would help them complete a task? Do they gather additional information or attempt the task without it? Below, we review prior research pertinent to this central question.

Background
Whether children are learning from what other people tell them or by tracking statistical regularities through observation, they are sensitive to the strength of the evidence that each of these two sources provides. When learning from other people, children assess informants on a number of different cues and will reject testimony when they have reason to believe that an informant's information is unreliable (for reviews: Harris et al., 2018; Mills, 2013; Sobel & Kushnir, 2013; Tong et al., 2020). For example, 4-year-old children keep track of whether an informant provided them with correct or incorrect information and adjust their trust in that informant as they interact with them and gain new information about their accuracy (Ronfard & Lane, 2018). Children also place more weight on observed data when these are generated by a knowledgeable rather than a naïve adult (Bonawitz et al., 2011; Butler & Markman, 2012; Kushnir, Wellman, & Gelman, 2008). Furthermore, when making inferences based on observed statistical patterns, 4-year-old children distinguish between deterministic and probabilistic patterns (Bridgers et al., 2016).
These prior studies demonstrate that children are able to weigh testimonial and observational evidence appropriately when these two sources of information are presented individually. But how do children respond when both sources are available and conflict with each other? For example, how do they respond when what they are told about how a toy works conflicts with what they see? To find out, Bridgers et al. (2016) had an informant introduce 4- and 5-year-old children to a novel toy. This informant was introduced as either naïve or knowledgeable. Following the informant's testimony about which block made the machine go, children observed data conflicting with that testimony. These data were either deterministic (the endorsed block activated the machine 0/6 times, whereas the unendorsed block activated it 6/6 times) or probabilistic (the endorsed block activated the machine 2/6 times, whereas the unendorsed block activated it 4/6 times). When subsequently asked which block made the machine go, children appropriately discarded the testimony of both the naïve and the knowledgeable experimenter when they observed data clearly contradicting it (i.e., the deterministic data). However, when they observed less conclusive data (i.e., the probabilistic data), their inferences depended on the reliability of the experimenter: children relied on what they saw when taught by the naïve experimenter, but showed no preference for what they saw when taught by the knowledgeable experimenter.
In real-world situations, children rarely witness six identical, consecutive observations either confirming or conflicting with a claim. Rather, children might be told one thing and then observe an event providing conflicting information. In such situations, the two sources are likely to be equally compelling, and the conflict between them can only be resolved by obtaining additional information. Thus, this study builds on prior results by asking how children respond when what they are told and what they observe conflict rather than converge, and they have no reason to doubt either source of information. More specifically, we ask whether children spontaneously seek out additional empirical information to resolve the conflict when given the opportunity to do so. To motivate children to seek out such information, we asked them to engage in a sorting task in which accurate sorting would benefit from seeking further information.
Prior research has shown that preschoolers engage in exploratory investigation when causal information is confounded, for example, when it remains unclear whether either or both levers activate a toy (Schulz & Bonawitz, 2007). Accordingly, we might expect preschoolers to engage in exploratory investigation when facing conflicting information as well (e.g., when there is evidence indicating that a given figurine does not work, yet they have been told otherwise). However, as Schulz and Bonawitz (2007) note, their study used an implicit measure of children's sensitivity to confounded observational evidence, so the extent to which children themselves were aware of their reasons for further exploring the toy remains uncertain. More explicit measures of children's understanding of confounding have shown that while this understanding develops during the preschool years (Cook et al., 2011; Köksal-Tuncer & Sodian, 2018), it is not until the elementary school years that children develop a more explicit understanding of the relationship between claims and evidence (Astington, Pelletier, & Homer, 2002) and the ability to test claims explicitly in a manner that can isolate confounded causal factors (Chen & Klahr, 1999). Thus, while preschool children may be sensitive to the presence of epistemic uncertainty and explore more in light of conflict, their insight into when and how to resolve uncertainty in the pursuit of an explicit goal may still be developing.
Indeed, when it comes to actively seeking out empirical information with the goal of resolving a conflict between children's intuitions and what they are told, a recent set of studies suggests a cross-culturally robust age change in children's exploration following a surprising claim (Ronfard, Chen, & Harris, 2018; Ronfard, Ünlütabak, Bazhydai, Nicolopoulou, & Harris, 2020). These studies presented preschool and elementary school children with a set of different-sized Russian dolls and asked them which was the heaviest. All children indicated the biggest one. The experimenter then either confirmed children's intuition, "Yes, the biggest doll is the heaviest," or contradicted it, "Actually, that one is not the heaviest one. The smallest one is the heaviest one." Across age and condition, children subsequently endorsed the experimenter's claim, even when it was counterintuitive. The experimenter then left the room, allowing children to assess the claim empirically by picking up the dolls to compare their weights. Elementary school children explored the dolls significantly more when their intuitions had been contradicted than when they had been confirmed, frequently picking up the smallest and the biggest doll concurrently to compare their relative weight, a direct test of the claim they had been given. Preschool children rarely engaged in this behavior, whether their intuitions had been confirmed or contradicted. One interpretation of these results is that seeking out information in order to confirm or disconfirm a surprising claim (as opposed to engaging in exploratory play following a surprising claim) is a later-developing ability, because it requires reasoning about how a claim could be tested and, by implication, the realization that some claims are empirically grounded and can be falsified.
Although this interpretation is plausible, it is premature in at least one important regard. These "doll studies" relied on a single experimental paradigm that pitted children's prior intuitions about the relation between size and weight against a claim contradicting those intuitions. As Lane (2018) notes, children's willingness to accept counterintuitive claims depends on the strength of their initial intuitions as well as on their acquisition of certain conceptual insights, notably, the distinction between appearance and reality (Lane, Harris, Gelman, & Wellman, 2014). Thus, age-related improvements in the strength of children's intuitions or background knowledge about the association between size and weight (Smith, Carey, & Wiser, 1985) could account for the parallel changes in their empirical investigation of the surprising claim that the smallest doll is the heaviest. This study makes it possible to control for children's prior intuitions by presenting them with novel stimuli with which, irrespective of age, none of them had prior experience. If preschool children do not spontaneously seek out further evidence when faced with a task that should motivate them to resolve a conflict between what they observed and what they were told, this would support the claim that young children do not spontaneously think of deliberately investigating what they are told. Alternatively, if preschool children do seek out further evidence, it would suggest that young children seek out empirical evidence following surprising claims provided they conceptualize the claim as surprising.
In addition to examining whether children seek empirical evidence to resolve conflicts between what they have just observed and what they have been told, we asked whether order affects children's processing and weighing of conflicting information: Are children more likely to investigate or endorse a claim when they hear it before versus after witnessing an event that contradicts it? Studies directly testing the effect of order on children's ability to weigh information from different sources are lacking, because prior studies have provided young children either with testimony followed by firsthand counterevidence or with firsthand evidence followed by countertestimony, rather than comparing both. When testimony is followed by firsthand counterevidence, children as young as 3 years of age are able to reassess their initial trust in an informant based on the conflicting evidence they have observed (Bridgers et al., 2016; Hermansen, Ronfard, Harris, Pons, & Zambrana, 2021; Scofield & Behrend, 2008). When firsthand evidence is followed by countertestimony, 4- and 5-year-olds, but not 3-year-olds, rely on what they saw rather than what they were told (Jaswal, Croft, Setia, & Cole, 2010; Ma & Ganea, 2010). This pattern of data suggests that order may matter, at least for the youngest children, but the lack of a direct comparison of order across these studies makes it difficult to draw strong conclusions. Given the salience of more recent information (Berry, Waterman, Baddeley, Hitch, & Allen, 2018), we hypothesized that children would be more likely to seek out information relevant to an informant's claim when counterevidence is acquired after, rather than before, the presentation of that claim.
To evaluate these hypotheses, we created a task in which children received two pieces of information about which of two figurines would make a music box work. Children were directly told by an apparently knowledgeable experimenter whether the black or the white figurine would make the music box work. They also saw a different, apparently naïve experimenter place the two figurines on the box, one after the other, providing empirical evidence about which of the two made the box work. After receiving these two types of information, all children were asked to sort four new figurines (2 white and 2 black) into containers based on whether or not they thought the figurines could make the music box work. We made the music box available, but children were not asked or encouraged to test the figurines. By comparing across conditions, we could see whether, while completing the sorting task, children were more likely to place figurines on the music box when the testimony they had heard conflicted with, rather than confirmed, what they had observed. We distinguished between two types of exploration: whether children placed only one type of figurine (black or white) on the music box and whether they placed both types (black and white) on it. Note that only by placing both types of figurines on the music box could children obtain information that would fully resolve the conflict between what they had seen and what they had been told. Regardless of whether children placed the figurines on the music box before sorting them, we also coded whether children sorted them according to what they had seen (the pattern of activation demonstrated by the naïve experimenter who placed the figurines on the music box) and whether this sorting differed when the testimony they had heard conflicted with, rather than confirmed, what they had seen.
This design also allowed us to assess the extent to which children who had placed both types of figurines on the music box, and who therefore had the firsthand evidence needed to evaluate which figurines made it work, were guided by this new information when sorting the figurines. Finally, after the sorting task, children were asked to make a prediction: "If I want to make the music box play one more time, which figurine should I use?" This additional question allowed us to examine differences in how children weighed what they saw versus what they were told using a verbal measure (their prediction), in addition to the two nonverbal measures (their exploration and sorting of the figurines).
To investigate the effect of information consistency, half the children received consistent information from the two sources: they were told by the apparently knowledgeable informant that only the white figurine could work, and they also observed that it was, in fact, the only one that worked when the naïve experimenter placed the white and black figurines on the music box. For the other half of the children, the information was inconsistent: they were told by the apparently knowledgeable informant that only the white figurine worked, but observed during the naïve experimenter's placements that it was, in fact, the black figurine and not the white figurine that worked.
The order in which children heard the apparently knowledgeable informant's claim or observed the naïve informant's placement of the figurines was counterbalanced between participants. This permitted us to examine whether the consistency and order of verbal versus visual information influenced children's decision to seek further information (as reflected in their exploration of the figurines), their ongoing trust in the verbal claim (as reflected in their sorting patterns), and their reasoning about a future event (as reflected in their predictions).
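The factorial logic is compact enough to state as code. The Python sketch below crosses the two between-subjects factors, Consistency and Order, to produce the four cells. For each cell, it derives which figurine color the child sees activating the box, given the color the informant claims works. The labels and the helper `observed_working_color` are hypothetical illustrations, not the authors' materials. The claimed color is fixed to white, following the example above, although color was counterbalanced in the study.

```python
from itertools import product

# Hypothetical labels for the two between-subjects factors (not the authors' codes).
CONSISTENCY = ("consistent", "inconsistent")
ORDER = ("testimony_first", "observation_first")


def other(color: str) -> str:
    """Return the other figurine color."""
    return "black" if color == "white" else "white"


def observed_working_color(claimed: str, consistency: str) -> str:
    """Color the child sees activate the music box, given the informant's claim."""
    return claimed if consistency == "consistent" else other(claimed)


# Crossing the two factors yields the four between-subjects cells.
conditions = list(product(CONSISTENCY, ORDER))

for consistency, order in conditions:
    claimed = "white"  # fixed here for illustration; color was counterbalanced
    seen = observed_working_color(claimed, consistency)
    print(f"{consistency:12} {order:17} told: {claimed:5} saw: {seen}")
```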
Previous work indicates that although children are generally more certain about their own knowledge when it is gained from direct evidence rather than from an informant's testimony, this certainty may diminish over time (Robinson, Haigh, & Nurmsoo, 2008). To ensure that children felt equally confident in what they observed and what they were told, we took advantage of the fact that young children use information about informants' knowledge to assess the informativeness of their claims and actions (Bonawitz et al., 2011; Butler & Markman, 2012; Kushnir et al., 2008). By making the informant who demonstrated the toy naïve, we decreased the strength of that observational evidence. By making the informant who told children about the toy knowledgeable, we strengthened the testimonial evidence she provided. Given that children may value observational evidence more strongly than testimony (Robinson et al., 2008), we reasoned that this manipulation would "even the scale," leading children to judge the observed and the testimonial evidence as close to equally informative as possible. As a result, we expected that, relative to children who received consistent information, children who received inconsistent information would be more likely to explore the figurines, less likely to sort the figurines based on what they had seen, and less likely to choose the figurine they had seen to be effective when making a prediction. Thus, we expected children's exploration, sorting, and prediction responses to be influenced by the mismatch between visual and verbal information. Moreover, we expected these effects to reflect a recency bias (Berry et al., 2018; but see Ronfard & Lane, 2018): for children who received inconsistent information, we expected a greater influence of testimony on exploration, sorting, and prediction when testimony was the more recent piece of information.
For children who received consistent information, we expected no effect of the order of information on their performance.

Participants and attrition
The final sample consisted of 278 preschool-aged children, recruited from child-care centers in and outside of Oslo, Norway. An additional 12 children were tested but excluded from the final analyses because of (a) child withdrawal (N = 2), (b) technical error (N = 7), or (c) inappropriate responses throughout testing, indicating a lack of understanding (N = 3).
Informed consent was obtained from the children's parents in advance of testing. In addition, children were asked before testing whether they would like to take part. Upon agreeing to participate, children were randomly assigned to one of four conditions (see Table 1) and tested individually in a quiet room of the child-care center by a group of research assistants. The study was approved by the local authorities on data protection (Norwegian Centre for Research Data [NSD], case no. 742,454).

Experimental design
Children were given an opportunity to observe whether a white figurine or a black figurine activated a music box when placed on top of it. In addition, children were told by an apparently knowledgeable adult informant whether the white or the black figurine activated the music box. The color of the figurine that the informant claimed would work was counterbalanced across conditions. The experiment consisted of four phases, played out in a semifixed order and following a predefined script (see Figure 1 for an illustration; Appendix S2 for details of the script). For half the children, what they observed and what they were told proved consistent, whereas for the remaining half, the two types of information proved inconsistent. The order of the two types of information about the figurines was systematically varied across children.
In order to maximize the conflict between the informant's claim and the direct observation, the informant who provided the testimony presented herself as knowledgeable by saying: "Oh! I can see that you have found the music toy, I know this toy very well. To make it play music you have to put white figurines on it. Only white figurines make it work." In contrast, the informant who generated the visible evidence presented herself as naïve to the task by saying: "Look! Someone gave me this toy. I don't know how it works, but it looks like you can put these pieces on it here. I wonder what will happen if I do that!" The naïve informant then tentatively placed the figurines on the music box one at a time, first the white and then the black, one of which made the music box play music. The color of the functioning figurine was counterbalanced across conditions, meaning that, overall, half the children were told that the white figurine could play and the other half were told that the black figurine could play.
After the child had received the two types of information, the naïve informant gathered the first two figurines and put them away before telling the child that she wanted to go and get some new figurines. The naïve informant then presented all children with a set of four new figurines (2 white and 2 black) and asked them to sort these into two containers, one for figurines that were effective in activating the music box and one for figurines that were ineffective. Because a particular aim of this study was to assess children's spontaneous and emerging tendency to engage in targeted exploration for the purpose of solving a task, children were given permission to solve the task as they saw fit ("You can do it however you want. When you're done tell me and I'll come back. I just have some stuff to finish"), and the music box was left next to them, allowing (but not prompting) them to test the figurines before sorting. After providing these instructions, and to avoid any social pressure on children's exploration, the naïve informant sat in a corner of the room facing away from the child until the child said they were finished or until 2 min had passed. After the child had sorted the figurines, the naïve informant returned to the table and asked: "If I want to make the music box play one more time, which figurine should I use?"

Data processing
For the planned analyses, we coded whether children placed one or both types of figurines on the music box before allocating them to one of the two containers. For the final question, we coded children's predictions about which figurine was likely to work in a future attempt. The experimental sessions were coded by two research assistants blind to the hypotheses of the study. Overall inter-rater reliability was estimated at 96%. Following the reliability assessment, discrepancies due to coding errors were corrected, and discrepancies due to coder disagreement were resolved through discussion.
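The text does not say how the 96% figure was computed. Assuming it is simple percent agreement (the share of coded items on which both coders assigned the same code), a minimal Python sketch follows. The function name and the codes below are invented for illustration and are not study data.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items (in %) on which two coders assigned the same code.

    Assumes simple percent agreement; the source does not name the index used.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical exploration codes for ten sessions (illustration only, not study data).
a = ["both", "none", "none", "one", "none", "both", "none", "none", "none", "one"]
b = ["both", "none", "none", "none", "none", "both", "none", "none", "none", "one"]
print(percent_agreement(a, b))  # 90.0
```

Percent agreement does not correct for chance agreement. When one code dominates, as "none" does here, a chance-corrected index such as Cohen's kappa would be more conservative.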

Statistical analyses
In the following analyses, we examine (a) children's exploration of which figurines activated the music box when carrying out the sorting task; (b) their weighing of information provided by the two informants, as reflected in their eventual sorting; (c) in the testimony ≠ observation condition, the …
Figure caption (beginning missing in source): … For the other half of the children, the order was reversed (Panel B). In addition, for half of the children what they observed and what they were told was consistent, while for the other half it conflicted. The child was then asked to sort a set of four new figurines into two containers, one for functioning figurines and one for non-functioning figurines, and was left alone until done, or for a maximum of 2 min. This sorting task gave the child an opportunity to explore which figurines activated the music box, if they wanted to. After sorting the figurines, or after 2 min had passed, the child was asked which figurine should be used to make the music box work.

Children's exploration of the figurines
After having received the two pieces of information, one verbal and one empirical, from the two experimenters, children were asked to sort the figurines into two containers, one for figurines that were effective in activating the music box and one for figurines that were ineffective. During the sorting task, the music box was left available so that children could find out which figurines were or were not effective in making it work. Importantly, children were not asked or encouraged to place the figurines on the music box. Thus, we could assess the extent to which children spontaneously took the opportunity to explore the figurines, either to confirm the consistent pattern of evidence they had received (testimony = observation) or to resolve the conflict in the pattern of evidence they had received (testimony ≠ observation). Spontaneous exploration would aid accurate performance on the sorting task, particularly for children who received inconsistent information. In what follows, we describe children's exploration of the figurines.
As a preliminary descriptive analysis of exploration, we examined how many children placed one type of novel figurine on the music box (only the black ones or only the white ones; N = 20, 7.2%), both types of figurines (N = 65, 23.4%), or neither (N = 193, 69.4%). Inspection of Figure 2 reveals that across the four combinations of consistency and order, the majority of children did not place any figurines on the box. To examine the likelihood of children placing any of the figurines on the box, we combined the children who explored one type of figurine with those who explored both types. The resulting binary coding (placed one or both types of figurines vs. did not place any figurines on the music box) revealed no significant difference in the likelihood that children placed at least one type of figurine on the music box depending on whether they had received consistent or inconsistent data about which figurines made it work: consistent, 26.3% (35 of 133) versus inconsistent, 34.4% (50 of 145), χ²(1) = 2.18, p = .14.
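The reported test can be recomputed from the counts given above. The Python sketch below assumes a Pearson chi-square without Yates' continuity correction, which recovers the reported χ²(1) = 2.18. It converts the statistic to a p-value using the identity P(χ²₁ > x) = erfc(√(x/2)). The helper name `pearson_chi2` is hypothetical.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: consistent, inconsistent; columns: explored (at least one type), did not explore.
table = [[35, 133 - 35],
         [50, 145 - 50]]
stat, df = pearson_chi2(table)
p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with df = 1
print(f"chi2({df}) = {stat:.2f}, p = {p:.2f}")
```

Applying Yates' correction, which some packages do by default for 2 × 2 tables, would yield a smaller statistic than the one reported.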
To further examine the likelihood of placing at least one figurine on the music box, we regressed this behavior on Consistency, Order, and Age using a logistic regression model. Confirming the prior analysis, this model was not significant, χ²(4) = 6.68, p = .15, R² = .034. Additional analyses described in Supporting Information revealed no interactions between Consistency, Order, and Age. Further analyses found no significant differences between the consistency conditions in children's more exhaustive exploration of the figurines, namely whether they placed both types of figurines on the music box as opposed to only one type or none. In sum, fewer than a third of children placed at least one type of figurine on the music box, and whether they did so did not vary as a function of whether the testimony they received from the knowledgeable experimenter was consistent or inconsistent with the evidence provided by the naïve experimenter. Next, we examine how children went on to sort the figurines into the two containers. We first examine children's sorting at an overall level, including all children. We then focus on children who received inconsistent information (testimony ≠ observation) and assess whether their sorting differed as a function of whether they had obtained the evidence required to determine which figurines worked, that is, whether they had placed both types of figurines on the music box.
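Converting a model-level χ² into a p-value only requires the upper tail of the chi-square distribution. The sketch below assumes the reported χ²(4) = 6.68 is a likelihood-ratio test of the full model. It uses the standard recurrence for integer degrees of freedom, so no statistics library is needed. `chi2_sf` is a hypothetical helper name; in practice `scipy.stats.chi2.sf` computes the same quantity.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(half)) if k == 1 else math.exp(-half)
    while k < df:
        # Work in logs to avoid overflow for large x or df.
        q += math.exp(k / 2.0 * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


print(round(chi2_sf(6.68, 4), 2))  # full exploration model, df = 4
```

For df = 4, the recurrence reduces to exp(-x/2)(1 + x/2), which gives about .154 here. That matches the reported p = .15.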

Children's sorting of the figurines
Children were asked to sort the four novel figurines according to whether or not they could make the music box play. This sorting task enabled us to assess the extent to which children's inferences about the functioning of the figurines were affected by the presence versus absence of conflict between the information provided by the two experimenters.
To analyze children's sorting of the figurines, we coded their sorting behavior into three categories. In the first category, we grouped children who sorted one or more of the figurines in a manner consistent with what they had observed and none in opposition to it. That is, this first category included children who placed in the container for effective figurines only figurines of the same color as the figurine they had seen make the music box work when observing the naïve experimenter, and who placed in the container for ineffective figurines only figurines of the same color as the figurine they had seen fail to make the music box work. Note that in the testimony ≠ observation condition, this meant that children did not sort any of the figurines in accordance with what they were told.
In the second category, we included children who sorted one or more of the figurines in a manner inconsistent with what they had observed, which in the testimony ≠ observation condition meant that they sorted in accordance with what they were told. For inclusion in this group, children had to place one or both figurines of the same color as the figurine they had seen make the music box work (when observing the naïve experimenter) in the container for ineffective figurines, and/or place one or both figurines of the same color as the figurine they had seen fail to make the music box work in the container for effective figurines. Finally, in a third category, we included children who failed to sort the figurines, never placing any of them into either container.
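The three-category coding rule can be stated compactly. In the hypothetical Python sketch below, a single figurine placed against the observed pattern is enough to assign a child to the second category. Partial sorts that contain no such placement fall into the first category. The input format and the label strings are assumptions, not the authors' coding sheet.

```python
def sorting_category(placements, working_color):
    """Assign a child's sorting to one of three categories (hypothetical labels).

    placements: list of (figurine_color, container) pairs for the figurines the
        child actually sorted; container is "effective" or "ineffective".
        Unsorted figurines are simply absent from the list.
    working_color: color the child saw activate the music box.
    """
    if not placements:
        return "no_sort"  # third category: nothing placed in either container
    for color, container in placements:
        # A placement matches the observation when "saw it work" and "put it in
        # the effective container" are either both true or both false.
        matches_observation = (color == working_color) == (container == "effective")
        if not matches_observation:
            return "against_observation"  # second category
    return "with_observation"  # first category: at least one placement, none against
```

For example, if the child saw the black figurine work, placing a black figurine in the effective container and a white one in the ineffective container yields `"with_observation"`. Placing even one white figurine in the effective container yields `"against_observation"`.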
As Figure 3 indicates, and an overall chi-square analysis confirms, the consistency of the information significantly affected children's sorting, χ²(2) = 18.71, p < .001. Chi-square tests revealed that, compared to children who received consistent information, children who received testimony conflicting with what they observed were less likely to sort the figurines in a manner consistent with what they observed (74 vs. 97, χ²(1) = 14.05, p < .001) and more likely to sort one or more figurines in a manner inconsistent with what they observed (i.e., consistent with what they were told; 34 vs. 9, χ²(1) = 14.76, p < .001). Children who received conflicting testimony were also somewhat more likely to leave the figurines unsorted, but not significantly so (37 vs. 27, χ²(1) = 1.07, p = .302). In sum, receiving testimony that conflicted with observation significantly affected children's sorting of the figurines, increasing the tendency to sort according to what they were told and reducing the tendency to sort according to what they observed.
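The overall and follow-up tests can likewise be recomputed from the cell counts reported above. For the consistent group, the counts are 97 sorting with the observation, 9 against it, and 27 not sorting. For the inconsistent group, they are 74, 34, and 37. These sum to the group sizes of 133 and 145. The sketch makes two assumptions: Pearson chi-square without continuity correction, and follow-up tests that compare one sorting category against the other two combined. Under these assumptions it reproduces the reported statistics. `pearson_chi2` is a hypothetical helper name.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )
    return stat, (len(table) - 1) * (len(table[0]) - 1)


CATEGORIES = ("with observation", "against observation", "no sort")
consistent = [97, 9, 27]
inconsistent = [74, 34, 37]

overall, overall_df = pearson_chi2([consistent, inconsistent])
print(f"overall: chi2({overall_df}) = {overall:.2f}")

pairwise = {}
for j, label in enumerate(CATEGORIES):
    a, b = consistent[j], inconsistent[j]
    # This category vs. the other two categories combined, by condition.
    table = [[a, sum(consistent) - a], [b, sum(inconsistent) - b]]
    pairwise[label], _ = pearson_chi2(table)
    print(f"{label}: chi2(1) = {pairwise[label]:.2f}")
```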

Children's exploration and children's sorting in a manner consistent with the observation
In the analyses that follow, we analyzed children's sorting when it was consistent with what the naïve experimenter had shown. More specifically, we examined whether, in the inconsistent information condition, children who gathered enough evidence to assess which figurines worked by placing both types of figurines on the music box were more likely to sort in a manner consistent with their prior observations of the naïve informant (and inconsistent with what the knowledgeable experimenter had told them) than children who did not have enough evidence, that is, children who had placed only one type of figurine on the music box or no figurine at all.
Using logistic regression, we regressed whether children sorted in a manner consistent with their prior observations of the naïve informant on Order, Age, and Exploration and their interactions (see Table 2). This analysis indicated a three-way interaction between Order, Age, and Exploration as a set, χ²(2) = 6.55, p = .038. However, this three-way interaction was not evident in post hoc Bonferroni-corrected tests of the simple effect of Exploration. Interactions between Order and Age, Order and Exploration, and Age and Exploration were not statistically significant (see Supporting Information for details), leaving only the main effects of Exploration and Age (see Figure 4). The main effect of Exploration confirmed that self-gathered empirical evidence during the sorting task altered children's sorting behavior across ages, bringing it into line with the evidence they had obtained by watching the naïve experimenter interact with the music box rather than with what the knowledgeable experimenter had told them about it, χ²(1) = 14.35, p < .001. The main effect of Age confirmed that older children were more likely than younger children to sort in line with the evidence they obtained by watching the naïve experimenter interact with the music box, χ²(2) = 19.00, p < .001.
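The degrees of freedom reported here follow from the design of the model. The sketch below assumes that Order and Exploration are two-level factors and that Age is coded as three groups; the three groups are an inference from the 2-df Age test and the 3-, 4-, and 5-year-olds described in the Discussion, not something this paragraph states. It counts the df of every term in the full-factorial model (a term's df is the product of its factors' levels minus one), which shows why the three-way interaction is tested "as a set" of two coefficients, and recomputes the df = 2 p-values.

```python
import math
from itertools import combinations

# Assumed coding: Order and Exploration have two levels each; Age is treated
# as three groups (3-, 4-, 5-year-olds), as the 2-df Age test implies.
LEVELS = {"Order": 2, "Age": 3, "Exploration": 2}


def term_df(factors):
    """Degrees of freedom of a main effect or interaction: product of (levels - 1)."""
    return math.prod(LEVELS[f] - 1 for f in factors)


# Every main effect and interaction in the full-factorial logistic model.
terms = {
    " x ".join(combo): term_df(combo)
    for k in range(1, len(LEVELS) + 1)
    for combo in combinations(LEVELS, k)
}
for name, df in terms.items():
    print(f"{name}: {df} df")

# Intercept plus all terms equals the number of design cells (2 * 3 * 2).
n_params = 1 + sum(terms.values())

# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_three_way = math.exp(-6.55 / 2)
p_age = math.exp(-19.00 / 2)
print(f"three-way: p = {p_three_way:.3f}; Age: p = {p_age:.1e}")
```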

Children's exploration and children's sorting in a manner inconsistent with observation
In the analyses that follow, we analyzed children's sorting when it went against what the naïve experimenter had shown them. More specifically, we examined whether, in the inconsistent information condition, children who gathered enough evidence to assess which figurines worked by placing both types of figurines on the music box were less likely to sort in a manner inconsistent with what they had observed (albeit consistent with what the knowledgeable experimenter had said) than children who did not have such information, that is, children who placed only one type of figurine on the music box or who did not place any figurine on the music box.
Using logistic regression, we regressed whether children sorted in a manner inconsistent with what they observed on Order, Age, and Exploration and their interactions. As displayed in Figure 5, this revealed a main effect of Exploration, B = −1.67, p = .009: children who had placed both types of figurines on the music box were less likely to sort in a manner inconsistent with what they had observed. There was no effect of Order or Age (see Table 2), and no significant interactions (see Supporting Information for details).
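A negative Exploration estimate of −1.67 is easier to read on the odds scale. The sketch below assumes that −1.67 is the logistic-regression coefficient for Exploration on the log-odds scale, with children who placed both types of figurines on the music box coded 1; neither the scale nor the coding is spelled out in this paragraph, so treat both as assumptions. The baseline probability in the last step is hypothetical and only illustrates the conversion; it is not a reported result.

```python
import math

# Assumption: -1.67 is the Exploration coefficient on the log-odds scale,
# with "placed both figurine types" coded 1 and everything else coded 0.
B_EXPLORATION = -1.67

# Exponentiating a log-odds coefficient gives an odds ratio, holding the
# other predictors in the model (Order, Age) constant.
odds_ratio = math.exp(B_EXPLORATION)
inverse_ratio = 1 / odds_ratio
print(f"odds ratio (explorers vs. non-explorers) = {odds_ratio:.3f}")
print(f"non-explorers' odds of sorting against observation are {inverse_ratio:.2f} times higher")


def shifted_probability(p_baseline, log_odds_change):
    """Probability obtained by adding a log-odds change to a baseline probability."""
    logit = math.log(p_baseline / (1 - p_baseline)) + log_odds_change
    return 1 / (1 + math.exp(-logit))


# Hypothetical baseline, for illustration only: if half of non-explorers sorted
# against their observation, the same coefficient would imply roughly 0.16 for explorers.
print(f"{shifted_probability(0.5, B_EXPLORATION):.3f}")
```

Note that an odds ratio is not a ratio of probabilities; the helper makes the difference visible by converting back to the probability scale.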
In sum, in sorting the novel figurines, children more often sorted in ways that were inconsistent with the perceptual evidence they had obtained by watching the naïve experimenter if they had received testimony that was inconsistent rather than consistent with that perceptual evidence. Nevertheless, if children spontaneously gathered firsthand evidence of which figurines made the toy work by placing both figurines on the music box, they rarely displayed such deference to testimony. Thus, they were likely to sort in accord with, rather than contrary to, the perceptual evidence provided by the naïve experimenter. This was true irrespective of age.

Children's explicit reasoning about which figurine would work
Although the sorting task provided us with a nonverbal measure of the extent to which children were affected by the conflict between the two sources of information they had previously received, an alternative way of measuring how children weigh information from two conflicting sources is simply to request a verbal response, querying children in a way that makes them reflect on the prior information with respect to a future event. Thus, as a second measure of children's weighing decisions, we looked at their answer to the question: "Which figurine should I use if I want to make the music box play one more time?" In response to this question, the majority of children correctly pointed to the functioning figurine (N = 194, 69.8%), although a minority incorrectly pointed to the nonfunctioning figurine (N = 44, 15.8%). In addition, some children did not respond at all to this question (N = 15, 5.4%), said "I don't know" (N = 13, 4.7%), gave an irrelevant answer (N = 5, 1.8%), pointed to both figurines (N = 5, 1.8%), or pointed to the music box (N = 2, 0.7%). These 40 children were excluded from the following analysis (N = 238). We examined whether children responded correctly or incorrectly using a logistic regression model. In Model 1, we entered the factors Consistency, Order, and Age. This model was significant, χ²(4) = 26.68, p < .001, R² = .17 (see Table 3, Model 1), revealing significant main effects of Consistency (p < .001) and Order (p = .003). To test the hypothesis that children would be sensitive to the order of information when the information from the two sources was inconsistent but not when it was consistent, we ran planned post hoc regressions of the simple effects of Order within each consistency condition. These confirmed that the order of the information children received strongly influenced their predictions when the information was inconsistent, χ²(1) = 10.83, p < .001.
In contrast, and as expected, order did not matter when children received consistent information, χ²(1) = 0.30, p = .581. Inspection of Figure 6 shows that a smaller percentage of children identified the functioning figurine correctly if the inconsistent claim had followed their observation of which figurine worked (58% responded correctly, not significantly above chance performance (50%), one-sided binomial test, p = .245) than if it had preceded that observation (85% responded correctly, significantly above chance performance (50%), one-sided binomial test, p < .001). By implication, when observation preceded conflicting testimony, children were uncertain, or split, in their decision to trust either source. When observation followed conflicting testimony, children relied on observation.
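The response tallies and the chance comparisons above can be checked with a few lines of code. The sketch below first recomputes the reported percentages and the exclusion count from the response counts (N = 278), then defines an exact one-sided binomial test of the kind reported for Figure 6. The per-cell sample sizes behind the 58% and 85% figures are not given in this section, so the counts passed to the test at the end are hypothetical and purely illustrative; SciPy users would typically call `scipy.stats.binomtest(k, n, p=0.5, alternative="greater")` instead.

```python
from math import comb

# Answers to the prediction question, as reported above (N = 278).
RESPONSES = {
    "functioning figurine (correct)": 194,
    "nonfunctioning figurine (incorrect)": 44,
    "no response": 15,
    "said 'I don't know'": 13,
    "irrelevant answer": 5,
    "pointed to both figurines": 5,
    "pointed to the music box": 2,
}
SCORED = ("functioning figurine (correct)", "nonfunctioning figurine (incorrect)")

total = sum(RESPONSES.values())
analyzed = sum(RESPONSES[k] for k in SCORED)
excluded = total - analyzed
for label, count in RESPONSES.items():
    print(f"{label}: N = {count} ({100 * count / total:.1f}%)")
print(f"excluded = {excluded}, analyzed = {analyzed}")


def binomial_test_greater(successes, n, p=0.5):
    """Exact one-sided binomial test: P(X >= successes) when X ~ Binomial(n, p)."""
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


# Hypothetical counts, for illustration only: 8 correct answers out of 10,
# tested against chance performance (p = 0.5).
print(f"p = {binomial_test_greater(8, 10):.4f}")
```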
In Model 2, we investigated whether Exploration influenced children's predictions, but this was not significant, χ²(1) = 1.40, p = .237 (see Table 3, Model 2). Moreover, subsequent analyses, described in Supporting Information, revealed no significant interactions between Consistency, Order, Age, and Exploration.
In sum, when making a prediction about which figurine would make the music box play, children who received testimony that was inconsistent with what they observed were more swayed by that testimony if it was the most recent piece of information they received.

Discussion
We gave children the opportunity to observe how a music box worked. By observing a naïve experimenter place two differently colored figurines on the box (one after the other), 3-, 4-, and 5-year-old children obtained perceptual evidence about which of those two figurines turned on the box and which did not. Children also received additional evidence in the form of verbal testimony from a different, apparently knowledgeable experimenter who either confirmed or contradicted what children observed. Whether children received such testimony before or after seeing the naïve experimenter interact with the music box was counterbalanced.
We wanted to know whether, when subsequently asked to sort additional figurines into those that worked and those that did not work, children would be more likely to spontaneously explore the effectiveness of the figurines before sorting them if the testimony conflicted with, rather than confirmed, what they had observed. In addition, we asked if such exploration of the figurines would influence children's sorting. Finally, we were interested in whether children who received conflicting information were influenced by the order in which they had received the testimony and observed the functioning of the figurines, as reflected in their exploration, sorting, and predictions regarding the figurines.
Children's exploration of the figurines
Strikingly, the majority of preschool children did not spontaneously seek additional information about the figurines by placing them on the music box, even when the testimony they received directly conflicted with what they had seen. Thus, children were equally unlikely to explore the figurines whether the testimony was consistent or inconsistent with their observation. What might explain preschoolers' limited exploration? Any answer to this question must first consider whether children recognized the conflict between the two sources of data. Preschoolers' low frequency of exploration might reflect an overriding confidence in their firsthand observations (or, alternatively, an overriding confidence in the testimony). However, if this were true, then the influence of receiving conflicting testimony on children's sorting should have been confined to systematic sorting based either on the observed evidence or on the testimony. Instead, receiving testimony that was inconsistent rather than consistent with what they observed clearly affected children's sorting of the figurines, with more children sorting against what they observed after inconsistent than after consistent information. Hence, it is plausible to assume that children recognized the conflict between the verbal testimony and the perceptual evidence.
Could it be that children did not know how to determine whether the figurines worked, or believed they were not allowed to touch them? Neither interpretation seems plausible. Recall that children had just seen the naïve experimenter place the figurines on the box, and were asked to sort the figurines by this naïve experimenter, not by the apparently knowledgeable experimenter who told them which figurine worked. The person who asked children to sort the figurines had tried them out on the music box as children watched, and had also implied that exploring the figurines would reveal how the music box worked by saying "I don't know how it works, but it looks like you can put these pieces on this thing here [placing a hand on the center of the music box]. I wonder what will happen if I do that!" In principle, both of these facts should have primed children's exploration if they conceptualized such exploration as informative. Furthermore, the naïve informant left the table and sat facing away from the child during the task, in an attempt to remove potential social pressure not to question the informant's testing. However, given that testing the figurines would have been audible to the nearby naïve experimenter, future studies could usefully assess whether more children would test the figurines if left in complete privacy.
Children's lack of differential exploration following inconsistent as opposed to consistent evidence is all the more surprising given that, for children in the inconsistent information conditions, the two sources of data they received were in explicit and direct conflict: The testimony stated that one figurine worked and the other did not, whereas children's observation of the naïve experimenter's actions showed exactly the opposite pattern. In , Ronfard and Lane (2019), and Ronfard, Ünlütabak, et al. (2020), preschool children did not receive a demonstration of how to empirically examine the claim they heard, whereas in Hermansen et al. (2021, experiment 1) children were told, but not shown, how to interact with the figurines. In this study, however, children were effectively shown how to empirically examine the figurines. Thus, it seems plausible that children's failure to seek additional information is attributable to a failure to anticipate that additional empirical evidence would help resolve the conflict between what they heard and what they saw, and thus allow them to complete the task they had been given.
This interpretation may initially seem at odds with past work demonstrating that preschoolers explore in a selective fashion following surprising and ambiguous evidence (e.g., Baldwin, Markman, & Melartin, 1993; Schulz & Bonawitz, 2007). However, a key feature of the current work was to present children with two equally compelling yet inconclusive pieces of evidence, and assess the extent to which they sought further evidence when doing so would improve their performance on an assigned task. In principle, this overarching task should have motivated children to examine the figurines because this would have resolved the tension between the two conflicting pieces of information. Admittedly, the requirements of this task may also explain why a large majority of children did not strategically investigate: children may have been so focused on the sorting task itself that they neglected to respond to the uncertainty presented to them via the two sources. Nonetheless, this implies that, in the absence of adult guidance, preschoolers pursuing a specific goal do not think of resolving a conflict via further empirical investigation. This suggests a difference between exploratory play and goal-directed empirical investigation, particularly when more information is needed to resolve a conflict between different sources.
Children's sorting of the figurines
When asked to sort the figurines according to whether or not they could make the music box work, children's performance was significantly influenced by whether they had been given testimony by the knowledgeable experimenter that conflicted with the perceptual evidence provided by the naïve experimenter. This influence of conflicting testimony was unaffected by the order of the testimony, namely whether the testimony had come before or after children had witnessed the machine work. However, if children subsequently placed both figurines on the music box of their own accord, thereby gathering firsthand evidence about which figurines made the toy work, they were significantly more likely to reject the conflicting testimony, consistent with prior research demonstrating the informativeness of children's exploration (Cook et al., 2011; Lapidow & Walker, 2020).
Contrary to our initial prediction, there was no effect of the order of information on children's sorting patterns. Although studies directly assessing the impact of order of information have been lacking, prior work had suggested that preschoolers may be better able to reassess their initial trust in an informant when they receive conflicting evidence later (Bridgers et al., 2016; Hermansen et al., 2021; Scofield & Behrend, 2008) than when evidence is followed by countertestimony (Ma & Ganea, 2010). However, one important difference between those studies and this study is that they mainly assessed children's belief revisions by querying children about which source to weigh more heavily (e.g., asking: "Where should I look for X?", "Which X should I use?"). The current sorting task instead provided a nonverbal measure of how children weigh two conflicting pieces of information. Past research has shown radically different responses on verbal and nonverbal tasks in the same domain, for example, theory of mind (Grosse Wiesmann, Friederici, Singer, & Steinbeis, 2016; Oktay-Gur, Schulz, & Rakoczy, 2018). Indeed, in contrast to the pattern observed for sorting, responses on the final prediction task, an explicit verbal measure, did differ by order, as we discuss below.
Although the order of information had little impact on children's sorting patterns, information consistency had a prominent effect on children's decision to sort either in accordance with, or in opposition to, their observation. This effect of consistency depended partly, but not entirely, on age. Older children appeared better able than younger children to draw inferences from the data they observed (i.e., to systematically sort the figurines according to the observed evidence). This is consistent with prior work showing that, as children get older, they become more trusting of their own observations (Bernard, Harris, Terrier, & Clément, 2015), require less evidence to draw inferences in other domains, for example, when reasoning about traits and intentions (Boseovski, Chiu, & Marcovitch, 2013), and are better able to monitor their own belief revision, articulating when and how their beliefs change (Taylor, Esbensen, & Bennett, 1994). However, regardless of age, some children did defer to the testimony. Although we did not examine what causes these individual differences, prior research suggests that socialization may play a role (Gelman, 2009; Tagar, Federico, Lyons, Ludeke, & Koenig, 2014).
In addition to these effects of information consistency and age on children's sorting, children's exploration also had a substantial impact on sorting. When children gathered evidence for themselves by placing both types of figurines on the music box, they were less likely to defer to the conflicting testimony they had received. Thus, they were more likely to sort in line with the perceptual evidence provided by the naïve experimenter and less likely to sort in line with the verbal testimony provided by the knowledgeable experimenter.
Children's explicit reasoning about which figurines would make the music box work
When predicting which figurine to use in a future attempt to make the music box work, children's replies revealed that the effect of information consistency was robust and subject to a recency bias. Children who had received testimony contradicting what they had seen were less likely to pick the figurine that had the same color as the one they had observed to work than children who received testimony consistent with what they had observed. Moreover, children who received inconsistent information were more likely to reply in accordance with the informant's incorrect testimony if this was the later of the two pieces of information they had received. Recent language studies of toddlers (Sumner, DeAngelis, Hyatt, Goodman, & Kidd, 2019) and preschool children (Mehrani & Peterson, 2017) have revealed a strong recency bias in children's replies to forced-choice questions. Our results indicate that there may also be a recency bias in children's replies when questioned about a future event, and not just when learning new labels or selecting between two explicitly labeled categories. Although the recency bias has previously been found to be stronger among younger (2- to 4-year-olds) than older (4- to 6-year-olds) children (Mehrani & Peterson, 2017), this was not the case in this study. Importantly, given that the prediction question used in this study was not phrased as a forced-choice question, children could not simply repeat the latest word or phrase they had heard, but were required to generate a reply based on their memory of prior information.
Prior studies contrasting a single informant's testimony against either visible evidence or children's prior intuitions about a topic have reported mixed results as to whether young children are able to accurately discard the claim of a misleading informant in light of prior intuitions (Jaswal, 2010; Jaswal et al., 2010; Ma & Ganea, 2010) or more recent evidence (Bridgers et al., 2016; Hermansen et al., 2021; Scofield & Behrend, 2008). This contrasts with studies directly comparing information from two informants, one right and one wrong, in which children show signs of selective trust as early as the second year (e.g., Koenig & Harris, 2005). An underlying, and plausible, assumption of the former single-informant studies is that children's decision to trust an incorrect informant for further information reflects their actual trust in that informant over the evidence. However, this study suggests that children's responses could be affected by limited cognitive capacity: children are more likely to endorse the information they most recently received. Indeed, studies presenting 3-year-old children with (counter)evidence prior to an informant's claim have shown that they are prone to trust this incorrect informant over the counterevidence they previously observed (Ma & Ganea, 2010), and will do so even when the informant is wrong multiple times (Jaswal, 2010). In contrast, studies that have first presented 3-year-old children with an informant's claim and then with counterevidence have found that children are more likely to discard the incorrect informant's prior claim and respond in line with the more recent evidence (Hermansen et al., 2021). The findings of this study offer an explanation for this pattern: order matters, at least when children are asked to respond verbally.
Thus, it is critical for future studies investigating how children weigh testimony against their firsthand experience to counterbalance the order of presentation of that evidence, especially if they ask children to weigh that information through verbal rather than nonverbal means.

Conclusions
Verbal testimony extends children's learning beyond their immediate perception. However, given that some of the information children receive via testimony may be incorrect, do children acquire strategies to distinguish trustworthy from misleading claims? We asked whether preschool children strategically seek further information about a music toy following inconsistent information about it from two different sources (a verbal claim and an observation), and whether their decision to seek further information would be affected by the order in which they received the information. In an attempt to indirectly increase children's motivation to resolve the tension between the two sources of information, we asked them to sort a set of novel figurines according to whether or not they could make the music toy play. Although seeking out additional information would have improved children's performance on this sorting task, preschool children rarely seized the opportunity to do so. Importantly, this lack of exploration was not a result of children either disregarding or endorsing the contradictory testimony. Rather, children's sorting of the figurines revealed a sensitivity to the presence of epistemic uncertainty, together with a lack of insight into the value of seeking additional information to resolve uncertainty when pursuing an explicit goal. Thus, this study is consistent with the proposal that the majority of preschool children do not seize opportunities to engage in structured empirical investigations with the explicit purpose of resolving conflicts that arise from surprising testimony (e.g., . Critically, the children who did seize such opportunities were able to learn from the empirical data they gathered and resisted the lure of misleading testimony in their sorting, but they were a minority.