First demonstration of effective spatial training for near transfer to spatial performance and far transfer to a range of mathematics skills at 8 years

Abstract There is evidence that spatial thinking is malleable, and that spatial and mathematical skills are associated (Mix et al. [2016] Journal of Experimental Psychology: General, 145, 1206; Mix et al. [2017] Journal of Cognition and Development, 18, 465; Uttal et al. [2013] Psychological Bulletin, 139, 352). However, few studies have investigated transfer of spatial training gains to mathematics outcomes in children, and no known studies have compared different modes of spatial instruction (explicit vs. implicit instruction). Based on a sample of 250 participants, this study compared the effectiveness of explicit and implicit spatial instruction in eliciting near transfer (to the specific spatial skills trained), intermediate transfer (to untrained spatial skills) and far transfer (to mathematics domains) at age 8. Spatial scaling and mental rotation skills were chosen as training targets as previous studies have found, and proposed explanations for, associations between these skills and mathematics in children of this age (Mix et al. [2016] Journal of Experimental Psychology: General, 145, 1206). In this study, spatial training led to near, intermediate and far transfer of gains. Mental visualization and proportional reasoning were proposed to explain far transfer from mental rotation and spatial scaling skills respectively. For most outcomes, except for geometry, there was no difference in the effectiveness of implicit (practice with feedback) compared to explicit instruction (instructional videos). From a theoretical perspective, the study identified a specific causal effect of spatial skills on mathematics skills in children. Practically, the results also highlight the potential of instructional videos as a method of introducing spatial thinking into the classroom.

size of almost one half a standard deviation for training studies that compared spatial training to control conditions (Hedges' g = 0.47). The effect size increased to 0.61 (Hedges' g) when the analysis was limited to studies of children under 13 years, demonstrating the particular malleability of spatial thinking in childhood (N = 53 studies). Note that, similarly to Cohen's d, Hedges' g values of 0.2, 0.5 and 0.8 correspond to small, medium and large effects respectively (Cohen, 1988). There is also convincing evidence that spatial and mathematical thinking are associated longitudinally in childhood. For example, spatial thinking (measured using the Test of Spatial Assembly [TOSA]) at 3 years predicts 27% of the variation in mathematics problem solving at 5 years (Verdine et al., 2014), and pattern construction skills at 5 years explain approximately 9% of the variation in mathematics performance at 7 years (Gilligan, Flouri, & Farran, 2017).
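As a reading aid for the effect sizes quoted above, the following is a minimal sketch of how a Hedges' g value can be computed from two group summaries. It uses the common approximation to the small-sample correction factor; the function name, argument names and the example numbers are illustrative assumptions and are not taken from the meta-analysis cited.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction.

    Illustrative helper: names and inputs are hypothetical.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the training and control groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    # Cohen's d: raw mean difference in pooled-SD units.
    d = (mean_t - mean_c) / pooled_sd
    # Common approximation to the correction factor applied to d.
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


# Made-up example: two groups of 20 whose means differ by half a pooled SD.
example_g = hedges_g(10.0, 2.0, 20, 9.0, 2.0, 20)
```

With these made-up inputs the uncorrected difference is 0.5 standard deviations and the correction shrinks it slightly, which is why g and d are interpreted against the same benchmarks.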
However, the literature does not support a simple linear coupling between all aspects of spatial and mathematical cognition (Fias & Bonato, 2018). There is evidence that spatial-mathematical relations are specific to certain spatial and mathematics tasks and that these relations may differ across development. Gilligan, Hodgkiss, Thomas, and Farran (2018) measured the relationship between four different spatial sub-domains and mathematics. They found that spatial scaling (or the ability to transform distance information from one representation to another representation of a different size; Frick & Newcombe, 2012) was the strongest spatial predictor of standardized mathematics performance in 6-10 year olds when compared to perspective taking, disembedding and mental rotation. Mental rotation had an age-dependent role for 6-8 year olds only. Similar age-dependent findings were reported by Mix et al. (2016, 2017), who found that mental rotation was a significant predictor of mathematics performance at 6 and 9 years but not at 11 years. Frick (2019) also reported that, in comparison to other spatial skills (diagrammatic representation, cross-sectioning, mental transformation and perspective taking), spatial scaling and mental rotation at 6.5 years explained at least 24% of the variation in mathematics performance at 8.5 years. This included both arithmetic items and items assessing numeric-logical and spatial functions (e.g. number sequences, counting magnitudes, counting cubes, estimating line lengths; Frick, 2019). Taken together, the selection of spatial subdomains for training studies should reflect the facts that (a) not all spatial skills are equally associated with all mathematics outcomes and (b) spatial-mathematical associations are developmentally sensitive.
Mental rotation and spatial scaling were targeted for training in this study. As outlined, these skills have previously been associated with mathematics achievement in children aged 6-9 years. Furthermore, underlying cognitive mechanisms have been proposed that may explain associations between these spatial skills and mathematics outcomes (e.g. Gilligan et al., 2018; Mix et al., 2016, 2017). These proposed underlying mechanisms influenced not only the selection of training targets, but also the selection of mathematics measures for inclusion in this study. Specifically, mental rotation is proposed to elicit active processing, including mental visualization and manipulation of objects (Lourenco, Cheung, & Aulet, 2018; Mix et al., 2016). Thus, mental rotation training may have benefits for mathematics tasks requiring the mental manipulation or organization of numbers, for example, complex mathematical word problems or multidigit calculations (Lourenco et al., 2018). Missing term problems were included in the task battery of this study as mathematics tasks of this type require mental manipulation of numbers. In contrast, spatial scaling is proposed to elicit intensive quantification skills (proportional reasoning). Thus, spatial scaling training may improve performance on mathematics tasks that require proportional reasoning, for example, number line estimation and geometry performance (Newcombe, Levine, & Mix, 2015; Rouder & Geary, 2014). For this reason, both Number Line Estimation and Geometry Tasks were included in the task battery of this study.
This study included participants aged approximately 8 years. As outlined above, there is evidence of significant spatial-mathematics relations at this age. Furthermore, as described in the next section, this age range overlapped with other spatial training studies that investigated transfer of gains to mathematics (Cheng & Mix, 2014; Hawes, Moss, Caswell, & Poliszczuk, 2015). Thus, the inclusion of participants aged 8 years allowed for meaningful comparisons between this study and previous studies. Additionally, children of this age were deemed old enough for independent computer-based training.

Research Highlights
• Both explicit instruction (instructional videos) and implicit instruction (task practice with feedback) elicited gains in spatial performance at 8 years.
• Training spatial skills led to near, intermediate and far transfer of gains, even after controlling for expectation and engagement effects.
• Mental visualization and proportional reasoning were proposed to explain far transfer from mental rotation and spatial scaling skills to mathematics, respectively.
• The transfer of spatial training gains from spatial to mathematics sub-domains provides evidence for a causal influence of spatial thinking on mathematics performance.

| Evidence of transfer of spatial training gains to mathematics
Spatial interventions that integrate spatial thinking into mathematical instruction report gains in both spatial (near and intermediate transfer) and mathematical outcomes (far transfer; Hawes, Moss, Caswell, Naqvi, & MacKinnon, 2017; Lowrie, Logan, & Ramful, 2017). However, these studies cannot offer insight into the underlying causal relationship between spatial and mathematical domains, as it is not possible to disentangle the impact of the spatial and mathematical aspects of training respectively. Few studies have investigated transfer of gains from spatial training (with no mathematical component) to mathematics. Cheng and Mix (2014) reported significant gains in mental rotation (near transfer) and mathematical calculation (far transfer) following 40 min of mental rotation training in 6-8 year olds, compared to a control group. Gains were specific to missing term arithmetic problems, for example, 4 + __ = 9. In a similar mental rotation training study of 6-8 year olds, Hawes et al. (2015) failed to replicate these findings with respect to far transfer. Improvements in mental rotation (near transfer) and mental transformation (intermediate transfer) were reported for the training group who completed 15 sessions of computerized mental rotation training, compared to controls. However, no improvements in mathematics skills including non-verbal arithmetic or missing term arithmetic problems were found for either group.
These differing results may be explained by several factors. First, Cheng and Mix (2014) delivered training in small groups (3-4 children) supervised by a researcher, while Hawes et al. (2015) administered classroom (group) training without direct supervision. Without the supervision of a researcher, reduced engagement with training may have contributed to the results of the Hawes et al. (2015) study. Second, post-testing was delivered immediately following training by Cheng and Mix (2014), while Hawes et al. (2015) delivered post-testing 1 week after training. Thus, caution must be taken in assuming that the gains reported by Cheng and Mix (2014) are durable. Third, the training method differed between the two studies. Implicit instruction was used by Hawes et al. (2015). Points were awarded for correct trials, but no instructions were given to explain correct (or incorrect) answers. In contrast, Cheng and Mix (2014) used explicit instruction, by giving participants physical manipulatives (mirroring those included in the onscreen trials) and instructing them to move the shapes to check their answers.
Differences in the training modes used in the above two studies reflect a broader distinction between explicit and implicit instruction types. In this study, implicit instruction is defined as instruction in which students are not aware of learning and use their experiences to construct an understanding. In contrast, for explicit instruction, the instructor plays a key role in explaining concepts to students and the student is aware of the skill or knowledge being taught. While there is mixed evidence regarding the effectiveness of explicit and implicit instruction in learning more generally (Kirschner, Sweller, & Clark, 2006), to our knowledge, no spatial training studies compare the efficacy of implicit and explicit instruction. Most studies of children have demonstrated the effectiveness of spatial training using implicit training, for example where participants complete task practice with feedback (Uttal et al., 2013). Instructional videos are one tool that can be used to deliver explicit instruction. There is evidence that viewing an instructional video of successful task completion can improve subsequent performance in number line estimation and spatial cross-sectioning in adults (Cohen & Hegarty, 2014; Gallagher-Mitchell, Simms, & Litchfield, 2018). The success of instructional videos may be attributable to observational learning (Castro-Alonso, Ayres, & Paas, 2014; Paas & Sweller, 2012). In particular, for spatial thinking, instructional videos may activate the mirror neuron system as individuals imagine movements (Rizzolatti & Sinigaglia, 2010; Tettamanti et al., 2005). From a practical perspective, instructional videos could offer a novel, practical method of introducing spatial thinking into the classroom. To maximize the consistency of explicit instruction in this study, instructional videos were used. However, explicit instruction delivered by an individual, for example, a teacher or other expert, may have differing results and is not explored in this study.
Another factor that is not often considered in training studies, but that is controlled for in the current study, is the role of motivational factors. First, expectation (placebo) effects occur when the expectation that training will be effective induces cognitive gains, independently from the training content (Green et al., 2019). The placebo effect is well documented in medical domains with some limited evidence that expectation effects play a role in cognitive psychology studies (Dweck, 2000;Foroughi, Monfort, Paczynski, McKnight, & Greenwood, 2016;Jaeggi, Buschkuehl, Shah, & Jonides, 2014). By controlling for expectation effects, the causal inferences made in this cognitive training study are enhanced (Boot, Simons, Stothart, & Stutts, 2013). The degree to which participants engage with training is also proposed to impact training outcomes. For example, differences in participant engagement may explain the contrasting findings reported by Cheng and Mix (2014) and Hawes et al. (2015). In adult studies, those who show higher levels of engagement with cognitive training exhibit larger gains (Jaeggi et al., 2014). By controlling for participant engagement, the rigour of this study is substantially stronger, as it was possible to determine the extent to which cognitive training gains are attributable to training, over and above differences in participant engagement.

| Current study
This study compared explicit and implicit instruction methods as means of training spatial skills in children aged 8 years and explored transfer of spatial training gains to other spatial and mathematics domains. Explicit instruction was delivered using instructional videos which were designed for use in this study. The choice of spatial scaling and mental rotation as spatial training targets was supported by both theoretical and behavioural evidence. The effectiveness of the intervention was assessed in the context of near, intermediate and far transfer of gains. A further original aspect of this study is that motivational factors including engagement with, and expectations of spatial training were controlled for.

| Participants
The sample size for this study was determined using G*Power. The power analysis was based on the largest analysis completed in this study (3 × 2 × 2 ANOVA). To achieve power of 0.8, with a medium effect size (f = 0.25), power analysis indicated that a minimum of 158 participants were required. As the study design included data collection at two time points, it was anticipated that there would be some participant drop-off between Time 1 and Time 2. Therefore, the sample size was increased to account for possible attrition of the sample. Participants were 250 children from six primary schools across London, UK. All participants were in Year 3 (M age = 8.09 years, SD = 0.41 years). The overall proportion of males (48%) and females (52%) was approximately equal. Participant demographics across training groups are shown in Table 1.

| Study design
As shown in Figure 1, this study used a randomized, controlled, pre-post training design. All participants completed an identical battery of tasks 1-week pre-training ± 1 day (Time 1), and immediately (within 5 min) post-training (Time 2). All tasks and training procedures were computer-based and were delivered using Gorilla software (www.gorilla.sc). Participants completed testing in their school IT suites in groups of 6-8 participants supervised by at least one (but typically two) researchers. All task instructions were incorporated into the Gorilla software and were presented to participants using earphones. Participants moved through the task battery at their own pace. Data collection was completed over a 7-month period (April-October).

| Training procedures
Training groups differed by training mode (explicit vs. implicit) and training type (mental rotation vs. spatial scaling vs. control). For both implicit and explicit instruction, training lasted between 3 and 4 min. For implicit instruction, the length of training was dependent on participants' performance (i.e. the speed taken to complete the items). For some participants in the implicit instruction group, training lasted up to 6 min.
This combination of two possible training modes and three possible training types led to six groups. Participants were randomly assigned to a group immediately preceding training (see Table 1). Allocation was completed using the balanced randomization function on the Gorilla software. The total number of predicted participants was entered into the software before data collection (N = 240). As this study has six training groups, a ratio of 40:40:40:40:40:40 participants in each group was assumed. Assignment using balanced randomization in Gorilla is like a weighted dice roll. This means that the first participant to complete the study had a 40/240 chance of being assigned to each group, and the chances for later participants depended on the assignments already made. By contrast, under unbalanced randomization, the probability of assignment to each group would be 1/6 for every participant and would not depend on the assignment of prior participants.
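The weighted dice roll analogy can be illustrated with a short sketch. It assumes that balanced randomization behaves like drawing without replacement from a pool of 40 places per group; this is an interpretation of the description above and not a reproduction of Gorilla's internal implementation, and the function names are hypothetical.

```python
import random
from fractions import Fraction


def assignment_probabilities(remaining):
    """Chance of landing in each group, given the places still open."""
    total = sum(remaining.values())
    return {group: Fraction(places, total) for group, places in remaining.items()}


def assign_balanced(remaining, rng):
    """Draw one group, weighted by open places, then use up one place."""
    groups = [group for group, places in remaining.items() if places > 0]
    weights = [remaining[group] for group in groups]
    chosen = rng.choices(groups, weights=weights, k=1)[0]
    remaining[chosen] -= 1
    return chosen


# Six groups with 40 predicted places each (N = 240).
quotas = {group: 40 for group in range(1, 7)}
first = assignment_probabilities(quotas)   # every group: 40/240

# Suppose, hypothetically, that participant 1 was assigned to Group 1.
quotas[1] -= 1
second = assignment_probabilities(quotas)  # Group 1: 39/239, others: 40/239
```

Under this assumption the odds shift slightly after every assignment, whereas an unbalanced scheme would keep each group at 1/6 throughout.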

| Explicit training
Three of the training groups viewed instructional videos that provided explicit task instructions. Two groups watched videos with spatial content, while the control group watched a video on word reading. The videos were designed using Vyond (www.vyond.com). All non-training content was uniform across videos, for example, the characters, storyline and narration. The videos can be accessed using the links provided below. Group 1 viewed the instructional mental rotation video. Participants in this group were given a description and viewed eight examples of mental rotation (see Figure 2 for a screenshot). For more details go to https://youtu.be/18iyRsvt-GAQ. Group 2 viewed the instructional scaling video, in which a description of spatial scaling, and eight examples of spatial scaling were shown (see Figure 3). For more details go to https://youtu.be/grhxFEqgz51. For Group 3, the control video was shown. Participants watched eight examples of word-picture matching, in which the onscreen characters selected the correct picture to match a given word (see Figure 4). Participants allocated to this control group did not view any spatial-related content. For more details go to https://youtu.be/qDmgRR2RLyE.

| Implicit training
The three implicit training groups completed task practice with computer-based feedback. For each trial, participants were shown an onscreen tick or cross indicating the accuracy of their response.
For incorrect trials, participants were given the opportunity to repeat the trial until they had selected the correct answer (all tasks had two possible response options). Participants were not given any explicit instruction on how to complete the trials. Participants moved to the next trial when the correct response was selected.
For implicit training, two groups completed spatial tasks (the same tasks presented at Time 1), while a control group completed a word reading task. The number of trials included in implicit training was determined as the approximate number of trials that could be completed in the same length of time as the explicit instruction. This was established through piloting. Group 4 completed implicit mental rotation training and were presented with 30 trials of the Mental Rotation Task with feedback (further details of this task are outlined below). Group 5 completed implicit scaling training comprising 24 trials of the Spatial Scaling Task (further details of this task can be found below). Feedback was given for each trial. Group 6 completed implicit control training. These participants completed 30 trials of a Word-Picture Matching Task in which they were asked to match a word to one of two pictures using labelled keys on the keyboard (see Figure 5). This was a reading task requiring minimal spatial skills. Feedback was provided.
FIGURE 2 Screenshot taken from the instructional video of mental rotation (explicit instruction)
FIGURE 3 Screenshot taken from the instructional video of spatial scaling (explicit instruction)
FIGURE 4 Screenshot taken from the control instructional video (explicit instruction)

| Task battery
The task battery included two spatial measures, assessing mental rotation and spatial scaling respectively. These measures were included as potential targets of near transfer (spatial tasks trained on) and of intermediate transfer (untrained spatial tasks). Three mathematics measures were included in the task battery as potential targets for far transfer (missing term problems, a Number Line Estimation Task and a Geometry Task). The order of task presentation was randomized across participants at both time points. To assess the role of motivational factors, two participant engagement measures were also administered.

| Mental Rotation Task
In each trial of the Mental Rotation Task participants were required to identify which of two animal images located above a horizontal line matched the target image below the line. As shown in Figure

| Spatial Scaling Task
The Spatial Scaling Task was modified from Möhring, Newcombe, and Frick (2016). In each trial participants were shown two 2D images of a circular space (a farmer's field) containing a target (an egg). Participants were asked to identify whether the eggs in the two fields were in the same position or in different positions (see Figure 7). For half of the trials, the targets were presented in the same position in both fields (match trials). For the remaining trials, the position of the target in one field was adjusted by 2 cm (to the left or right) relative to the second field (mismatch trials).
FIGURE 5 Sample trial from control training (implicit instruction)
FIGURE 6 Sample stimulus from the Mental Rotation Task (45° anti-clockwise trial)
FIGURE 7 Sample mismatch trial at a scaling factor of 0.875 from the Spatial Scaling Task (taken from Möhring et al., 2016)

| Missing term problems
The missing term problems included in this study were modified from Hawes et al. (2015). For each item participants were required to complete the missing number(s) in a simple mathematical equation (see Figure 8). This task included two practice items where the solutions were shown after participants submitted an answer.
Following this, 21 test items were displayed. No solutions were shown for these items. Test items included the original 18 items from Hawes et al. (2015) and three additional, low-difficulty items that were added to the task after piloting to alleviate floor effects.
Items were presented in order of increasing difficulty and a time limit of 25 s was allocated to each test item. Approximately equal numbers of addition versus subtraction items, and single versus multi-digit numbers were included. The position of the missing box was also balanced across items. Performance accuracy was recorded.

| Number Line Estimation Task
The Number Line Estimation Task was used to measure numerical representations. The method was adapted from Siegler and Opfer (2003). As shown in Figure 9, for each item participants were presented with a target number and were asked to estimate

| Geometry Task
The Geometry Task was designed for this study based on the statutory geometry learning requirements for Year 2 students in the UK (Department of Education, 2013). The task included two item types, Shape Items and Symmetry Items. For Geometry Shape Items, participants were shown an image of a shape and were asked to select the correct number of sides (or faces) on the shape from four possible response options (see Figure 10). Participants completed a single practice item using a 2-D shape on which they were given feedback. All participants successfully completed this item. Geometry Shape Items differed in the dimensionality of the images shown and included six 2-D shapes and six 3-D shapes. Performance was measured as accuracy across all items.
FIGURE 8 Sample missing term problem
FIGURE 9 Sample item from the Number Line Estimation Task
FIGURE 10 Sample 3-D shape item from the Geometry Task
For each Geometry Symmetry Item, a target shape was displayed on screen and participants were asked to select which of four possible response options was the mirror image of the target shape (see Figure 11). Participants completed a single practice trial in which they received feedback. Ten experimental Symmetry Items were presented in a randomized order. For each item, the distractor images included a match error, a shape error and a symmetry error (see Figure 11). For match errors, the distractor was identical in both shape and position to the target shape (a). For shape errors, the distractor was in the correct position, however the shape was not a mirror of the target image, but another similar shape (b). Finally, for symmetry errors the distractor was the correct shape however the position of the distractor was not an accurate mirror image (c). Performance accuracy was recorded.

| Expectations of the effectiveness of training
Prior to the delivery of training, all participants were asked a single question, measuring their expectations of the effectiveness of training, 'We are going to be playing some games. How much do you think the games will help you with your maths?'. The question was displayed alongside an onscreen scale (see Figure 12). Participants responded by selecting a point on the scale using the mouse cursor. Participants' responses were coded as 1-12 based on the onscreen position selected. A score of 1 was allocated for responses that indicated low expectations of training while a score of 12 was allocated for responses that indicated high expectations of training.

| Participant Engagement Questionnaire
FIGURE 11 Sample Geometry Symmetry Item showing a match error (a), a shape error (b), a symmetry error (c) and the correct answer (d)
A participant engagement questionnaire was delivered to assess participants' enjoyment of and engagement with the training that they had received. The questionnaire was designed for use in this study. As shown in Table 2, the questionnaire included four questions, the phrasing of which varied slightly based on the type of training delivered (e.g. 'How much effort did it take to watch the video?' vs. 'How much effort did it take to play the game?'). Each question was presented alongside an onscreen scale (for an example see Figure 13). Participants responded to each question by selecting a point on the scale using the mouse cursor. Participants' responses were coded as 1-12 based on the onscreen position selected. A score of 1 was allocated for responses that indicated low engagement while a score of 12 was allocated for responses that indicated high engagement. Participants were awarded an overall engagement score, an average of their scores across all four questions (where necessary items were reverse coded).
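The scoring rule for the engagement questionnaire can be sketched as follows. The text does not state which of the four questions were reverse coded, so the choice of reversed item in the example is a hypothetical assumption, as are the function names.

```python
SCALE_MIN, SCALE_MAX = 1, 12  # coded response range described in the text


def reverse_code(score):
    """Flip a response so that 1 becomes 12, 2 becomes 11, and so on."""
    return SCALE_MAX + SCALE_MIN - score


def engagement_score(responses, reversed_items=frozenset()):
    """Average the coded responses, reverse coding the flagged positions.

    `reversed_items` holds zero-based positions of reverse-coded questions;
    which questions these were is not given in the text.
    """
    adjusted = []
    for index, score in enumerate(responses):
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"response {score} is outside {SCALE_MIN}-{SCALE_MAX}")
        adjusted.append(reverse_code(score) if index in reversed_items else score)
    return sum(adjusted) / len(adjusted)


# Hypothetical participant; the third question is treated as reverse coded.
overall = engagement_score([10, 8, 3, 11], reversed_items={2})
```

In this made-up case the response of 3 on the reversed question counts as 10, so a low rating on a negatively phrased item raises the overall engagement score.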

| Exclusion criteria
Due to technical errors and school disruptions, data for a single task were lost for nine participants at Time 1 and 15 participants at Time 2. These participants were excluded from training analysis for the task on which they were missing data. Furthermore, participants scoring higher than 95% on a given task at Time 1 were deemed to have reached "ceiling level" performance on the task and were excluded from training analysis for that task only. For missing term problems and Number Line Estimation, responses were open ended. For missing term problems, participants who did not score higher than 10% at Time 1 were deemed not to understand the task aims and were excluded (n = 14). For Number Line Estimation, participants who did not attempt at least 75% of items, or participants with a mean PAE score higher than 15% for practice items, were also excluded (n = 0). Parametric analyses were used as all groups were large enough (N > 30) for the central limit theorem to apply (Field, 2013).
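The task-level exclusion rules above can be expressed as a small decision function. This is a sketch under stated assumptions: the task labels, argument names and returned reason strings are hypothetical, accuracy and proportion attempted are expressed on a 0-1 scale, and PAE is expressed in percent. How the rules were applied in the authors' own analysis pipeline is not described in the text.

```python
CEILING = 0.95               # Time 1 accuracy above this counts as ceiling
MISSING_TERM_FLOOR = 0.10    # must score higher than this on missing term problems
NLE_MIN_ATTEMPTED = 0.75     # minimum proportion of number line items attempted
NLE_MAX_PRACTICE_PAE = 15.0  # maximum mean PAE (%) on practice items


def exclusion_reason(task, time1_accuracy=None,
                     proportion_attempted=None, practice_pae=None):
    """Return why a participant is excluded for one task, or None if kept."""
    if time1_accuracy is not None and time1_accuracy > CEILING:
        return "ceiling"
    if (task == "missing_term" and time1_accuracy is not None
            and time1_accuracy <= MISSING_TERM_FLOOR):
        return "task not understood"
    if task == "number_line":
        if (proportion_attempted is not None
                and proportion_attempted < NLE_MIN_ATTEMPTED):
            return "too few items attempted"
        if practice_pae is not None and practice_pae > NLE_MAX_PRACTICE_PAE:
            return "practice PAE too high"
    return None
```

Because exclusion is decided per task, the same participant can be dropped from one analysis (for example, for ceiling performance on mental rotation) and retained in all others.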

| Overall performance at Time 1
No ceiling or floor effects were present for any measures (Table 3).
Descriptive information for performance on each of the tasks, across groups is shown in

| Differences in task performance across training groups at Time 1
To confirm that there were no performance differences between groups at Time 1, a two-way ANOVA was completed for each task.
Training mode (2 levels: explicit vs. implicit) and training type (3 levels: mental rotation vs. spatial scaling vs. reading) were included as between participant variables. Comparing across training types and training modes, no significant differences in performance were reported for any of the mathematics or spatial tasks (p > .05, ηp² < 0.010; see Table 6). Similarly, there were no differences in expectations of training across training modes, F(1, 244) = 3.25, p = .072, ηp² = 0.013, or training types, F(2, 244) = 0.27, p = .763, ηp² = 0.002.
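The partial eta squared values reported alongside each F statistic can be recovered from the F value and its degrees of freedom using the standard identity, partial eta squared = (F × df effect) / (F × df effect + df error). The sketch below applies that identity to the two expectation tests reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Expectations of training across training modes: F(1, 244) = 3.25
mode_effect = round(partial_eta_squared(3.25, 1, 244), 3)

# Expectations of training across training types: F(2, 244) = 0.27
type_effect = round(partial_eta_squared(0.27, 2, 244), 3)
```

Both rounded values agree with the effect sizes reported in the text (0.013 and 0.002), which offers a quick consistency check when reading the remaining results.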

| Associations between measures at Time 1
Pearson correlations were completed between measures at Time 1. This allowed for the investigation of whether the observed associations between spatial and mathematics skills that have been demonstrated in previous studies (e.g. Gilligan et al., 2018; Mix et al., 2016) and form the rationale for the training paradigm used in this study, were present. As shown in Table 5, significant correlations were reported between all tasks, except for performance on Geometry Shape Items which was not correlated with mental rotation accuracy, r(248) = 0.09, p = .147. Expectations of the effectiveness of training were not correlated with performance on any behavioural measures.
TABLE 4 Gender differences in task performance at Time 1

| Performance at Time 2
Mixed ANOVAs were used to investigate training effects across near, intermediate, and far transfer measures (see Table 6 for a summary of performance scores across Time 1 and Time 2). Time was included as a within participant variable (Time 1 and Time 2). Training mode (explicit vs. implicit) and training type (mental rotation vs. spatial scaling vs. control) were included as between participant variables. Where sphericity could not be assumed, Greenhouse-Geisser values were reported. It is noteworthy that ANCOVAs with Time 2 scores as the dependent variable and Time 1 scores as a covariate were run in parallel to these analyses. Comparable results were reported for all outcomes. Further details, including comparisons between training types at Time 2, can be found in the Supporting Information.

Spatial scaling
A significant main effect of training type was found, with higher performance for spatial scaling training compared to the other training types, F(2, 232) = 8.28, p < .001, ηp² = 0.067. There was also a significant interaction reported between time and training type, F(2, 232) = 6.25, p = .002, ηp² = 0.051 (see Figure 15). Paired sample t tests indicated significant performance gains following spatial scal-

Missing term problems
A significant interaction between time and training type was found, F(2, 209) = 4.58, p = .011, ηp² = 0.042 (see Figure 16). Paired sample t tests indicated a significant improvement in accuracy

Number Line Estimation
As a significant gender effect was reported for PAE scores on this task at Time 1, gender was included as a between participant variable. However, no significant main effect or interactions with gender were reported for this task (ps > .05, ηp²s < 0.014). Hence, gender was removed, and the analysis was repeated. A significant main effect of time was reported, F(1, 237) = 5.86, p = .016, ηp² = 0.024. There was also a significant interaction between time and training type.  by expectations of training. A separate ANCOVA was completed for each training type group (mental rotation, spatial scaling and control) and each training mode group (explicit and implicit). Time was included as a between participant variable and expectation score was included as a covariate. There were no significant interactions between participant expectations of training and time for any of the training types (ps > .05, ηp²s < 0.033) or any of the training modes (ps > .05, ηp²s < 0.012).

Participant engagement with training
An ANOVA was completed with training type and training mode as between participant variables and self-reported engagement levels as the dependent variable. There was a significant difference in engagement across training types, F(2, 244) = 3.37, p = .036, ηp² = 0.027. Bonferroni pairwise comparisons indicated significantly higher engagement levels following control training compared to spatial scaling training (p = .034). There was no main effect of training mode on engagement, F(1, 244) = 1.81, p = .180, ηp² = 0.007. However, there was a significant interaction between training type and training mode on engagement, F(2, 244) = 3.30, p = .039, ηp² = 0.026. For explicit training there were no differences in engagement across training types, F(2, 123) = 0.56, p = .573, ηp² = 0.009. For implicit training there was an effect of training type, F(2, 121) = 5.42, p = .006, ηp² = 0.082. As highlighted in Figure 20, post-hoc Bonferroni tests indicated significantly higher engagement following control training compared to spatial scaling training (p = .004).
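The partial eta squared values that accompany each F test can be checked against the reported F statistics and degrees of freedom. The sketch below uses the standard identity (partial eta squared equals F times the effect degrees of freedom, divided by that product plus the error degrees of freedom). The function name and the dictionary labels are illustrative and do not come from the study's analysis scripts.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error, so the ratio below is
    # SS_effect / (SS_effect + SS_error) after multiplying through by MS_error.
    scaled_effect = f_value * df_effect
    return scaled_effect / (scaled_effect + df_error)


# F statistics and degrees of freedom as reported for the engagement analysis.
# The labels are hypothetical names chosen for this illustration.
reported = {
    "training type": (3.37, 2, 244),
    "training mode": (1.81, 1, 244),
    "type x mode interaction": (3.30, 2, 244),
    "training type (implicit only)": (5.42, 2, 121),
}

for label, (f_value, df_effect, df_error) in reported.items():
    value = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: {value:.3f}")
```

Run as written, the four values round to 0.027, 0.007, 0.026 and 0.082, matching the effect sizes reported for the engagement analysis.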

DISCUSSION
The results reported support and extend previous correlational findings on spatial-mathematical relations and provide insight into the causal relationships between different aspects of spatial and mathematical thinking. It was demonstrated that training mental rotation and, for the first time, training spatial scaling, led to gains in spatial and mathematical thinking at 8 years. These gains were present following explicit and implicit instruction. Spatial training gains had near, intermediate and far transfer effects. Spatial thinking is therefore one cognitive domain in which transfer of cognitive training gains is possible. The gains reported reflect the importance of choosing developmentally sensitive, theoretically motivated training targets.
Near transfer: Mental rotation and spatial scaling training led to significant gains in mental rotation and spatial scaling respectively. These findings are consistent with previous evidence that spatial skills are malleable in children (Uttal et al., 2013). Previous studies typically investigated the malleability of mental rotation or other spatial tasks that elicit mental visualization (Uttal et al., 2013), while this is the first study to highlight the malleability of spatial scaling in children at 8 years.
Intermediate transfer: Significant gains in mental rotation were reported following spatial scaling training, providing evidence of intermediate transfer of spatial scaling training to an untrained spatial task. These findings are consistent with those of Uttal et al. (2013), who found that spatial training transferred to other untrained spatial tasks. However, Uttal et al. (2013) reported that intermediate trans-
Cheng and Mix (2014) who demonstrated that explicit mental rotation training led to gains in performance accuracy on a similar missing box task. Cheng and Mix (2014) proposed that these findings are due to the fact that children solve arithmetic problems of this type by mentally rotating the terms, thus restructuring the equation in a more prototypical format. For example, 4 + __ = 9 can be mentally rotated to generate the equation __ = 9 − 4. However, this mental manipulation would require a relatively advanced understanding of calculation rules, that is, a plus becomes a minus when it is moved across the equals sign. Alternatively, children may use spatial visualizations to represent these equations pictorially. This equation could be solved by visualizing 4 blocks in one group and 9 blocks in another, and counting the difference between the groups (Lourenco et al., 2018). It is noteworthy that this study found no significant difference between explicit and implicit instruction on this task, in contrast to Hawes et al. (2015), who did not find gains on missing term problems following implicit mental rotation training. This highlights other factors, such as participant engagement during training, as possible explanations for the results reported by Hawes et al. (2015).
Another explanation for the differences reported between studies is that in this study and in Cheng and Mix (2014), a part-whole type mental rotation training was used (participants had to rotate an object and combine it with another object or picture to create a whole) which may have acted as an analog for children when solving missing term problems.
FIGURE 20 Self-reported levels of engagement following training, across training modes and training types (*p < .05, **p < .01, ***p < .001)
For the Number Line Estimation Task, a significant reduction in error was reported for children who completed spatial scaling training. This far transfer of gains from spatial scaling to number line estimation may be explained by the fact that both tasks require proportional reasoning. If a child was asked to place the number 27 on a number line ranging from 0 to 100, they might reason that 27 is close to 25, which is one quarter of 100. By accurately dividing the number line into quarters, a child could place the number 27 with relatively high accuracy (Newcombe et al., 2015; Rouder & Geary, 2014). Proportional reasoning is also required when comparing two spaces of different sizes. Alternatively, the Mental Number Line may be responsible for associations between spatial scaling and number line estimation. This concept outlines that numbers are represented spatially in the brain with smaller numbers on the left and larger numbers on the right (Barsalou, 2008; Lakoff & Núñez, 2000). Children may scale between a mental number line and the number line presented in Number Line Estimation Tasks (see Dehaene, 1997; Fischer, 2003). Whilst spatial scaling has been associated with number line estimation in a number of studies (e.g. Gilligan et al., 2018; Mix et al., 2016), this is the first to show that spatial scaling training leads to improvements in number line estimation. To note, an unexpected increase in error was reported following control training. This may be attributable to fatigue or boredom with the task at Time 2. Further investigation is needed to understand this effect.
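The quarter-based strategy described above can be made concrete with a small sketch. It assumes that PAE denotes percent absolute error, the usual accuracy measure in number line estimation research (absolute distance between estimate and target, divided by the length of the line, times 100). The function names and the idea of snapping an estimate to the nearest quarter landmark are illustrative assumptions, not a description of how children in the study actually responded.

```python
def percent_absolute_error(estimate: float, target: float,
                           line_min: float = 0.0, line_max: float = 100.0) -> float:
    """Percent absolute error: |estimate - target| relative to the line length."""
    return 100 * abs(estimate - target) / (line_max - line_min)


def nearest_quarter_landmark(target: float,
                             line_min: float = 0.0, line_max: float = 100.0) -> float:
    """Return the quarter landmark (0, 1/4, 1/2, 3/4 or 1 of the line) closest to the target."""
    span = line_max - line_min
    landmarks = [line_min + span * k / 4 for k in range(5)]
    return min(landmarks, key=lambda mark: abs(mark - target))


# Hypothetical child who places 27 exactly at the nearest quarter mark of a 0-100 line.
target = 27
estimate = nearest_quarter_landmark(target)       # quarter mark at 25
error = percent_absolute_error(estimate, target)  # 2 percent of the line length
print(estimate, error)
```

Under these assumptions, a child who places 27 at the one-quarter mark is off by only 2% of the line, which illustrates why accurate proportional division of the line would reduce error.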
Performance on the Geometry Task differed across item types.

Motivational factors
This study is the first to explore the efficacy of spatial training while

Implications, future directions and limitations
This study provides some of the first evidence that the association between spatial and mathematical performance reflects a causal influence of spatial ability on mathematics performance. This causal relationship between spatial skills and mathematics can be inferred because a manipulation in one variable (spatial skill) led to changes in the other variable (mathematics skill; Pearl, 2000).
The findings determine that the observed correlations between spatial and mathematical thinking cannot solely be explained by a common cause acting on both variables, for example, genetic influence, IQ, language skills or other cognitive skills such as WM.
As shown in Figure 21, without a direct causal link between spatial and mathematical thinking, intervening on spatial skills would not lead to changes in mathematical outcomes. Thus, while a common cause such as a general cognitive factor or neural features may also exist between spatial and mathematical thinking (Oberauer, 2016), this study identified a specific, direct causal effect of spatial skills on mathematics performance. Furthermore, these findings do not preclude a causal role of mathematical thinking on spatial skills, that is, a bidirectional relationship (feedback loop) may exist between spatial and mathematical thinking. From a practical perspective, finding novel methods of improving mathematical thinking in children is an educational priority (National Audit Office UK, 2018) and this study aimed to determine the causal effect of spatial skills on mathematics. However, to establish whether a bidirectional relationship exists between spatial and mathematics skills, future research is needed investigating the effects of training mathematics skills on spatial performance. In summary, the identification of a causal effect of spatial thinking on mathematics in this study strengthens arguments for spatializing mathematics teaching as a means of improving mathematics outcomes (Bruce & Hawes, 2015). The instructional videos presented here offer one way of introducing spatial thinking into the classroom. However, further research is needed to explore the optimum dosage of this training and the durability of these training gains.
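The logic of the intervention argument can be illustrated with a toy structural model. This is a sketch only: the linear equations, the coefficients (0.5, 0.3) and all function names are hypothetical and are not estimated from the study's data. The point is that when spatial and mathematical skills share only a common cause, setting spatial skill by training leaves mathematics unchanged, whereas a direct path produces a change.

```python
def math_score(spatial: float, general_factor: float, direct_effect: float) -> float:
    """Toy structural equation: mathematics depends on a shared factor and,
    if direct_effect is non-zero, on spatial skill as well."""
    return 0.5 * general_factor + direct_effect * spatial


def effect_of_training(direct_effect: float,
                       general_factor: float = 1.0,
                       boost: float = 1.0) -> float:
    """Change in mathematics when training raises spatial skill by `boost`.

    The intervention acts on spatial skill only; the shared factor is held fixed,
    which is what distinguishes an intervention from a naturally occurring difference.
    """
    natural_spatial = 0.5 * general_factor  # spatial skill also depends on the shared factor
    before = math_score(natural_spatial, general_factor, direct_effect)
    after = math_score(natural_spatial + boost, general_factor, direct_effect)
    return after - before


# Hypothetical scenario 1: common cause only (no arrow from spatial to mathematics).
common_cause_only = effect_of_training(direct_effect=0.0)

# Hypothetical scenario 2: common cause plus a direct spatial-to-mathematics path.
with_direct_path = effect_of_training(direct_effect=0.3)

print(common_cause_only, with_direct_path)
```

In the first scenario training produces no change in mathematics, while in the second the change equals the direct path coefficient times the size of the boost. Observing a training effect on mathematics is therefore incompatible with a pure common-cause account in this simplified model.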
While most previous spatial training studies are based on mental rotation (or similar spatial tasks; Uttal et al., 2013), this study demonstrates an important role for other spatial sub-domains, particularly spatial scaling. This study highlights the importance of carefully choosing spatial training targets and suggests that training studies should be closely aligned with findings from cross-sectional and correlational analyses. Mental rotation and spatial scaling were selected as training targets in this study, as these tasks specifically relate to mathematics outcomes at 8 years (Gilligan et al., 2017; Mix et al., 2016, 2017). Future studies should explore whether spatial training using age appropriate targets might confer benefits to spatial and mathematics performance in older children, for example by training perspective taking abilities or visuo-spatial thinking which have been associated with mathematics outcomes at 10 years and 11 years (Mix et al., 2016, 2017) respectively. Furthermore, there is cross-sectional evidence that the role of spatial thinking extends beyond mathematics, to other Science, Technology, Engineering and Mathematics (STEM) domains (e.g. Hodgkiss, Gilligan, Tolmie, Thomas, & Farran, 2018; Wai, Lubinski, & Benbow, 2009). Future studies could explore transfer of spatial training gains to other STEM domains.
The results of this study should be interpreted in light of its limitations. First, there was a short interval (0-5 min) between training and post-testing. Therefore, the training completed in this study may have led to priming of certain strategies for task completion, and not conceptual change. Other studies have shown that short-term priming is possible and effective in children. For example, 5 min of spatial priming increases creativity in children aged 6-9 years (Liberman, Polack, Hameiri, & Blumenfeld, 2012), while priming spatial language terms (5 min) improves performance on a spatial relations task at 4 years (Loewenstein & Gentner, 2005). However, even if the findings reported reflect a priming effect, the results of this study have significant practical applications for teachers, given that priming enhanced mathematics performance. Alternatively, transfer of gains from spatial training to mathematical skills may reflect both priming and conceptual change. These two processes are necessarily inter-linked, as it is not possible to prime a process that you have not yet developed. Taken together, although priming cannot be ruled out, similarly to Cheng and Mix (2014), here we demonstrate shared cognitive processing in the completion of spatial and mathematics tasks that is subject to modification through training. A second limitation of this study was that the duration of the spatial training delivered was relatively short, and there was no investigation of dosage effects. Furthermore, although far transfer of gains between spatial training and mathematical outcomes was reported, the size of these gains was relatively small. Future research is needed to investigate whether the amount of training delivered influences the size and durability of training gains. However, the findings here demonstrate that even short bouts of spatial training lead to transfer of training gains to mathematics.
Importantly, the findings of other studies suggest that spatial training gains are durable. Uttal et al. (2013) compared spatial training studies with post-testing immediately following training to studies that waited days, weeks or even months until post-testing. Uttal et al. (2013) found that spatial training gains were durable and that the timing of post-testing did not significantly influence the size of training gains reported following spatial training.

CONCLUSIONS
The use of developmentally sensitive, theoretically motivated spatial training targets led to near, intermediate and far transfer of gains to both spatial and mathematical domains at 8 years. Not only do these findings highlight the malleability of spatial skills, they also call attention to spatial ability as one domain in which cognitive training can lead to transfer effects. Explicit and implicit instruction led to similar gains in spatial and mathematical domains (except for geometry items). This emphasizes the potential of explicit instruction as a practical means of eliciting far transfer of spatial training gains in the primary school classroom. It is also advised that the choice of cognitive training should be constrained by an understanding of the underlying cognitive mechanisms of training targets. In this study mental visualization was proposed as an underlying cognitive mechanism for mental rotation training, and proportional reasoning was proposed as an underlying cognitive mechanism for spatial scaling training. The gains reported highlight the importance of choosing task and age sensitive targets for spatial training. In turn, evidence from this training study lays bare the causal contribution of cognitive processes to mathematical cognition that was previously only inferred based on correlational evidence.
FIGURE 21 Causal relationship between spatial and mathematical thinking. Note: Established and speculative causal relations are shown in orange and grey respectively

Funding for this research was provided by the Bloomsbury Colleges Ph.D. Scholarship Programme. This work was also supported by a Gorilla Award in Behavioural Science and the National Centre for Curriculum and Assessment (NCCA) Ireland. The authors thank Pari Patel for assistance coding number line data.

CONFLICT OF INTEREST
The authors have no conflicts of interest to declare.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author Katie Gilligan (k.gilligan@surrey.ac.uk) upon reasonable request.