Dynamic testing of gifted and average‐ability children's analogy problem solving: Does executive functioning play a role?

In this study, dynamic testing principles were applied to examine progression of analogy problem solving, the roles that cognitive flexibility and metacognition play in children's progression as well as training benefits, and instructional needs of 7- to 8-year-old gifted and average-ability children. Utilizing a pretest training posttest control group design, participants were split in four subgroups: gifted dynamic testing (n = 22), gifted unguided practice (n = 23), average-ability dynamic testing (n = 31), and average-ability unguided practice (n = 37). Results revealed that dynamic testing led to more advanced progression than unguided practice, and that gifted and average-ability children showed equivalent progression lines and instructional needs. For children in both ability categories, cognitive flexibility was not found to be related to progression in analogy problem solving or training benefits. In addition, metacognition was revealed to be associated with training benefits. Implications for educational practice were provided in the discussion.

KEYWORDS
dynamic testing, executive functioning, giftedness

INTRODUCTION
It has been proposed that cognitive abilities play an important role in children's school performance. Both intelligence (Roth et al., 2015) and executive functions (e.g., Monette, Bigras, & Guay, 2011; Viterbori, Usai, Traverso, & De Franchis, 2015) have been shown to predict school success. When a child is considered to be gifted in an educational context, this is often based on the results of an assessment procedure, including conventional, static testing of intelligence or school aptitude. These tests, however, have been shown not to be advantageous for all children, and do not unveil information about psychological processes involved in learning (e.g., Grigorenko, 2009). As conventional tests rely, for a large part, on past learning experiences (Elliott, Grigorenko, & Resing, 2010), children who have had less than favorable learning experiences have been documented to underperform on these tests (Robinson-Zañartu & Carlson, 2013). Dynamic tests, in contrast, are much more focused on a child's potential for learning (Sternberg & Grigorenko, 2002). Because feedback and/or instruction are integrated into the testing procedure in these tests (Elliott, 2003), they allow for examining to what extent children show improvement in performance after an intervention, and whether other cognitive factors, such as executive functions, play a role in learning. In the current study, dynamic testing principles were applied to investigate to what extent two aspects of executive functioning, cognitive flexibility and metacognition, would be related to static or dynamic progression in analogy problem solving of gifted and average-ability children.

Dynamic testing
Rather than measuring the knowledge or skills a child has already mastered, dynamic testing focuses on what a child would achieve in a short time frame, and this assessment procedure is therefore expected to provide a more complete picture of a child's potential for learning (Elliott, 2003). The pretest training posttest design (Sternberg & Grigorenko, 2002) is a frequently used application of dynamic testing that allows for structured measuring of a child's learning progression. The graduated prompts technique (e.g., Campione & Brown, 1987) has been used successfully as a training intervention in combination with said design. In this training approach, children are provided with structured prompts each time they make a mistake in problem solving. In the current study, prompts were tailored to each individual problem to be solved, and became more specific gradually, ranging from metacognitive to cognitive prompts and modeling (Resing & Elliott, 2011).
Similar to static test scores, dynamic testing outcomes have shown that there are many individual differences between children; both in terms of the instruction they require to show learning progression, as well as in terms of the level of progression they show after training (e.g., Resing, 2013). Dynamic testing of children who have strong cognitive capacities, nevertheless, seems an area researched less intensively. In earlier studies, dynamic tests for this group of learners have predominantly been used as a means to identify giftedness in disadvantaged populations (e.g., Kirschenbaum, 1998), such as those who are economically disadvantaged (e.g., Borland & Wright, 1994). Previous research further indicates that gifted children not only have a cognitive advantage, but, more specifically, learn new skills faster, and are better at generalizing newly acquired knowledge (Calero, García-Martín, & Robles, 2011). The potential role of executive functioning in dynamic testing of this group of children has, however, not yet been examined abundantly.
Inductive reasoning is believed to play a central role in intelligence (Klauer & Phye, 2008), and is said to be of crucial importance with regard to acquiring and applying knowledge (Goswami, 2012) and solving problems (Richland & Burchinal, 2012).

Executive functioning
The graduated prompts technique employed in the current study included prompts activating different aspects of executive functioning, for example, in relation to self-regulation and monitoring of the problem-solving process. Executive functions comprise a number of complex cognitive processes enabling conscious control of thought and action (Monette et al., 2011) that are critical to purposeful, goal-directed behavior (Arffa, 2007). They are seen as the cognitive component of self-regulation (Calkins & Marcovitch, 2010). Research suggests that executive functions include inhibition, working memory and cognitive flexibility, which are key components of higher-order executive functions, such as metacognition (Miyake et al., 2000). The latter is usually divided into two dimensions: knowledge and regulation of cognitive activity (Schneider, 2010). To apply metacognition, assumed to play a role in developing new expertise (e.g., Sternberg, 1998), cognitive flexibility, working memory, and sufficient inhibition are prerequisites (Roebers, Cimeli, Röthlisberger, & Neuenschwander, 2012).
In addition, it has been argued that flexibility in applying newly learned skills and knowledge can be seen as an important aspect of cognitive functioning (e.g., Resing, 2013). Cognitive flexibility is said to include the ability to change perspectives spatially, or interpersonally, and being sufficiently flexible to adjust thinking to changing demands. Further, it is seen as a key component of the ability to think outside the box, and shares many characteristics with creativity, task, and set switching (Diamond, 2013).
Executive functioning has been found to be related to cognition (e.g., Ardila, Pineda, & Rosselli, 2000). Studies investigating the role of executive functioning in a dynamic testing context, in particular with gifted children, however, are few, with most studies focusing on the role of working memory (e.g., Resing, Xenidou-Dervou, Steijn, & Elliott, 2012; Swanson, 2011).

The current study
The current study utilized a dynamic test for analogical problem solving, a subtype of inductive reasoning, employing graduated prompts techniques. Our main research aim was to provide more insight into the potential benefits of dynamic testing of gifted children. More specifically, we focused on the roles that ability, cognitive flexibility, and metacognition play in repeatedly measured static versus dynamic progression in solving analogies.
Our first cluster of research questions addressed children's progression in solving analogies from pretest to posttest. Based on previous research into progression of unprompted solving of analogy problems among young children (e.g., Tunteler, Pronk, & Resing, 2008), we expected a significant main effect of time. We hypothesized (1a) that both unguided practice and dynamic testing would lead to progression in solving analogies from session to session. More importantly, we expected a significant interaction of time × condition, hypothesizing (1b) that children in the dynamic testing condition would show more progression from pretest, before training, to posttest, after training (e.g., Resing & Elliott, 2011; Stevenson et al., 2013). We further expected a significant interaction between time and ability. Gifted children were reported to have a more extensive zone of proximal development (e.g., Calero et al., 2011); therefore, we hypothesized (1c) that gifted children would show more progression after unguided practice experiences than their average-ability peers. We also expected a significant interaction of time × condition × ability, indicating that gifted children would show more progression after training than their average-ability peers (1d).
Our second cluster of research questions concerned the association between executive functioning and children's progression from pretest to posttest. We expected a significant interaction between time and cognitive flexibility. Considering that flexibility in applying skills and knowledge is suggested to be important for learning and applying new knowledge (e.g., Resing, 2013), we hypothesized (2a) that children with higher levels of cognitive flexibility would show more progression in solving analogies than their peers with lower levels of cognitive flexibility. We also expected an interaction between time, condition, and cognitive flexibility, (2b) hypothesizing that children with higher levels of cognitive flexibility would benefit more from dynamic training than those with lower levels. Furthermore, a significant interaction between time, condition, ability, and cognitive flexibility was expected. Building on empirical studies in which high-ability children were found to have an advantage in executive functioning (e.g., Arffa, 2007), we hypothesized (2c) that the progression paths of gifted children with higher levels of cognitive flexibility would be steeper than those of their average-ability peers with similar levels of cognitive flexibility.
Moreover, as self-regulating, metacognitive skills were found to play a significant role in learning (e.g., Campione, Brown, & Ferrara, 1982; Sternberg, 1998), we expected an interaction between time and metacognition, hypothesizing (3a) that children with higher levels of metacognition would show more progression in solving analogies than their peers with lower levels of metacognition. We also expected a significant interaction between time, metacognition, and condition, and hypothesized (3b) that children with higher levels of metacognition would benefit more from training than their age mates with lower levels of metacognition. Finally, a significant interaction was expected between time, condition, ability, and metacognition. Taking into account that high-ability children were found to have an advantage in self-regulation (e.g., Calero, García-Martín, Jiménez, Kazén, & Araque, 2007), we hypothesized (3c) that the progression paths after training of the gifted children who have higher levels of metacognition would be steeper than those of their average-ability peers with similar levels of metacognition.

TABLE 1 Overview of the hypotheses (SA = solving analogies)
Hypothesis
1a   Unguided practice and dynamic testing will lead to progression in SA over time
1b   Dynamic testing will lead to more progression from pre- to posttest
1c   Gifted children will show more progression after unguided practice
1d   Gifted children will show more progression after training
2a   Higher levels of cognitive flexibility will lead to more progression in SA
2b   Higher levels of cognitive flexibility will lead to more progression after dynamic training
2c   Progression paths of gifted children with higher levels of cognitive flexibility will be steeper
3a   Higher levels of metacognition will lead to more progression in SA
3b   Children with higher levels of metacognition will benefit more from training
3c   Progression paths after training of the gifted children with higher levels of metacognition will be steeper
4a   Gifted children will need fewer metacognitive prompts
4b   Gifted children will need fewer cognitive prompts

Our last research question focused on individual differences in instructional needs, as measured by the number and the type of prompts required during training. As high-ability children were found to be more responsive to feedback (Kanevsky & Geake, 2004), and to have an advantage in self-regulation (e.g., Calero et al., 2007), we expected that gifted children's instructional needs during dynamic training would be significantly different from those of their average-ability peers.
We hypothesized that gifted children would need (4a) fewer metacognitive and (4b) fewer cognitive prompts than their average-ability peers. Table 1 provides an overview of the hypotheses.

Participants
In this study, 113 children, 54 boys and 59 girls, participated, ranging in age from 7.1 to 8.9 years (M = 7.90). The average-ability children (n = 68) attended mainstream elementary schools, and those who were identified as gifted were enrolled in special settings for gifted and talented children in the Netherlands. Gifted children (n = 45) were oversampled and preliminary identification of giftedness took place on the basis of their enrolment in gifted education and qualitative judgments of parents and teachers regarding their giftedness. Schools participated on a voluntary basis, and written permission to participate was obtained from the children's parents and schools prior to participation. Six children dropped out, as they did not participate in each test session.

Design
The study utilized a 2 × 2 pretest-posttest control group design with randomized blocks with Ability category (gifted vs. average ability) and Condition (dynamic testing vs. unguided practice) as variables (see Table 2). Blocking was based on the scores on the Raven Standard Progressive Matrices test (Raven, 1981), administered before the pretest. All the children who had been identified as gifted had obtained Raven scores of at least the 90th percentile. Children in the dynamic testing subgroups received training between pretest 2 and posttest, whereas children in the unguided practice subgroups received an unrelated dot-to-dot control task of equal length between pretest 2 and posttest.

Raven
Participants were administered the Raven Standard Progressive Matrices Test (Raven, 1981), which consists of five sets of twelve items each, for a total of 60 items. In this study, only the raw scores were used in the analyses.

Berg Card Sorting Test-64 (BCST-64)
The BCST-64 (Piper et al., 2011), the shortened version of the BCST, containing 64 trials, was used to measure cognitive flexibility. The BCST is an open-source computerized version of the Wisconsin Card Sorting Test (WCST; Grant & Berg, 1948). The unstandardized number of perseverative errors made during the administration of the BCST-64 was used as a measure of the participants' cognitive flexibility. A higher number of perseverative errors corresponds with lower cognitive flexibility.

BRIEF
The teacher questionnaire of the

Dynamic version of geometric analogies

Pretests and posttest
The dynamic test used in this study was composed of geometric visuospatial analogies of the type A:B::C:D. The pretests and posttest, parallel sessions with different but equivalent analogy items, were composed of 20 trials.
The test sessions were equivalent in terms of the numbers of different elements, and transformations used for each analogy item, as well as the order in which the items were presented in relation to their difficulty level. The children received minimal instructions only in the two pretests and the posttest, as they were told that they had to solve puzzles with different shapes. The test leader then asked the child which shapes had to be drawn in the fourth box to solve the puzzle.

Training
The current study employed one training session, consisting of 10 geometric analogies that were not used in either the pretests or the posttest. The training session was based on graduated prompts techniques (Campione & Brown, 1987; Resing & Elliott, 2011), and consisted of five steps per item. The prompts were administered following a standardized protocol, and were provided hierarchically, from two very general metacognitive prompts to two concrete cognitive prompts tailored to each specific item (see Appendix Table A1). Prompts were given if a child could not solve the analogy independently. After each prompt, children were asked to draw the solution of the analogy, and check their answer. If, after the fourth prompt, a child had not solved the analogy correctly, the test leader modeled the correct answer for the child. After the four prompts had been provided, and/or the test leader had shown the correct answer, the children were asked to explain why they thought their answer was correct. Then, the test leader provided a correct self-explanation.
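The flow of one training item can be illustrated with a minimal Python sketch. It is not the authors' protocol: the prompt labels, the `ItemRecord` type, and the `attempt` callback are hypothetical names introduced here, and the sketch assumes that the child gets one independent attempt before any prompt and one attempt after each of the four prompts.

```python
from dataclasses import dataclass
from typing import Callable

# Prompt hierarchy described in the text: two general metacognitive prompts,
# then two item-specific cognitive prompts. The labels are placeholders.
PROMPTS = ("metacognitive_1", "metacognitive_2", "cognitive_1", "cognitive_2")


@dataclass
class ItemRecord:
    metacognitive_prompts: int  # metacognitive prompts given for this item
    cognitive_prompts: int      # cognitive prompts given for this item
    modeled: bool               # True if the test leader modeled the answer


def train_item(attempt: Callable[[int], bool]) -> ItemRecord:
    """Run one training item.

    `attempt(n)` is a hypothetical callback that returns True when the child
    solves the analogy after having received n prompts (n = 0 is the
    independent attempt).
    """
    given = 0
    while not attempt(given):
        if given == len(PROMPTS):
            # Fifth step: all four prompts were used, so the answer is modeled.
            return ItemRecord(2, 2, True)
        given += 1
    meta = min(given, 2)
    return ItemRecord(meta, given - meta, False)
```

Summing `metacognitive_prompts` and `cognitive_prompts` over the 10 training items would give per-child prompt counts of the kind analyzed as instructional needs.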

General procedure
The children were tested once a week over a period of five consecutive weeks. All tests and questionnaires that were part of the present study were administered following standard, protocolled instruction. At the beginning of the pretests, training session, and posttest, the children were provided with the six geometrical shapes used in the analogies and, in cooperation with the test leader, named each shape. The test leader then asked the child to draw the shapes below the printed shapes, staying as close to the original as possible.

Scoring
Analogy items were scored on the basis of children's drawings, in combination with their verbal explanations. Some of the children experienced difficulties drawing the geometrical shapes. As each child had to copy the shapes used in the analogies on the cover sheet, in the vast majority of cases the test leader knew which shapes the child was drawing. If necessary, the child would be asked to point out on the cover sheet which shapes were intended.
For each item, the number of transformations that the child had applied correctly in solving the analogy was scored.
Each analogy item was constructed by means of 1, 2, 3, 4, or 6 transformations that the child had to apply correctly to accurately solve the item, adding up to a total of 59 transformations per test session. The total number of transformations applied correctly in solving the analogies was taken as the outcome variable for each test session.
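As a small illustration of this outcome variable, the sketch below sums the correctly applied transformations over the items of one session. The function name and the three-item excerpt are hypothetical; the actual item composition of the test is not reproduced here.

```python
def session_score(items):
    """Sum the correctly applied transformations over all items in a session.

    `items` is a list of (required, correct) pairs: the number of
    transformations an item was built from, and the number of those the
    child applied correctly.
    """
    total = 0
    for required, correct in items:
        if not 0 <= correct <= required:
            raise ValueError("correct must lie between 0 and required")
        total += correct
    return total


# Hypothetical three-item excerpt, not the actual test content.
example = [(1, 1), (3, 2), (6, 4)]
print(session_score(example))  # 7
```

With a full session of 20 items, the maximum attainable score under this scheme is the sum of the required transformations, which the text gives as 59.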
To estimate coding reliability, the pretest 1 data were scored by both the first author and a student assisting in data collection. An inter-rater reliability analysis showed that inter-rater agreement for the pretest 1 correct transformations was good ( = .83, p < .0001).

Analyses
Multilevel modeling was used to analyze the data. Multilevel modeling capitalizes on the hierarchical structure of the data, allowing us to study relations among variables at different levels and across levels.
We can simultaneously answer level 1 questions about within-person change, and level-2 questions about how these changes vary across children (Singer & Willett, 2003). In the current study, level 1 represented the repeated measurements of the number of correct transformations within children, and level 2 represented the variability between children. We followed a predetermined model building structure as proposed by Singer and Willett (2003); starting with two simple, unconditional models and including our time-variant and time-invariant predictors in the successive models. The predictors were: condition, ability category, cognitive flexibility, and metacognition. Two time-invariant predictors, metacognition and cognitive flexibility, were mean centered to improve interpretation (Singer & Willett, 2003).
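A two-level growth model of this kind can be written, in the general notation of Singer and Willett (2003), roughly as follows. This is a sketch rather than the exact specification fitted in the study: the symbols are assumptions, only one level-2 predictor (Ability category) is shown, and the inclusion of a random slope term is assumed.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij} \\
\text{Level 2:}\quad & \pi_{0i} = \gamma_{00} + \gamma_{01}\,\mathrm{ABILITY}_{i} + \zeta_{0i} \\
                     & \pi_{1i} = \gamma_{10} + \gamma_{11}\,\mathrm{ABILITY}_{i} + \zeta_{1i}
\end{aligned}
```

Here $Y_{ij}$ is the number of correct transformations of child $i$ at session $j$, $\pi_{0i}$ and $\pi_{1i}$ are that child's initial status and rate of change, and $\gamma_{11}$ captures an interaction between Ability category and Time. Mean centering a predictor $X$ replaces $X_i$ with $X_i - \bar{X}$, so that the intercept terms refer to a child with an average score on that predictor.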
R (R Development Core Team, 2014) was used to fit the models. The fit of all models was compared using the likelihood ratio test (LRT) and two fit indices: Akaike's Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC). The LRT statistic follows a χ²-distribution in which the degrees of freedom are equal to the difference in the number of estimated parameters between the models. The LRT compares the log-likelihoods of two nested models and tests whether they differ significantly. The AIC and BIC are ad hoc criteria based on the log-likelihood statistic. The AIC and BIC statistics can be compared for all pairs of models, whether the models are nested within one another or not. These indices use a penalty function based on the number of parameters so that the more parsimonious model is favored. Lower AIC and BIC values indicate a better fit of the model (Singer & Willett, 2003). All the discussed models were fitted using full maximum-likelihood (FML) estimation. Most of the models differed in their fixed parts, and therefore deviance based on FML was needed to be able to compare the successive models (Singer & Willett, 2003).
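The model comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R code used in the study: the function names are hypothetical, deviance is taken to be −2 times the log-likelihood, and the sample size entering the BIC penalty is left to the caller because the text does not state which count was used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus a finite correction sum.
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    correction = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + correction


def likelihood_ratio_test(deviance_reduced: float, deviance_full: float, df: int):
    """Return the LRT statistic (difference in deviance) and its p-value."""
    stat = deviance_reduced - deviance_full
    return stat, chi2_sf(stat, df)


def aic(deviance: float, n_params: int) -> float:
    """Deviance plus a penalty of 2 per estimated parameter."""
    return deviance + 2 * n_params


def bic(deviance: float, n_params: int, n: int) -> float:
    """Deviance plus a penalty of ln(n) per estimated parameter."""
    return deviance + n_params * math.log(n)
```

For example, `chi2_sf(10.82, 1)` evaluates to roughly .001 and `chi2_sf(4.96, 1)` to roughly .03, in line with the p-values reported alongside those χ² statistics in the Results section.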

RESULTS
Before examining our research questions, one-way analyses of variance were conducted for each Ability category to evaluate possible differences between children in the conditions. The Raven scores, pretest 1 number of correct transformations, and age in months were used as dependent variables, and Condition (dynamic testing vs. unguided practice) as independent variable. The findings for the gifted and average-ability children, analyzed separately, revealed no significant differences in Raven scores (p = .53; p = .61), pretest 1 correct transformations (p = .40; p = .85), nor in age (p = .52; p = .98) between the dynamic testing and unguided practice conditions, respectively. We also examined possible differences between the gifted and average-ability children. The gifted children outperformed their peers on both the Raven scores, and the pretest 1 correct transformations (for both measures, p < .001), but no significant differences were found in age (p = .31). Descriptive statistics of all measures used in the current study, per condition and Ability category are provided in Table 3.
We conducted growth curve analyses (multilevel analysis; MLA) to model growth in the number of correct transformations. Table 4 presents the parameters and fit indices of the models. We first fitted the unconditional means model (intercept-only model) to acquire the random effects, which revealed a significant intercept effect (p < .001). We examined the intraclass correlation coefficient (ICC) as a measure of dependence; it describes the proportion of outcome variance that lies between persons in the population (i.e., the cluster structure of the data). As indicated by the ICC, 54.38% of the total variation in the number of correct transformations was attributable to differences between children. This finding revealed that the observations were not independent, and indicated that there was systematic variation in the outcome measure (transformations) worth exploring, both for the within-level and between-level variance, reinforcing the choice of multilevel modeling. The deviance, AIC, and BIC statistics were examined for the relative goodness of fit of the successive models.
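For reference, the intraclass correlation from an unconditional means model is conventionally defined as the share of between-child variance in the total variance. The symbols below are assumptions following common multilevel notation; the variance components themselves are not restated here.

```latex
\rho = \frac{\sigma_{0}^{2}}{\sigma_{0}^{2} + \sigma_{\varepsilon}^{2}} \approx .5438
```

In this expression $\sigma_{0}^{2}$ denotes the between-child variance of the intercepts and $\sigma_{\varepsilon}^{2}$ the within-child residual variance, so a value of about .54 means that slightly more than half of the outcome variance lies between children.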
In Model 2 (the unconditional growth model), we included our time predictor into the level-1 submodel to explain the remaining within-child variance. In Model 4 we included Ability category, gifted versus average-ability, as a predictor for initial status. Model 4 provided a better fit to the data compared to Model 3 (χ²(1) = 10.82, p = .001). Children's Ability category was found to be related to the number of correct transformations at pretest 1 as shown by a significant main effect of Ability category (8.23). Specifically, children with higher intellectual ability scored, on average, higher on pretest 1 than their average-ability peers. Model 5 showed that Ability category was also a significant predictor for children's rate of change, as indicated by a significant interaction of Ability category and Time. Model fit improved (χ²(1) = 4.96, p = .03). The estimate (−2.21) revealed that average-ability children improved more in the number of correct transformations over time than gifted children.
In Model 6 we examined whether the dynamic training session had different benefits for gifted and average-ability children. We included the interaction effect of Ability category and Condition, which did not improve model fit. A subsequent model included the interaction effect of Metacognition and Condition, which led to an improvement in model fit (χ²(1) = 4.40, p = .04). The estimate (.149) showed that children with higher scores on the Metacognition Index benefited more from training than peers with lower scores. We included the three-way interaction between Condition, Ability category, and Metacognition in Model 14. Results showed that the progression paths of gifted children who had higher levels of metacognition were not steeper than those of their average-ability peers (χ²(1) = .20, p = .66).
In conclusion, Model 13 was shown to be the model that best fitted the data based on the LRT, and the AIC and BIC statistics. The dynamic sessions led to an improvement in the number of correct transformations the children used.
No differences in dynamic training benefits for gifted and average-ability children were found. The average-ability children in the unguided practice condition did, however, show more improvement across test sessions than the gifted children in the unguided practice condition. Cognitive flexibility did not influence children's progression over time or the improvement in the number of transformations after receiving the dynamic training. The progression paths also did not differ between gifted children with higher levels of cognitive flexibility and their average-ability peers.
Metacognition did not influence progression in the number of correct transformations. Children with lower levels of metacognition, as indicated by higher scores on the Metacognition Index, showed more improvement in the number of correct transformations after the dynamic training than their peers with higher levels of metacognition. Lastly, the progression paths did not differ between gifted children who had higher levels of metacognition and their average-ability peers.
To examine our final research question regarding potential differences in the instructional needs of gifted and average-ability children, we conducted a one-way analysis of variance (ANOVA) with two within-subjects factors (metacognitive and cognitive prompts) and one between-subjects factor (Ability category), with the number of prompts as the dependent variable. No significant differences across ability categories were found in the number of metacognitive prompts (F = 2.27, p = .14) or cognitive prompts (F(1,51) = .17, p = .69; see Table 5). These results suggested that the two groups of children, gifted versus average-ability, needed a similar number of steps during training, indicating that their need for instruction was similar from both a quantitative perspective, relating to the total number of prompts, and a qualitative perspective, relating to differences in the type of prompts provided.

DISCUSSION
The current study explored the potential differential benefits of dynamic versus static testing of gifted and average-ability children, and focused on two aspects of executive functioning, cognitive flexibility and metacognition. First of all, our results showed that both children who had unguided practice experience only and children who were dynamically tested showed progression in the number of correct analogical transformations. When children were tested dynamically, however, their progression paths were shown to be more advanced, which supports previous findings (e.g., Stevenson et al., 2013). In this sense, our findings build upon earlier studies in which it was posited that dynamic testing of children reveals a more complete picture of their cognitive potential than static testing only (e.g., Elliott, 2003).
Moreover, our findings indicated, as expected, that gifted children start at a higher ability point, and keep this advantage during following sessions. When looking into potential differences between gifted and average-ability children in relation to the nature of progression, in contrast to our expectations, it was found that, in general, the average-ability children showed more progression than their gifted peers. We cannot, however, discount that the gifted children in the current study might have experienced a ceiling effect in testing. If so, we would then have expected them to show a differential need for instructions, which could not be supported by our data. Moreover, no mention of a ceiling effect is made in previous research with participants of the same age (e.g., Tunteler et al., 2008). It must be mentioned, nevertheless, that it is not known whether any high-ability children participated in these studies. Therefore, this explanation requires further research.
Looking more closely into training benefits, it was revealed that the gifted and average-ability children showed similar rather than different progression lines after training, whereas previous studies into dynamic testing of gifted children found that these groups of children differed significantly in their performance and progression (e.g., Calero et al., 2011; Kanevsky & Geake, 2004). In the light of the fact that all groups of children progressed after training, our findings, ultimately, seem to suggest that dynamic testing might be better suited to reveal the cognitive potential of all groups of children (Elliott et al., 2010), including those with above-average cognitive abilities.
We also examined the role that cognitive flexibility and metacognition play in progression in accuracy of analogical reasoning, and training benefits. It could not be established that cognitive flexibility plays a role. A number of reasons can be identified for the unexpected results regarding cognitive flexibility. First of all, research into executive functioning among children is challenging. One important reason is the type of instruments used to measure executive functioning. It has been noted that performance-based tasks, such as the BCST-64 used in the current study, rarely measure one executive function only (e.g., Miyake et al., 2000). By definition, executive functions regulate various cognitive processes, including for instance visuospatial processing. Performance-based tasks measure these other processes as well, making measuring just one executive function, in isolation, difficult (Viterbori et al., 2015). The developmental nature of executive functions in childhood should also be taken into consideration (e.g., Diamond, 2013). Moreover, it should be noted that the cognitive flexibility task used in the current study is a single measurement, static test, whereas learning potential measures are dynamic. Therefore, future studies could research this relationship further by utilizing a dynamic cognitive flexibility task, such as the dynamic Wisconsin Card Sorting Task (e.g., Boosman, Visser-Meily, Ownsworth, Winkens, & Van Heugten, 2014). These authors found that the dynamic executive functioning indices were significantly associated with cognitive functions, whereas the static indices were not.
It was, nonetheless, found that metacognition had an effect on the training benefits, but not on the progression from pretest to posttest. Children who, according to their teachers, had lower levels of metacognition, in contrast with our expectations, benefitted more from training than their peers with higher levels of metacognition. Furthermore, the findings provide a first indication that a graduated prompts training procedure can, to a certain extent, compensate for lower levels of metacognition. This notion is particularly relevant considering Sternberg's (1998) assertion that metacognition is an important ability in the development of expertise.
Although it seems plausible that the graduated prompts technique used in the current study also helps improve metacognition, this tentative hypothesis should be investigated using several measurements of metacognition. It must be noted that, although studies suggest that rating scales can be used successfully to obtain an approximation of children's executive functioning (Toplak, West, & Stanovich, 2013), using teacher ratings is a very indirect method of measuring metacognition. However, due to the young age of the participants, it was not possible to use other instruments to obtain metacognition measures. Self-report measures are not recommended for young children, as they rely heavily on verbal ability (Whitebread et al., 2009). Thinking aloud protocols, moreover, might not fully capture implicit cognitive processes, as young participants might not be conscious of their metacognitive processes while solving a task (Lai, 2011). In future research among older children, these instruments could be used to investigate the relationship between metacognition and dynamic testing measures. Future studies should also focus on development and implementation of instruments that directly measure or predict executive functioning among young children.
Finally, we looked more closely into children's instructional needs during dynamic training. Contrary to what we expected based on previous literature (e.g., Calero et al., 2007; Kanevsky & Geake, 2004), we found no differences in the instructional needs of the gifted versus average-ability groups of children: the two groups of children needed a similar number of cognitive and metacognitive prompts. These results ultimately suggest that, compared with their average-ability peers, gifted children did not differ in the number of cognitive or metacognitive prompts they needed, nor in the extent to which they needed modeling, and, thus, can have similar instructional needs to progress in learning.
Individual differences between children's need for instructions, both within and across ability categories, were, however, found, as suggested by the standard deviations of both groups of children, which is in line with previous studies (e.g., Resing, 2013).
In addition to the limitations mentioned above, the current study encountered some other limitations. First of all, it is important to mention that we only used the Raven Standard Progressive Matrices as a measure of intellectual ability.
Although the Raven test is known as a robust measure of intellectual ability (e.g., Jensen, 1998), we did not include other factors deemed important for cognitive and intellectual functioning, such as task commitment or creativity (e.g., Renzulli & D'Souza, 2014). Moreover, we only investigated correct analogical transformations, while other factors have also been shown to be important in progression in analogical reasoning. Investigating strategy use, in particular, could lead to interesting findings considering the assumed relationship between strategy use and aspects of executive and intellectual functioning (e.g., Shore, 2000).
The results of the current study yield some important implications for educational professionals. It seems advisable to administer a dynamic rather than a static test when children's intellectual abilities are questioned, especially for children with lower levels of metacognition. In this light, investigating the interrelationship between executive functioning and dynamic testing seems worthwhile, especially for children with lower levels of intellectual functioning or learning disabilities. The benefits of dynamic testing for these special groups of children seem especially relevant within the framework of response to intervention (RTI; e.g., Grigorenko, 2009). Research suggests dynamic testing may be used successfully to identify or predict the responsiveness to intervention of these children (e.g., Fuchs, Compton, Fuchs, ...).
Opponents of dynamic testing often argue that testing dynamically is more labor intensive, and, thus, more expensive than testing statically. The dynamic test used in the present study, for example, took approximately 60-90 minutes in total to administer, whereas for a static test with a single test session, 15-20 minutes would suffice. Nevertheless, our findings suggest that taking extra time to test these children, including those identified as gifted, more than once and administering a dynamic training session helps them unveil their cognitive abilities, and, thus, is worth the extra investment. This notion becomes even more salient when taking into account that dynamic testing of children also provides insight into their instructional needs (e.g., Bosma & Resing, 2012). The results of the current study remind us that high-ability children do not, by definition, need less instruction or feedback than average-ability children to show progression in learning. Just like any other children, some of these children can also profit from extra feedback or help so they can unveil their true cognitive potential.
Finally, and most importantly, the results of the present study indicate that children, even those who have already achieved excellent results, can show learning progression when they are provided with the right instructions.

ENDNOTE
1 In the Netherlands, intelligence testing is not standard practice in primary schools. For admittance to special talent or gifted educational programs, teachers' and parents' nominations are often used. In the present study, these nominations were used, in combination with a percentile rank score of at least 90, to identify children as gifted.