UvA-DARE (Digital Academic Repository)

Error detection through mouse movement in an online adaptive learning environment

While response time and accuracy indicate overall performance, their value in uncovering the cognitive processes underlying learning is limited. A promising online measure, designed to track decision-making, is computer mouse tracking, where mouse attraction towards different locations may reflect the consideration of alternative response options. Using a speedy arithmetic multiple-choice game in an online adaptive learning environment, we examined whether mouse movements could reflect arithmetic difficulties when error rates are low. Results showed that mouse movements towards alternative responses in correctly answered questions mapped onto the frequency of errors made in this online learning system. This mapping was stronger for the younger children, as well as for easy arithmetic problems. On an individual level, users showed more mouse movement towards their previously made response errors than towards other alternative options. This opens the possibility of adapting feedback and instruction on an individual basis through mouse tracking.


| INTRODUCTION
The learning process of essential skills, such as arithmetic, has a unique trajectory in every child. To create an effective teaching-learning environment, tailoring the instruction and feedback to the needs of individual learners is necessary (Federico, 2000).
Emerging e-learning platforms and new technologies can accommodate these needs. But to individualize learning materials (Bray & Mcclaskey, 2010), we need to understand the underlying thinking processes that drive particular behavioural responses during learning, that is, we need to understand why a child may consistently have difficulties when faced with a particular type of problem. In the present study, mouse tracking was used to measure difficulties children might have during arithmetic problem-solving in an online learning environment.

| Individual differences in arithmetic difficulties
When solving problems, the students' error responses are thought to reflect their cognitive process and/or applied strategies (Ben-Zeev, 1998; Buwalda, Borst, van der Maas, & Taatgen, 2016; Savi, Deonovic, Bolsinova, Van Der Maas, & Maris, 2018). Rational errors are errors that are logically consistent and rule-based rather than being random (Ben-Zeev, 1998). Rational errors reflect the student consistently applying an incorrect procedure (Brown & Burton, 1978). For example, for arithmetic problems, a common mistake is 3 × 2 = 5, where the numbers are added instead of multiplied. Being able to diagnose incorrect understanding, sometimes called misconceptions, by analysing systematic difficulties can ultimately help in individualizing education to students' state of knowledge. But the downside of this approach is that it requires students to make a sufficient number of mistakes.
Traditionally, tests in primary and secondary education are constructed such that students have a high probability of answering correctly. To keep students motivated, adaptive learning systems also choose a high success rate and, as a consequence, few error responses are available. Having to rely on the limited number of incorrect responses registered per student makes it almost impossible to rapidly detect underlying systematic difficulties in adaptive learning systems. Studies have shown that during the retrieval of mental arithmetic, in addition to the correct association (e.g., 3 × 2 = 6), problems can produce false associations with incorrect answers (e.g., 3 × 2 = 5; Campbell, 1987; Domahs, Delazer, & Nuerk, 2006; Siegler, 1988). This means that even though a correct answer is eventually given, the student might still contemplate these incorrect answers, possibly associated with misconceptions. Correct and incorrect answers then compete during the decision-making process.

| Different ways to track arithmetic cognitive processes
Measures of neural activity (e.g., electroencephalography) or eye movements (eye tracking) have provided important insights into the dynamics of the learner's cognitive processes in mathematics (e.g., Artemenko et al., 2019; de Mooij, Kirkham, Raijmakers, van der Maas, & Dumontheil, 2020; Hinault & Lemaire, 2016; Huebner & LeFevre, 2018; Lai et al., 2013; Spüler et al., 2016). However, these measures are also laborious and expensive and, therefore, difficult to scale to large samples outside research laboratories, such as an online learning environment.
An emerging addition to these methods is mouse tracking, that is, the recording and tracking of computer mouse movements made by participants, with the aim of providing a continuous stream of information during the decision-making process (Dale, Kehoe, & Spivey, 2007; Freeman, 2018; Freeman, Dale, & Farmer, 2011; Hehman, Stolier, & Freeman, 2015; Song & Nakayama, 2009; Spivey & Dale, 2006; Stillman, Shen, & Ferguson, 2018). Mouse tracking, as a method, was first introduced by Spivey, Grosjean, and Knoblich (2005), who used it as a window into the internal cognitive process during language comprehension. Since the free-to-use software of Mousetracker was introduced by Freeman and Ambady (2010), it has become very popular in diverse domains of social science (for recent reviews, see Erb, 2018; Freeman, 2018; Stillman et al., 2018). This method has many practical advantages: it can be collected online (which reduces the need to bring participants into the lab), it is relatively inexpensive and has a much broader scope in terms of the number and variety of participants. These characteristics are especially useful when studying young children for whom online data are often difficult to obtain.
Hand or mouse trajectories during mental arithmetic are also used to study numerical processing (Dotan & Dehaene, 2013; Faulkenberry, Witte, & Hartmann, 2018; Fischer & Hartmann, 2014; Marghetis, Núñez, & Bergen, 2014; Santens, Goossens, & Verguts, 2011). These studies showed that motor action is not only the end product of perceptual and cognitive processes but can also reveal how multiple representations compete with each other during the problem-solving process (Schulte-Mecklenbeck, Kuehberger, & Johnson, 2019; Spivey, 2007; Spivey & Dale, 2006). For example, Faulkenberry (2014) and Faulkenberry, Cruise, Lavro, and Shaki (2016) examined mouse movements during a numerical comparison task, where the numerical distance between two digits needed to be judged, while ignoring the physical size of the digits. These studies showed that in incongruent trials, when these two variables differed (e.g., in a 2-8 trial, the font size of the 2 was bigger than the font size of the 8), the mouse path curved more towards the incorrect response than in the congruent trials (e.g., when the 2 was smaller than the 8). This greater attraction towards the incorrect response, due to size congruity interference, is thought to reflect the response competition. Mouse tracking allows an examination of the strength of the attraction towards alternative options, without the need for an error to be made. What cannot be claimed is that mouse tracking measures the same thing as eye tracking: gaze usually precedes the hand/mouse movement in the decision-making process and there-

| The current study
In the current study, we implemented mouse tracking outside the laboratory. Specifically, mouse movements of primary school children were tracked while doing arithmetic exercises in an online adaptive practice environment (Straatemeier, 2014). The aim of this study was twofold.
First, it serves as a validation that mouse movements can reflect the competition, at the cognitive level, between multiple answers during the resolution of arithmetic problems, in an ecologically valid setting.
Critically, we hypothesized that the extent of mouse movements towards non-selected incorrect options would relate positively to how frequently these errors are made in a speedy arithmetic multiple-choice game. Second, we examined whether we could detect individual systematic patterns of mouse attraction towards certain errors made in the past. The ultimate aim of this study is to develop a measure of attraction towards systematic difficulties that can be used to adapt feedback and instruction to every unique learning trajectory.
For this measure of attraction, we developed a mouse-tracking method to analyse mouse movement in more complex daily-life tasks.
Most of the previous mouse-tracking studies have used a strict two-response-option design; existing measures of mouse-trajectory dynamics have been developed for this design, such as the maximum deviation (MD) away from the correct response (Hehman et al., 2015). Some studies have used designs with four response choices (see for example Cloutier, Freeman, & Ambady, 2014; Koop & Johnson, 2013), but in analyses have only selected the trajectories where there was deviation towards one of the alternative options and not multiple. In the current study, we have designed a method with five alternative response options, where attraction towards multiple answer options is allowed. During the process of solving a mathematical problem, individual children can have different error-related associations, but importantly, a child can also have multiple error-related associations.
By presenting more than one alternative option, we can track the whole problem-solving process, where a variation of error-related associations is also possible.

| Participants
For this study, 90,000 children, aged between 5 and 13 years old (M = 10.2 y), were randomly selected from a pool of users playing actively in an online learning environment (N = 180,000 users). The participants of the online learning environment are mainly children from primary schools that have bought accounts for their students.
Participants had to have logged in to the environment in the last 3 months before data collection to increase the chance that these children would play the game during the 6 weeks of data collection. Not all students were selected to be tracked through mouse movements to limit the load on the database. In this online learning environment, students can decide for themselves when they want to play, which games they want to play, and how many problems of a particular game they want to play. After recording the game for a total of 6 weeks, 1,590 different users (M = 10.3 y; SD = 1.46; 46% female) had played the selected arithmetic problems in our task and their trajectories could be used for analyses. As can be seen in Table 1, users were predominantly 8 years of age and older, since the selected problems required basic knowledge of, and practice with, all mathematical operations.

| Materials and equipment
The responses and the mouse movement data were collected within an online adaptive learning environment for practicing mathematics called "Math Garden" (www.oefenweb.com), used by over 180,000 primary school children in the Netherlands. This rich source of information has served as the ideal basis for numerous substantive and methodological papers. These topics range from replicating effects  . In Math Garden, every student has his/her virtual garden, where each plant represents a game from a different domain, such as addition, multiplication or percentages. Student abilities and item difficulties are estimated, based on speed and accuracy, using item response theory, where an Elo rating system adaptively matches students to items on-the-fly (for more detail, see Klinkenberg, Straatemeier, & Van Der Maas, 2011; Maris & van der Maas, 2012). Based on a student's current ability estimate, the difficulty level is determined so that the student has a fixed probability of answering correctly, to ensure students remain motivated (Straatemeier, 2014). Children can choose between three difficulty levels (with expected probabilities correct of .6, .75 and .9).
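The adaptive matching described above can be illustrated with a minimal sketch. It assumes a plain logistic (Rasch-style) model on accuracy only and an arbitrary step size `k`; the actual Math Garden estimator also uses response speed (see Klinkenberg et al., 2011), so the function names and parameters below are hypothetical and serve only to show the idea of matching an item to a target probability correct.

```python
import math


def expected_correct(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under a logistic model (assumption)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def target_difficulty(ability: float, p_correct: float) -> float:
    """Item difficulty at which the expected probability correct equals p_correct."""
    return ability - math.log(p_correct / (1.0 - p_correct))


def elo_update(ability: float, difficulty: float, score: float, k: float = 0.1):
    """Shift both ratings by the prediction error; k is a hypothetical step size."""
    error = score - expected_correct(ability, difficulty)
    return ability + k * error, difficulty - k * error


if __name__ == "__main__":
    # The three difficulty levels mentioned in the text.
    for p in (0.6, 0.75, 0.9):
        print(p, round(target_difficulty(0.0, p), 3))
```

Under this simplification, a higher target probability selects items whose difficulty lies further below the student's ability estimate.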

| Arithmetic speed mix game
Speed mix, one of the 24 games available in Math Garden, was used for the current study. In each game session, 10 problems with a mix of four different operations (i.e., addition, subtraction, multiplication and division) are presented. In this game, students are asked to click one of the six answer options within 8 s. The remaining time is visualised as virtual coins counting down on the bottom right of the screen (see Figure 1).
The student is rewarded with the remaining coins on the screen after giving a correct response; an incorrect response leads to the remaining coins being subtracted from their total points. No coins are given or lost when failing to answer within the 8-second time limit. This way of scoring is known as the "High Speed High Stakes" rule, which has excellent psychometric qualities (Maris & van der Maas, 2012). This task was chosen for this study because (a) the students practice core arithmetic skills, basic tools essential for solving more complex maths problems; (b) it has a multiple-choice design instead of giving a response through a keypad so that the mouse trajectory towards the different response options can be investigated; (c) the students are under time pressure to answer (i.e., 8 s), which promotes movement of the mouse before reaching a decision (Kieslich & Henninger, 2017; Scherbaum & Kieslich, 2018).
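The scoring rule described above can be written down in a few lines. The sketch below assumes one coin per remaining second, which the text does not state, and uses `None` to mark a trial without a response in time; the function name is hypothetical.

```python
from typing import Optional

TIME_LIMIT_S = 8.0  # time limit per problem, as stated in the text


def hshs_score(correct: Optional[bool], response_time_s: float,
               time_limit_s: float = TIME_LIMIT_S) -> float:
    """Coins won (positive) or lost (negative) on one trial.

    correct=None marks a trial without a response in time.
    Assumes one coin per remaining second (not stated in the source).
    """
    if correct is None or response_time_s >= time_limit_s:
        return 0.0
    remaining = time_limit_s - response_time_s
    return remaining if correct else -remaining


if __name__ == "__main__":
    print(hshs_score(True, 3.0))    # fast correct answer: gain
    print(hshs_score(False, 3.0))   # fast incorrect answer: equal loss
    print(hshs_score(None, 8.0))    # no answer in time: nothing won or lost
```

Because the stake shrinks with time, a fast guess risks more than a slow one, which is what discourages guessing under time pressure.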
The original speed mix game design was adapted for this mouse-tracking study. At the start of each problem, a start screen was added, with a blue button in the middle that needed to be clicked before the arithmetic problem and answer options were shown. This ensured that every mouse trajectory started in the same position and at an equal distance from all the answer options (Figure 1). The answer options for a given arithmetic problem were the same for every participant but were randomly placed across the six location boxes every time the problem was presented to a participant. A trial ended when the participant clicked on one of the answer options, clicked on the question mark (Figure 1) to skip the trial (i.e., when the student does not know the answer), or exceeded the time limit.

TABLE 1 Distribution of age and gender in the sample
The speed mix game contains over a thousand different problems.
To reduce the server storage load and simplify the analyses, we selected 36 questions to be tracked. Nine problems were selected per operation (i.e., addition, subtraction, multiplication and division).
The problems were chosen based on the frequency of errors made for each problem and the problem difficulty (see Table 2). The problem difficulty was based on the Elo rating (see Klinkenberg et al., 2011) averaged over a year of data collection within Math Garden before the start of this study. The frequency of errors was calculated as the proportion of times a particular incorrect response was given to a problem over the course of the year before starting the study.
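As an illustration of the error-frequency measure, the sketch below computes, for one problem, the proportion of responses that went to each incorrect answer. It assumes the denominator is the number of all responses to the problem; the response list is made-up toy data, not data from Math Garden.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def error_proportions(responses: Sequence[Hashable],
                      correct_answer: Hashable) -> Dict[Hashable, float]:
    """Proportion of all responses to a problem given to each incorrect answer."""
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter(responses)
    return {answer: n / total
            for answer, n in counts.items()
            if answer != correct_answer}


if __name__ == "__main__":
    # Toy responses to the problem 5 + 508 (correct answer 513).
    toy_responses = [513] * 7 + [512] * 2 + [503]
    print(error_proportions(toy_responses, correct_answer=513))
```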

| Procedure
For 6 weeks, the participants were tracked when playing the speed mix game. Students played in their natural environment independently, either in or outside school, but mostly within school hours. No teachers or parents were involved in this study. Schools and families with accounts are informed that Math Garden collects diverse categories of data, such as mouse-tracking data, using some for research analysis. Children (their parents or schools) can opt out of being part of the research done in the practice system and are, therefore, not included in this study. All data were anonymized before analysis. This procedure was approved by the university department's Ethics Review Board.
Math Garden can be played on different kinds of devices, including touchscreens and tablets. Sessions tracked in this study that were played using touchscreen devices were excluded from analyses. Since Math Garden is a child-directed setting, children could decide for themselves whether they would play the speed mix game and for how long they would play. This means that during our data collection, not every child has seen (or answered) all the possible problems; some children may have only seen one question, and some children may have performed the same problem multiple times.

| Response errors
To determine, in a stable and robust way, which errors are made most often, the responses administered for the last 2 years by all users in Math Garden were examined for the chosen 36 questions (see Table 2). From the N = 607,125 collected responses, 70% were answered correctly and less than 1% were not answered on time. In the remaining N = 128,000 the student chose one of the five alternative incorrect responses. The overall proportion of error occurrence per incorrect response was calculated first across all users in Math Garden, to compute associations to mouse attraction. For example, the incorrect answer, 512, was given 13.5% of the time for the problem, 5 + 508 (i.e., the most frequent error; see Table 2). For the   S1 for more details). The advantage of adding the dynamic method is that there is still detection of movement towards an attractor, regardless of where the mouse is located at that point. For every trajectory, the findings in these methods were combined to ensure that the competition between the choices can be analysed from the start of the trajectory to the end. This was done by adding the number of movements from the dynamic method that was not inside an answer box, with the mouse locations obtained from the static method. The mouse locations and movements that could not be associated with any of the incorrect answer options were excluded from the analyses.
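The combination of the static and dynamic methods can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes rectangular answer boxes, treats a movement as directed towards a box when its heading lies within a hypothetical angular tolerance of the bearing to the box centre, and applies the dynamic check only to locations that are not inside any box. All names and the 15-degree tolerance are assumptions.

```python
import math
from typing import List, NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float]


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max

    def centre(self) -> Point:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def static_hit(boxes: Sequence[Box], p: Point) -> Optional[str]:
    """Static method: label of the box that contains the mouse location."""
    for box in boxes:
        if box.contains(p):
            return box.label
    return None


def dynamic_hit(boxes: Sequence[Box], prev: Point, curr: Point,
                max_angle_deg: float = 15.0) -> Optional[str]:
    """Dynamic method: label of the box the movement prev -> curr points at."""
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    if dx == 0 and dy == 0:
        return None  # no movement, so no direction
    heading = math.atan2(dy, dx)
    best_label, best_angle = None, math.radians(max_angle_deg)
    for box in boxes:
        cx, cy = box.centre()
        bearing = math.atan2(cy - prev[1], cx - prev[0])
        # Smallest absolute angle between heading and bearing.
        diff = abs(math.atan2(math.sin(bearing - heading), math.cos(bearing - heading)))
        if diff <= best_angle:
            best_label, best_angle = box.label, diff
    return best_label


def classify(boxes: Sequence[Box], trajectory: Sequence[Point]) -> List[Optional[str]]:
    """Static label where available, otherwise the dynamic label (or None)."""
    labels: List[Optional[str]] = []
    for i, p in enumerate(trajectory):
        label = static_hit(boxes, p)
        if label is None and i > 0:
            label = dynamic_hit(boxes, trajectory[i - 1], p)
        labels.append(label)
    return labels
```

Counting the labels per answer option then gives a per-trajectory tally of locations and movements associated with each option; entries labelled `None` correspond to locations that cannot be associated with any option.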

| Response accuracy
Of all responses, 74% were answered correctly. This meant that 74% of the trials and their mouse trajectories could be examined. The students answered correctly on average in 4.1 s (SD = 1.6 s), where the maximum time to respond is 8 s.

| Mouse trajectories
Per trajectory, M = 115.9 mouse locations (SD = 107.6) were collected, that is, around 35 measurements per second. Of all mouse locations, 22% were associated with the correct answer option (14% inside the box and 7% directed towards the option); 47% could not be directly associated with any of the answer options and 31% were associated with one of the five incorrect answer options (16% inside and 15% towards the answer option box). Of this 31%, the proportion of mouse locations associated with each incorrect answer option was averaged across the participants for each arithmetic problem. An example of three different types of trajectories from different children for the same question can be found in Figure 1: movement straight towards the error response (green line); movement towards an alternative option (blue line); a clockwise movement (red line).

| Mouse attraction to incorrect responses on group level
Firstly, we investigated whether frequent response errors made in the past (i.e., in the past 2 years by all users of Math Garden) would also attract the greatest number of mouse movements in this study. Therefore, a Pearson correlation between the response error rate and the number of mouse movements was calculated for each arithmetic problem (Figure 3a). These Pearson r correlations were transformed using a Fisher Z transformation (z = 0.5 × ln((1 + r)/(1 − r))). (Figure 4). This test showed that the average correlation was significantly higher than zero, Fisher Z transformed M = .42 (r = .23), t(100) = 4.23, p < .001.
This means that users showed more mouse movement towards their previously made response errors than to other incorrect options.
Furthermore, some checks were made to ensure that this finding was robust. Firstly, the average correlation per user over all problems was weighted by how many errors the person had made in their history of playing in Math Garden before data collection, assuming that problems with more error responses are more likely to reflect a consistent conceptual or procedural difficulty (e.g., a tendency to confuse multiplication and addition; see Appendix 2.1 in Data S1). The second check was to weight the correlation by how many mouse trajectories were collected from the user for a particular problem (Appendix 2.1 in Data S1). When a user had not given an incorrect response to a particular problem within a year from when the mouse trajectory was registered, the data were excluded since the player had presumably mastered the problem and it presented no difficulty. The checks show stable correlations irrespective of how much data were collected and when the errors were made.
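The correlation analysis in this section (Pearson r per problem or user, Fisher Z transformation, then a one-sample t-test of the mean against zero) can be sketched with the standard library alone. The function names are hypothetical, and the sketch returns only the t statistic; obtaining a p value would require a t distribution, for example from a statistics package.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r: float) -> float:
    """z = 0.5 * ln((1 + r) / (1 - r)); undefined for r = 1 or r = -1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def inverse_fisher_z(z: float) -> float:
    """Back-transform a (mean) Fisher z value to the correlation scale."""
    return math.tanh(z)


def one_sample_t(values: Sequence[float], mu: float = 0.0) -> float:
    """t statistic for testing whether the mean of values differs from mu."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))
```

In use, each per-problem or per-user correlation would be passed through `fisher_z`, the transformed values tested with `one_sample_t`, and their mean mapped back with `inverse_fisher_z` for reporting.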

| DISCUSSION
The aim of this study was to investigate whether it is possible to detect difficulties children encounter when solving arithmetic problems, without relying on the errors themselves. This was done by tracking mouse movements, a method intended to measure the competition between responses during the process of problem-solving.
This study is, to our knowledge, the first to implement mouse tracking outside the lab in an online adaptive learning system. We analysed the mouse movements towards multiple attractors, which were the incorrect response options of multiple-choice arithmetic problems.
Our findings showed that, even when the final response was correct, the mouse trajectories revealed attraction towards the errors most frequently made by students playing in Math Garden. All ages and arithmetic problems showed this high attraction, but the mapping of the mouse was stronger for younger children and easy arithmetic problems. Furthermore, we found that individuals who had made certain errors in the past (past 2 years before data collection) still showed more mouse movement towards these errors than to other attractors.
Thus, implementing mouse tracking in an online learning environment allows us to study whether students have systematic reasoning problems in solving arithmetic problems. This will enable us to give targeted feedback and instruction on the learning process, after both an incorrect and correct response. To give an example: if the method detects that a student is consistently attracted to an answer corresponding to mistaking addition for multiplication, it would greatly benefit the learning process to present a reminder to carefully check the operation, and a series of problems where the operations are constantly mixed, to practice switching between operations.
The strength of investigating the use of mouse tracking in an online learning environment is that the sample size is both large and heterogeneous in terms of math ability, age and background. Secondly, the mouse movements can be tracked for all its users daily for thousands of different maths problems at the same time throughout their whole primary school trajectory. Thirdly, the students learn these skills in their natural environment, either at school or at home.
This limitless access to Math Garden for students in diverse circumstances ensures that our results are robust and would not only be reproducible in a lab-controlled experiment.
Other studies have used the large-scale Math Garden database to categorise and detect systematic errors (Savi et al., 2018; Straatemeier, 2014), but they had to rely on the incorrect responses a child picked when they made an error. Since online adaptive environments require a high success rate for motivational reasons, errors are rare; in this study (N = 6,443) an incorrect response was given for 20% of the problems administered. Our findings show that mouse movement can also reveal information about these errors in the other 80% of the responses, which would help with the diagnosis of systematic difficulties at an individual level.
There are some limitations to this study. First, collecting data in a naturalistic setting, such as Math Garden, can cause the data to be noisy. There is no way to control the circumstances within which the child is performing the task. For example, they can be distracted while playing the game or have a bad internet connection. Equipment is not standardized since the mouse used is different for every school and home. It is, therefore, necessary to collect a large dataset, such as in this study, and to be very strict in terms of removing noisy mouse trajectories and unfinished game sessions. Second, many children practice their skills in online learning environments on tablets and touchscreens. Mouse tracking cannot be used for such devices. A third limitation is that mouse-tracking studies are bound to a specific design, and not applicable to all games and experimental tasks. Every trial needs to have the same starting point and multiple-choice options located at an equal distance from the starting point. Finally, mouse tracking is difficult to combine and/or compare with other process-tracing methods such as eye tracking in large online studies.
Some laboratory studies have compared these two modalities and found sufficient correlation between gaze and cursor positions, but there is also a substantial variation (Franco-Watkins & Johnson, 2011; Huang & White, 2012; Lohse & Johnson, 1996; Quétard et al., 2016).
For example, with prolonged cursor fixation, the attention and gaze might be somewhere else on or off the screen, which would not be picked up by the mouse metrics. Unfortunately, eye tracking is currently very complicated to implement in an online learning environment (partly due to privacy issues).
For further research, it would be interesting to investigate individual cases with both eye and mouse tracking in a laboratory setting to see how these methods interact and complement each other with regard to signalling the attractiveness of error options in math questions.