Revealing the hidden structure of physiological states during metacognitive monitoring in collaborative learning

Using hidden Markov models (HMM), the current study looked at how learners' metacognitive monitoring is related to their physiological reactivity in the context of collaborative learning. The participants ( N = 12, age 16 – 17 years, three females and nine males) in the study were high school students enrolled in an advanced physics course. The results show that during collaborative learning, the students engaged in monitoring in each self-regulated learning phase such as task understanding, planning and goal setting, task enactment, adaptation and reflection. The results of the HMM indicated that the learners' physiological reactivity was low when monitoring occurred. The associations between the states based on the HMM provide insights not only into how learners engage in metacognitive monitoring but also about their level of physiological reactivity in each state. In conclusion, exploring aspects of metacognitive monitoring in collaborative learning can be done with the help of physiological reactions.

KEYWORDS
collaborative learning, hidden Markov models, metacognitive monitoring, physiological reactivity

Effective collaboration is more than working in groups or completing a task assignment (Rochelle & Teasley, 1995). Rather, effective collaboration requires group members to ensure that they work towards the shared goals and explicate to each other when they become aware that their collaboration is not heading towards the shared goals (Johnson et al., 2007). This means that, during collaboration, learners need to negotiate shared goals to ensure they all work towards the same outcome (e.g., Järvelä et al., 2018), maintain a positive socioemotional atmosphere to ensure fluent collaboration (e.g., Lajoie et al., 2015) and, finally, coordinate and ensure that each member is responsible for the joint outcome of the collaborative task (Rogat & Linnenbrink-Garcia, 2011). Metacognitive monitoring and control processes operate during task processing to assess and guide the learner(s) to the learning goal (Winne & Hadwin, 1998). During collaboration, learners need to actively monitor their cognition ('Am I understanding this?'), motivation and emotions ('Are my feelings or thoughts disturbing my learning progress?'), behaviour ('Do I have everything I need to perform this task?') and, finally, coordinate the collaboration ('Is my group progressing with this task?'). Active monitoring plays a major role in collaborative learning because it provides learners with possibilities to adjust and change the ways they collaborate when challenges are externalized.
Sometimes, these metacognitive monitoring activities are verbally externalized in social interaction, but not always, as monitoring is ultimately an internal, mental activity (Nelson, 1996). Although metacognitive monitoring is internal and hard to externalize, it might leave a 'physiological footprint' in the arousal level, which can be measured, for example, through peaks in the electrodermal activity (EDA) signal (Pijeira-Díaz et al., 2018). In general, arousal refers to the degree of physiological activation and responsiveness triggered by an event, object or situation during a person's interaction with the environment (De Lecea et al., 2012; Juvina et al., 2018). From the learning process perspective, the rapid occurrence of these EDA peaks has been observed to be beneficial because, under certain conditions (e.g., a room with temperature control), it accounts for cognitive or affective activation of learners (Doherty et al., 1995). In a similar way, engaging in metacognitive monitoring is beneficial for learning as it informs about the progress of the learning process. Therefore, investigating the invisible reactions of the body and the brain may indicate when learners are engaging in metacognitive monitoring. In this sense, there is a need to understand not only when learners engage in monitoring activities but also how this is reflected in their physiological activation.

| SELF-REGULATED LEARNING IN THE CONTEXT OF COLLABORATIVE LEARNING
In the context of collaborative learning, the way learners engage in self-regulated learning (SRL) is crucial. In a collaborative situation, the way that learners engage in SRL provides a means to investigate how learners' SRL evolves. Especially in collaborative learning, SRL literature increasingly considers the social context in which learners carry out regulation of learning (Volet et al., 2009). Collaborating learners have to negotiate and reach a consensus regarding 'what' and 'how' they will learn and identify a need to regulate their interactions and learning (Hadwin et al., 2017). SRL is not static or a state but an adaptive, cyclical process which manifests through a series of contingencies over time (Cleary & Zimmerman, 2012). SRL involves active monitoring of cognition, motivation, emotions and behaviour, as well as adaptively responding to new challenges, situations or failure in ways that optimize progress towards personal goals (Winne & Hadwin, 1998). The mark of regulation is an intentional or purposeful action in response to situations such as challenges in learning (Hadwin et al., 2017). For example, Malmberg et al. (2015) examined what types of challenges and SRL strategies learners reported in the context of collaborative learning. The study revealed that most of the challenges were behavioural, focussing on time management or the environment. However, the authors also found that the successful groups identified cognitive and motivational challenges as well. That is, the successful groups externalized their metacognition, which allowed them to outperform the groups which did not.
Models of SRL include phases that learners go through during their task execution as they proceed to reaching their learning goals. Depending on the model, the exact number of the phases varies from three to four (Winne & Hadwin, 1998; Zimmerman, 2001), and the definitions of activities involved within each phase vary from broad to very specific (Azevedo et al., 2011). According to Winne and Hadwin's (1998) model, SRL consists of four phases: task understanding, planning and goal setting, strategic enactment and evaluation.
Although in the different models of SRL, the phases are presented as occurring linearly, this does not mean that they occur in such an order during the learning process (Malmberg et al., 2017).

| THE ROLE OF METACOGNITIVE MONITORING IN SELF-REGULATED LEARNING
In each phase of SRL, learners may engage in metacognitive monitoring. As a result, learners can revisit and change their studying plans and/or strategies according to the extent to which incoming information aligns with existing knowledge structure and whether there are inconsistencies or discrepancies in the information stream (D'Mello et al., 2014). In this way, learners can optimize their learning process.
This means that metacognitive monitoring allows learners to shift between the four phases of SRL and adapt their learning according to learning goals. Winne and Hadwin (1998) propose that, within a task, these four phases of SRL occur in a weakly sequenced manner in a way that standards created for learning in the previous phases produce a negative feedback loop indicating a discrepancy between the current and the desired stage of learning. Metacognitive monitoring is assumed to operate by comparing the conditions, operations or products against corresponding elements in the subsequent phase of learning. If the standards that are set for learning do not match the products, learners are expected to enact control processes to reduce discrepancies (Greene & Azevedo, 2007). This is to say that metacognitive monitoring plays a key role in Winne and Hadwin's (1998) [...] et al. (2015) found that group members reacting to peers' previously expressed metacognitive monitoring is a prerequisite for collectively sharing regulation. Similarly, Malmberg et al. (2017) found that metacognitive monitoring promotes knowledge construction and also provides means for shared planning processes to occur.
Despite metacognitive monitoring processes being essential for effective SRL, not all learners deploy monitoring processes accurately whilst learning (Winne & Jamieson-Noel, 2002). For this reason, Sonnenberg and Bannert (2015) investigated how applying metacognitive prompts affects learning performance and the appearance of phases of regulated learning. They found that metacognitive prompts improved learning performance and, in addition, that high-performing students showed more frequent changes between the planning and goal setting phase and the strategic enactment phase than low-performing students.

| Methods to capture metacognitive monitoring and SRL
For decades, the way that learners express metacognitive monitoring has been measured via self-reports (Azevedo, 2015). Static self-reports reveal what learners believe about how they monitor their learning, but these beliefs are only part of the picture, as engaging in metacognitive monitoring is highly dependent on the task, the learning situation and the context. In addition, learners' self-reports might be biased because they measure what the learners believe they are doing, which research often shows to be inaccurate (Winne & Jamieson-Noel, 2002).
Since monitoring is a complex mental process, trace methods can provide insights into the learners' mental activity levels (Winne & Hadwin, 2013). Due to the limitations of self-report data, metacognition and especially monitoring events are often investigated by using observable traces that are viable for capturing monitoring events as they occur in a learning situation. These methods include, for example, think aloud protocols (Azevedo et al., 2011), video observations (Rogat & Linnenbrink-Garcia, 2011), computer logs (Malmberg et al., 2014), eye tracking (Taub et al., 2017) or physiological data (Haataja et al., 2018). However, due to the implicit nature of metacognitive monitoring, many of the monitoring events remain unseen. For example, think aloud protocols have been criticized for their intervention effect on cognition. When students have to elaborate their thinking, there is a chance that they also become more aware of it, which interferes with their thinking and learning process (Järvenoja et al., 2018; Winne, 2010). Video observations are highly time-consuming to analyse, and they reflect, to some extent, the researchers' subjective interpretations of the learning situation. Contextualized computer logs and eye tracking data, in contrast, reflect the learners' visible activities, which might, to some degree, shed light on monitoring activities.
Physiological data such as electrodermal activity and heart rate variability have recently gained attention in SRL research. However, researchers have argued that such data may remain meaningless unless they are contextualized in a learning situation. For example, Haataja et al. (2018) explored how learners' physiological signals match when they engage in metacognitive monitoring. They found that, to some extent, when learners engaged in metacognitive monitoring, their physiological signals were also in synchrony. Whilst there is an extensive body of research illustrating how observable traces of metacognition can illuminate monitoring events (Azevedo et al., 2011), physiological data and their relation to monitoring have been scarcely explored.
There are four reasons why the use of physiological data for investigating monitoring in collaborative learning is worth exploring. First, metacognitive monitoring is only partly visible (Winne & Hadwin, 1998) when cognitive activities are displayed. Second, when viewed from the theoretical framework of SRL, metacognitive monitoring is a result of actively searching for discrepancies between the desired outcome (motivational, emotional, behavioural or cognitive) and the actual outcome. However, despite metacognitive monitoring being conscious, it can also be effortless and almost automatic, especially in routine tasks (Butler & Winne, 1995). This means that, in light of the theoretical framework of SRL, the major difference between automated and active metacognition is that the automated monitoring activities are sometimes conducted without conscious effort (Winne, 2017). Third, if monitoring is active, that is, intentional, it would also be reflected in physiological reactions such as EDA, which have been shown to increase with attention in relation to engaging stimuli and attention-demanding tasks (Poh et al., 2010). Fourth, despite a vast amount of theoretical and conceptual progress in SRL (Hadwin et al., 2017), there has been less progress in developing methods that could make the primarily invisible active mental monitoring processes visible and measurable in authentic classroom settings.

| Electrodermal activity and monitoring
In light of the rationale provided in the previous section, this paper explores the relationship between monitoring and EDA, as part of the quest to capture invisible metacognitive processes by means of physiological reactions. EDA refers to changes in the skin conductance as a result of the activity of sweat glands, which are under exclusive control of the sympathetic nervous system (Dawson et al., 2017). The sympathetic nervous system can be activated by cognitive, affective and physical processes (Poh et al., 2010), producing a reaction known as arousal (De Lecea et al., 2012). Cognitive and affective arousal have been of primary interest in psychophysiology, an interdisciplinary field which studies the connection between psychological processes and physiological reactions (Cacioppo & Tassinary, 1990). Increased emotional arousal or cognitive workload results in changes in the EDA signal (Henriques et al., 2013).
EDA comprises two different components: the skin conductance level and the skin conductance response (Boucsein, 2012). The skin conductance level is the tonic component, changes slowly and is graphically interpreted as a baseline (Dawson et al., 2017). The skin conductance responses (SCRs) are superimposed on the skin conductance level and graphically appear in the form of peaks, showing a steep incline to the peak and a slow decline to the baseline (Boucsein, 2012). The frequency of SCRs or EDA peaks and their amplitude are common features of EDA used, for example, in quantifying the level of arousal (Dawson et al., 2017). Such indicators might signal increased mental effort related to task difficulty, cognitive load or engagement (Azevedo, 2015; Hernández-García et al., 2015). Nikula (1991) posits that, in particular, SCRs can reflect 'negatively tuned cognitive activity.' SCRs have been studied, for example, in experiments to detect affective states including stressors (Knierim et al., 2018; McQuiggan et al., 2008), errors made during a task (Hajcak et al., 2003), cognitive load (Goyal & Fussell, 2017), attention (Critchley, 2002; Thorson et al., 2018) and the feeling of knowing (Morris et al., 2008). There is increasing evidence of an interrelation between cognition and EDA. For example, studies have shown that there is a correlation between EDA and task demands (Pecchinenda & Smith, 1996; Wilson, 2002). In addition, Pijeira-Díaz et al.'s (2018) study revealed that the amplitude in EDA signals correlated with course grades. Therefore, it makes theoretical sense to expect that metacognitive monitoring could be reflected in the EDA amplitude; for example, when students' progress is not going according to their plan to meet their goal(s).
This study explores students' metacognitive monitoring in collaborative learning and how it is reflected in their EDA signals. The research questions are the following:
1. What types of monitoring events occur in the context of collaborative learning?
2. What is the distribution of the amplitude levels of the EDA peaks during collaborative learning?
3. What types of states can be identified based on the monitoring events and the associated EDA amplitude?
4. How do the monitoring events associated with the amplitude of EDA peaks occur in the context of collaborative learning?

3 | METHODOLOGY

| Participants and context
The participants (N = 12, age 16-17 years, 3 females and 9 males) of the study were high school students enrolled in an advanced physics course. The course was elective, and it required students to have completed two other physics courses. All participants were informed about the details of data collection and were told that participation would not affect their grade in any way and that they could revoke their consent at any time during the data collection. All 12 students gave written consent to participate in the study. The students collaborated in the same groups of three students throughout the course.
The collaborating groups were formed to be heterogeneous in terms of learning regulation profiles for the sake of between-team comparability. Students were asked to fill in the cognitive and metacognitive strategies part of the Motivated Strategies for Learning Questionnaire (MSLQ) (Pintrich, Smith, Garcia, & McKeachie, 1993) as a measure of their self-regulation profile. Based on their questionnaire score, the students were categorized into three levels of self-regulation: low, middle and high. Each group included one student from each category.
The course consisted of 18 lessons, each lasting 75 min, that the researchers designed together with the teacher. Seven of these lessons (lessons 8-14) took place in the LeaF research infrastructure (https://www.oulu.fi/leaf-eng/), which is a classroom-like collaborative learning space that can accommodate up to 30 students. The infrastructure allows researchers to collect a variety of data (e.g., video, audio, physiological) without interfering with the learning process. This study focusses on lessons 8-14, which were observed whilst the students performed activities in the LeaF research infrastructure.
Each of the 18 lessons involved a short introduction to the topic by the teacher, followed by collaborative group work. Lessons 8-14 included tasks such as designing an experiment for measuring the speed of light and measuring the thickness of hair, as well as conducting hands-on experiments using lasers, mirrors, lenses, prisms and a double-slit to study reflection, refraction, dispersion and interference. A collaborative exam was administered to the students during the last lesson of the course.

| Data collection
Video data were recorded from each of the four groups during each collaborative learning session conducted at the LeaF research infrastructure. Seven sessions for each of four groups would yield 28 recordings; however, three sessions from three different groups were not recorded due to the absence of participants. The video data therefore consisted of 25 videos, which lasted 35 h in total.
Physiological data were collected from each of the 12 students using Empatica E4 (Empatica Inc., Cambridge, MA, USA) bracelets.
Students were fitted with the bracelets at the beginning of each lesson and were informed that they could be taken off at any point.

| Qualitative content analysis
The analysis investigated how individual utterances related to the monitoring of behaviour, cognition, motivation and emotions occurred as the student groups progressed in collaborative tasks.
At the first stage of the analysis, all the individual student utterances focussed on monitoring the group's collaborative learning progress were identified using the videotaped learning sessions. At this point, monitoring was defined as the monitoring of one's own or one's group's cognition, behaviour, motivation or emotions (Winne & Hadwin, 1998). The individual who engaged in each monitoring utterance was identified. At this phase of the analysis, based on earlier studies (Azevedo & Witherspoon, 2009; Schunk, 1991; Wolters, 2011), it was decided to elaborate the three areas of monitoring in more detail. Thus, the coding was done at the individual student level, and each utterance related to monitoring cognition, behaviour, motivation and emotion was coded.
During the second phase of the analysis, a single video was coded. The coding was negotiated in terms of (1) what monitoring is, (2) what monitoring is not and (3) the empirical examples of the data.
After the coding scheme was negotiated, agreed upon and fine-tuned, another round was conducted in which two researchers coded the same video again using the created coding scheme to ensure that the coding was clear, understandable and valid for use in the final coding.  (Fleiss, 1981).
During the third phase of the qualitative content analysis, the target of monitoring in terms of the phases of regulated learning was identified. That is, whether the target of monitoring was task understanding, goal setting and planning, task enactment, motivation and emotion or adapting and reflecting. This resulted in five categories representing the phases of regulated learning when monitoring took place (Table 2). To ensure reliability of the coding process, 20% of the identified monitoring episodes associated with phase coding were [...] (2) to obtain a sample that allows the calculation of a reliability coefficient that can be generalized to each of the three phases of qualitative content analysis (Syed & Nelson, 2015).

TABLE 2 Phases of regulated learning when monitoring took place

| Phase | Description | Examples |
| --- | --- | --- |
| Phase 1: Task definition | In this phase, the students form their understanding about the task and its affordances and constraints. The students may also redefine their understanding of the task again later when working on it. Learners can also search for additional information or ask for help if the instructions are unclear. | 'So, did we have to do this, too?' 'What is this concept?' |
| Phase 2: Setting goals and plans | After the students have formed an understanding of the task, it is time to set goals and plan how to conduct it. What are the standards which tell that the goal has been met? Will the group use trial and error as an approach to problem solving or are they first going to search for knowledge? | 'Should we do it the same way it was originally done?' 'Our aim is to pass this course.' 'How many lectures we have before the exam?' |
| Phase 3: Task enactment | In this phase, the students actively construct the task product and monitor how they are progressing when contrasted with standards. Students may also monitor other cognitive attributes such as how much effort they are putting into the task. | 'Reading aloud the task contents.' 'Adjusting the equipment needed for the physics experiment.' |
| Phase 4: Adaptation/reflection | This phase is usually positioned at the end of the task. Strategies and ways of working will be evaluated and adjusted for the future. The goal is to make the work easier in the future. | 'We did not manage to do all of these.' 'Didn't we do the same type of experiment last week?' 'Today our group worked well!' |

| Electrodermal activity analysis
EDA data were analysed using MATLAB-based Ledalab (Benedek & Kaernbach, 2010) software, as recommended by the E4 bracelet manufacturer (Empatica, 2018). The peaks were detected by continuous decomposition analysis (Benedek & Kaernbach, 2010) in Ledalab, which uses an algorithm to account for the superimposition of peaks.
This is important because a peak often occurs during the decay time of a previous one (i.e., it is superimposed), which has historically caused biases in peak counts (see Boucsein, 2012). Following a long-standing convention (Dawson et al., 2017), the threshold for a deflection to be considered a peak was set to 0.05 μS. The continuous decomposition analysis method assumes the input data to be raw data (Benedek & Kaernbach, 2010). Accordingly, no filtering or any other kind of preprocessing was applied. The raw data were used as a direct input for Ledalab, which then returned the onset and amplitude of the detected peaks.
The amplitude data were then combined with the coded monitoring event data for each student. Data tables were created for each group, containing sessions stacked together and columns representing the part of the lesson, timestamp, different types of coded monitoring events on a student-by-student basis, EDA and EDA amplitude. Since the EDA amplitude was calculated on a quarter-second basis, each row in the other data columns was repeated four times to match the amplitude timestamps.
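The alignment step can be illustrated with a minimal sketch. It assumes that the coded monitoring events are stored once per second and that the amplitude column is on a quarter-second grid with zeros where no peak was detected; the array names and values are hypothetical and are not taken from the study's data.

```python
import numpy as np

SAMPLES_PER_SECOND = 4  # the amplitude column is on a quarter-second grid

# Hypothetical per-second monitoring codes for one student (0 = no event).
events_per_second = np.array([0, 0, 3, 3, 0])

# Repeat each per-second row four times so it lines up with the amplitude rows.
events_quarter_second = np.repeat(events_per_second, SAMPLES_PER_SECOND)

# Hypothetical amplitude column (assumed to be zero where no peak was detected).
amplitude = np.zeros(events_quarter_second.size)
amplitude[9] = 0.21

# One row per quarter second: event code next to EDA amplitude.
aligned = np.column_stack([events_quarter_second, amplitude])
```

Repeating rows, rather than interpolating, keeps the categorical event codes unchanged, which seems to be what the row repetition described above achieves.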
After the EDA peaks (f = 24,774) were identified, they were discretized into four equal-width bins (hereafter referred to as quartiles) based on the amplitude of the peaks (see Figure 1). This made it possible to compare different students and groups and ensured the consistency of the analysis. Because high-amplitude peaks were rare, the distribution of the data over the quartiles was heavily skewed. The discretization enabled us to identify peaks with varying amplitudes: high reactivity (mean peak value: 4.51, f = 4), high-mid reactivity (mean peak value: 3.06, f = 13), low-mid reactivity (mean peak value: 1.66, f = 258) and low reactivity (mean peak value: 0.19, f = 24,499). Additionally, a discrete level was added for those segments where no peaks occurred. After discretizing, the stacked data were split up and columns were added for each student, containing their respective data at the appropriate timestamps.
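Equal-width discretization with an extra level for the absence of peaks can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name is hypothetical, an amplitude of zero is taken to mean that no peak was detected, and the bin edges are assumed to span the observed range of peak amplitudes.

```python
import numpy as np

def discretize_equal_width(amplitudes, n_bins=4):
    """Return one level per sample: 0 = no peak, 1..n_bins = equal-width bin."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    has_peak = amplitudes > 0  # assumption: zero amplitude means no peak
    levels = np.zeros(amplitudes.shape, dtype=int)
    if not has_peak.any():
        return levels
    peaks = amplitudes[has_peak]
    # Equal-width edges spanning the observed peak amplitudes.
    edges = np.linspace(peaks.min(), peaks.max(), n_bins + 1)
    # Use interior edges only, so that the maximum falls in the top bin.
    levels[has_peak] = np.digitize(peaks, edges[1:-1]) + 1
    return levels

levels = discretize_equal_width([0.0, 0.1, 0.2, 1.5, 3.0, 4.9, 0.0])
```

With equal-width bins, a right-skewed amplitude distribution places most peaks in the lowest bin, which is consistent with the frequencies reported above (24,499 of 24,774 peaks at the lowest level).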
At the second stage of the analysis, segmentation and additional feature computation were performed. The group session data tables were saved as separate files in a directory. Each file was read and segmented in turn, features were computed, and a matrix was built based on the results. The matrix included student ID, group ID, segment ID, state (binary value representing the presence or absence of a phase event) and a phase ID representing the coded phase for each segment ('1' here represents no phase event).
The segmentation was conducted as follows: for each student, we iterated over each quarter second of their discretized EDA amplitude data. When no monitoring event was found, the end of the non-event period was identified (i.e., the start of the next monitoring event) and features were computed on the entire non-event segment. A segment ID was assigned along with the state (event or non-event). When a monitoring event was found, the end of the monitoring event period was identified (i.e., when the event ID changed) and features were computed on the entire monitoring event segment plus a window of 5 s at either side of the monitoring event. Like before, a segment ID was assigned along with the state (event or non-event). The features computed were the percentage of time for a given segment that the discrete amplitude signal was in each quartile.
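The segmentation and feature computation described above can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the function name, the encoding of non-events as ID 0 and the level coding (0 = no peak, 1 to 4 = amplitude quartile) are assumptions, and features are returned as fractions rather than percentages.

```python
import numpy as np

def segment_features(event_ids, levels, pad=20, n_levels=5):
    """Split a session into event / non-event runs and compute, per run,
    the fraction of samples at each discrete amplitude level.

    event_ids: per-sample event ID (assumed 0 = no monitoring event).
    levels: per-sample discrete amplitude level (0 = no peak, 1..4 = quartile).
    pad: samples added on either side of an event run (20 = 5 s at 4 Hz).
    """
    event_ids = np.asarray(event_ids)
    levels = np.asarray(levels)
    n = event_ids.size
    if n == 0:
        return []
    # A new run starts at index 0 and wherever the event ID changes.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(event_ids)) + 1))
    ends = np.concatenate((starts[1:], [n]))
    rows = []
    for seg_id, (start, end) in enumerate(zip(starts, ends)):
        is_event = bool(event_ids[start] != 0)
        # Only event segments are widened by the window on either side.
        lo, hi = (max(0, start - pad), min(n, end + pad)) if is_event else (start, end)
        counts = np.bincount(levels[lo:hi], minlength=n_levels)
        rows.append((seg_id, is_event, counts / counts.sum()))
    return rows
```

In this sketch, the window around an event overlaps the neighbouring non-event segments, because non-event features are computed on the entire non-event period, as the description above states.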

| Hidden Markov model analysis
Hidden Markov models (HMMs) were used to model changes in the monitoring events associated with EDA signals. The benefit of using HMMs in this setting is twofold: on the one hand, the HMM identifies a set of latent states that are characterized by different EDA patterns. On the other, the model also provides a set of transition probabilities between these states (Ghahramani & Jordan, 1997). Accordingly, by looking at the co-occurrence of, for instance, metacognitive monitoring events and these latent states, we are able to identify how different levels of EDA activity are associated with monitoring events. Furthermore, the transition probabilities provide insight into how EDA changes in advance of and following such events.
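To make the two ingredients concrete (latent states with state-specific emission patterns, and transition probabilities between states), the following minimal sketch runs a scaled forward-backward pass for an HMM with a single discrete observation per segment. It is a simplification: the study's model used several discretized features per segment, and all parameter values below are hypothetical.

```python
import numpy as np

def forward_backward(obs, start, trans, emit):
    """Scaled forward-backward pass for a discrete-emission HMM.

    obs: sequence of observation symbols (ints).
    start: (K,) initial state probabilities.
    trans: (K, K) matrix, trans[i, j] = P(next state j | current state i).
    emit: (K, M) matrix, emit[k, m] = P(symbol m | state k).
    Returns (log_likelihood, posterior), posterior[t, k] = P(state_t = k | obs).
    """
    obs = np.asarray(obs)
    T, K = obs.size, start.size
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    scale = np.zeros(T)

    # Forward pass, normalizing at each step to avoid underflow.
    alpha[0] = start * emit[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (trans @ (emit[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    return np.log(scale).sum(), alpha * beta

# Hypothetical two-state model: state 0 mostly emits symbol 0 ("no peaks").
start = np.array([0.6, 0.4])
trans = np.array([[0.9, 0.1], [0.3, 0.7]])
emit = np.array([[0.8, 0.2], [0.3, 0.7]])
log_lik, posterior = forward_backward([0, 0, 1, 1, 0], start, trans, emit)
most_probable_state = posterior.argmax(axis=1)
```

The posterior gives, for each segment, the probability of each latent state; taking the row-wise maximum yields a most probable state per segment, while `trans` plays the role of the transition probabilities discussed above.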
FIGURE 1 Histogram of EDA amplitude discretized into quartiles. Data where EDA peak reactivity is absent are excluded.

Before an HMM could be fitted to the data, the feature data were discretized. The data had five features: the percentage of time the EDA amplitude (in each segment) was in each of the four quartiles and the percentage of time EDA amplitude data were absent (in each segment). Each feature was discretized into up to 10 levels on the basis of equal-width bins.
HMMs with 2-10 states were compared, and AIC, BIC and log likelihood statistics were calculated (Table 3). The six-state model was selected as it provided the optimal BIC fit, and, whilst not being the optimal model in terms of AIC and log likelihood, it was positioned at a local optimum for these two statistics.
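For reference, the two information criteria follow the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of observations and L the maximized likelihood. The sketch below computes them for an HMM; the parameter count assumes independent categorical emissions for each feature, which is an assumption about the model structure and not something stated in the text.

```python
import math

def hmm_free_parameters(n_states, n_levels_per_feature):
    """Free parameters of an HMM with independent categorical emissions
    (an assumed structure, used here for illustration only)."""
    initial = n_states - 1
    transitions = n_states * (n_states - 1)
    emissions = n_states * sum(levels - 1 for levels in n_levels_per_feature)
    return initial + transitions + emissions

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Six states, five features with up to 10 levels each (the upper bound).
k_six_states = hmm_free_parameters(6, [10] * 5)
```

Because the BIC penalty grows with ln n, it penalizes additional states more heavily than AIC for all but very small samples, which is one reason the two criteria can favour models of different sizes.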
Accordingly, a six-state model was fitted to the data, a transition matrix was calculated and posterior estimates of state assignment were calculated for each segment. Segments were labelled with their most probable state assignment.

In total, the five types of monitoring events associated with the five phases of regulated learning occurred 1391 times during collaborative learning (see Table 5). Most often, the students engaged in monitoring task enactment (f = 1008), followed by task understanding (f = 227). Monitoring occurred the least during the reflecting (f = 36) and the planning and goal setting (f = 57) phases.

| What types of EDA peaks can be detected in the context of collaborative learning?
As Figure 1 shows, the distribution of EDA amplitude in the presence of peaks was heavily skewed. The distribution was discretized into quartiles, and a fifth level was added for where EDA peak reactivity was absent.

| What types of states can be identified based on the HMM in terms of events and associated EDA amplitude?
Since our HMM model identified six different types of states that included either monitoring events or no events, the characteristics of the six different states are summarized below.
State 1: Segments were composed of 90-100% zero-amplitude data, indicating that no EDA peak was detected, with up to 10% of the segments being composed of the first quartile (low amplitude).
State 2: Segments were composed of 60-70% zero-amplitude data, with 30-40% of the segments being composed of the first quartile (low amplitude). There was also a small chance (p ≈ 0.1) that up to 10% of the segments were composed of the second quartile (low-mid amplitude).
State 5: Segments were composed of 40-60% zero-amplitude data, with 40-60% of the segments being composed of the first quartile (low amplitude).
State 6: Segments were composed of 70-90% zero-amplitude data, with 10-30% of the segments being composed of the first quartile (low amplitude).
To summarize, states 3 and 4 were mostly composed of zero-amplitude data, whereas states 1, 2, 5 and 6 were composed of low-amplitude data from the first quartile. Figure 2 shows the proportion of event and non-event segments that were assigned to the respective states. For example, states 3 and 4 were predominantly characterized by zero-amplitude data and accounted for a large proportion of both event and non-event segments. State 1 was also similar, though there was a chance of up to 10% low EDA reactivity.
States 2 and 5 included low-amplitude EDA peaks. State 2 was composed of 30-40% low-amplitude EDA peaks, and State 5 was composed of 40-60% low-amplitude EDA peaks. For non-event segments, the proportion of the segments in these two states was comparable. However, a far higher proportion of event segments was identified with State 5, indicating that events were associated with greater EDA peak reactivity.
4.4 | How do the monitoring events associated with the amplitude of EDA peaks occur in the context of collaborative learning?
The most prominent transitions between states 1 and 6 are presented in Figure 3 to better depict how these states typically occurred and were associated with each other. The associations are marked with arrows, and the states that included EDA peak amplitude are marked in blue.
State 1 was composed of 10% EDA reactivity, exhibited no monitoring events and was most likely to transition to State 3, which had no EDA reactivity but exhibited monitoring events. That is, low-peak amplitude with no monitoring events was likely to be followed by no EDA peak activity exhibited with monitoring events. In addition, State 4 was composed of predominantly zero-amplitude data, exhibiting monitoring events and no monitoring events, and was most likely to transition to itself. That is, low EDA peak reactivity associated with few monitoring events was likely to be followed by low EDA peak reactivity coupled with few monitoring events.
State 2 was partially composed of low- or, rarely, mid-amplitude EDA peaks and was equally likely to transition to State 5 as to State 6. These represent an increase and a decline in EDA reactivity, corresponding to an increase and a decline in events, respectively. In addition, State 5 was composed of the highest proportion of low-amplitude EDA peaks coupled with monitoring events and was most likely to transition to itself or to State 6, representing a continuation of EDA reactivity or a decline, respectively.
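The idea of a "most likely transition" can be illustrated with a small sketch. It assumes an already decoded sequence of state labels and estimates transition probabilities by simple counting, which is a simplification and not the estimation procedure used when fitting an HMM. The function names and the example sequence are hypothetical and are not data from the study.

```python
from collections import defaultdict


def transition_probabilities(state_sequence):
    """Estimate P(next state | current state) by counting adjacent pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(state_sequence, state_sequence[1:]):
        counts[current][following] += 1
    probabilities = {}
    for current, row in counts.items():
        total = sum(row.values())
        probabilities[current] = {nxt: c / total for nxt, c in row.items()}
    return probabilities


def most_likely_next(probabilities, state):
    """Return the state with the highest transition probability from `state`."""
    row = probabilities[state]
    return max(row, key=row.get)


# Hypothetical decoded sequence of HMM states, for illustration only.
sequence = [1, 3, 4, 4, 4, 2, 5, 5, 6, 2, 6]
probs = transition_probabilities(sequence)
for state in sorted(probs):
    print(state, probs[state], "most likely next:", most_likely_next(probs, state))
```

In this toy sequence, one state mostly transitions to itself and another is equally likely to move to two different states, which mirrors the kinds of patterns described in the text without reproducing the reported probabilities.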
A statistically significant MANOVA effect was obtained, Pillai's trace = 0.74, F(1, 22) = 8.05, p = 3.2e-04. The multivariate effect size was estimated at η² = 0.74, implying that 74% of the variance in the dependent variables (the prevalence of each of our six HMM states) was accounted for by the independent variable (segment event/non-event assignment), which, according to Cohen (1988), is a large effect size. This result was further confirmed by a significant robust rank-based MANOVA (rank-based Wilks' Λ = 0.13, p = 6.9e-07). That is, EDA amplitude differed significantly between occasions with monitoring activity and occasions without monitoring activity.
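The relation between the reported Pillai's trace, the effect size and the F statistic can be checked with a short calculation. The sketch below uses the standard F approximation for Pillai's trace and the multivariate η² defined as the trace divided by s. It assumes six dependent variables (the HMM state prevalences), one hypothesis degree of freedom (event versus non-event) and 22 error degrees of freedom. These inputs are read from the reported statistics, not confirmed by the authors, and the function names are hypothetical.

```python
def multivariate_eta_squared(pillai_trace, n_dvs, df_hypothesis):
    """Multivariate eta squared: Pillai's trace divided by s = min(p, q)."""
    s = min(n_dvs, df_hypothesis)
    return pillai_trace / s


def pillai_f_approximation(pillai_trace, n_dvs, df_hypothesis, df_error):
    """Return (F, df1, df2) from the standard F approximation for Pillai's trace."""
    p, q, v = n_dvs, df_hypothesis, df_error
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (v - p - 1) / 2
    f_value = ((2 * n + s + 1) / (2 * m + s + 1)) * (pillai_trace / (s - pillai_trace))
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    return f_value, df1, df2


# Assumed inputs: V = 0.74, six dependent variables, two groups, 22 error df.
eta_sq = multivariate_eta_squared(0.74, n_dvs=6, df_hypothesis=1)
f_value, df1, df2 = pillai_f_approximation(0.74, n_dvs=6, df_hypothesis=1, df_error=22)
print(eta_sq, round(f_value, 2), df1, df2)
```

Under these assumptions, s equals 1, so η² coincides with Pillai's trace, and the approximation yields an F value of about 8.06, close to the reported 8.05 given that the trace is rounded to two decimals. The degrees of freedom produced by this approximation are a property of the sketch and are not values stated in the text.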
To better describe the findings, Figure 4 illustrates segments which were assigned to State 1.
FIGURE 3 Associations between the six states identified after fitting hidden Markov models. The full matrix of transition probabilities is shown in Table 3, and the figure shows only those transitions that had probability higher than 0.

Azevedo and Witherspoon (2009), Schunk (1991) and Wolters (2011). Mostly, the students engaged in monitoring during task enactment, whereas monitoring occurred the least during reflection and adaptation. This finding is aligned with previous studies indicating that monitoring during reflection does not occur often (Malmberg, Järvelä, & Järvenoja, 2017) and that monitoring during task execution occurs often (De Backer et al., 2015). However, when taking into account the nature of the learning tasks, which included performing different types of physics experiments, it is not surprising that students engaged mostly in metacognitive monitoring during task enactment.
This is because, when performing these experiments, the students received immediate feedback on whether their task solution was correct or not, which often led to changing the task enactment strategy (Zheng & Yu, 2016 Pijeira-Díaz et al., 2018). However, even low EDA peak reactivity can signal the need for deliberate monitoring, such as the feeling of knowing (Morris et al., 2008) or errors made during a task (Hajcak et al., 2003), as it reflects a physiological response to external stimuli. know what to expect. Another reason might be that when learners enter a task, the problem statement is so novel that they cannot verbalize the problem. Moreover, the HMM shows shifts between states that include monitoring events but no EDA reactivity, followed by EDA reactivity associated with monitoring events. With respect to the associations between the six states, the current study found that low EDA peak reactivity exhibiting no monitoring events was likely to be followed by zero EDA peak reactivity exhibiting few monitoring events, and then by transitions back and forth between the states showing monitoring events associated with low EDA peak reactivity. According to Critchley (2002) and Dawson et al. (2017), physiological activation is expected to rise with attention-demanding tasks, which was not shown in the current study; rather, the current study showed physiological de-activation, which is related, for example, to boredom (Baker et al., 2010). However, in the light of theories of SRL (Winne & Hadwin, 1998), the results could indicate that monitoring events were rather effortless.
According to later transitions in the HMM, the learners showed more physiological activation, indicating that the learners gradually shifted their focus towards more intentional and focussed monitoring events. The HMM states that did not include physiological activation can also serve recovery purposes between intense states of metacognitive activation (Pijeira-Díaz et al., 2018), something that is needed as learners progress with the task. Previous studies have shown similar transitional patterns in learners' regulation of learning.
As learners become more familiar with the task, they are also increasingly able to monitor their learning in a more automated and less deliberate way (Winne, 2017).
The associations between the six states based on the HMM provide insights not only into how learners engage in metacognitive monitoring but also into their level of physiological activation in each state. The advantage of the HMM is that it can reveal the hidden structure of multiple data sources as well as their interconnection. It can be concluded that exploring certain aspects of SRL, such as metacognitive monitoring in collaborative learning, can be done using physiological reactions and EDA data (e.g., Haataja et al., 2018; Winne, 2019). However, more evidence is needed to confirm its relevance in research on SRL.
There are several limitations in this study. First, there are few empirical studies investigating EDA peak reactivity in the context of collaborative learning. This, however, is also an advantage of the study, as there has recently been an increasing interest in the field of SRL in using less traditional data channels (e.g., Azevedo & Gašević, 2019). The second limitation is that the sample size was relatively small. Yet, despite the small sample size, data were collected from 18 collaborative learning sessions, which were analysed in great detail.
Researchers working with SRL theory have increasingly pointed out that adding data modalities makes it possible to triangulate the phenomena under investigation and verify the findings (Azevedo, 2015; Järvelä et al., 2020). The potential of psychophysiological measures rests on the assumption that human physiological reactions are not separate from human cognition and that this connection is bi-directional (Critchley, Eccles, & Garfinkel, 2013). Moreover, these reactions have consequences for human actions (Bandura, 1982). These psychophysiological reactions can provide a new opportunity to understand how monitoring occurs in relation to EDA. That is, combining EDA with monitoring can potentially be used to separate automated and conscious monitoring activities (Winne, 2017). In addition, this can indicate exactly when SRL should be exercised and provide further possibilities to explore why it is or is not exercised.
In practice, imagine a situation where a student notices (monitors) that things are not going as expected (an error). This leads to a bodily reaction that can be detected from EDA, but it is not visible to anyone and does not lead to any observable actions. To promote SRL in such a scenario, the integration of physiological measures with other data sources could provide a new dimension to learning analytics dashboards (see Sedrakyan et al., 2020), which could be used to support learners' regulation. This, however, would require making primarily invisible regulation processes and their accompanying social and contextual reactions visible, measurable and interpretable.
The findings presented in this paper demonstrate the potential of machine-learning techniques (HMMs in this case) for studying SRL when using novel physiological sources of data such as EDA.
Not only do techniques such as the HMM have the ability to reveal latent states in relatively noisy physiological data, but those latent states can (1) reveal transitions to other states, which is critical for the understanding of SRL that is characterized by the dynamic nature of learning processes and (2) be interpreted against the relevant SRL process when linked with other sources of data such as monitoring events obtained by coding collaborative utterances.
Future research should explore complementary data analytic techniques (Slater et al., 2017) such as epistemic network analysis (Shaffer et al., 2016) to offer additional insights into the links between the latent states of physiological data and the codes of SRL processes and to track how SRL unfolds over time and under different conditions.