Metric error monitoring as a component of metacognitive processing

Metacognitive processing constitutes one of the contemporary target domains in consciousness research. Error monitoring (the ability to correctly report one's own errors without feedback) is considered one of the functional outcomes of metacognitive processing. Error monitoring is traditionally investigated as part of categorical decisions where choice accuracy is a binary construct (choice is either correct or incorrect). However, recent studies revealed that this ability is characterized by metric features (i.e., direction and magnitude) in temporal, spatial, and numerical domains. Here, we discuss methodological approaches to investigating metric error monitoring in both humans and non‐human animals and review their findings. The potential neural substrates of metric error monitoring measures are also discussed. This new scope of metacognitive processing can help improve our current understanding of conscious processing from a new perspective. Thus, by summarizing and discussing the perspectives, findings, and common applications in the metric error monitoring literature, this paper aims to provide a guideline for future research.


| INTRODUCTION
Consciousness is typically referred to as the awareness of perceptual and mental states, with various levels of processing experience ranging from simple wakefulness to being aware of the contents of cognition to meta-consciousness (i.e., consciousness of being aware; Frith, 2021). The contemporary approaches to the investigation of consciousness primarily focus on 'metacognition', which, in its simplest form, can be defined as 'cognition about cognition' (Flavell, 1979; Metcalfe, 2000). Thus, as a critical aspect of consciousness, metacognition refers to the awareness of one's own cognitive state, and it is considered one of the highest levels of human cognition (Fleming, 2020; Fleming, Dolan, & Frith, 2012; Frith & Dolan, 1996; Lau, 2019; Lau & Rosenthal, 2011). From a functional perspective, metacognition is assumed to guide many behavioural processes such as learning (e.g., Wang et al., 1990; see also Veenman et al., 2006) and behavioural adaptation (e.g., Questienne et al., 2018). Metacognition is further divided into different components such as error monitoring, uncertainty monitoring, the feeling of knowing (FoK) and judgement of learning (JoL). Until very recently, the metacognition literature has used categorical measures (e.g., judgement of being correct or incorrect in two-alternative forced-choice [2AFC] tasks) to capture this ability. However, a more recent line of research has demonstrated that metacognition also captures the metric properties of experiences or performance (e.g., the magnitude of errors). The current paper aims to provide an overview of such contemporary approaches and the resultant discoveries, referred to as 'metric error monitoring' or 'metric awareness'. These subjective experiences (see Frith, 2021) relate to instances where one judges how early or late they are to a meeting without consulting a time-keeping device.

Abbreviations: 2AFC, two-alternative forced choice; aPFC, anterior prefrontal cortex; AROC2, Type 2 area under the receiver operating characteristic curve; ASD, autism spectrum disorder; dLPFC, dorsolateral prefrontal cortex; DMTP, delayed matching-to-position; DMTS, delayed matching-to-sample; EEG, electroencephalogram; fMRI, functional magnetic resonance imaging; FOJ, first-order judgement; FoK, feeling of knowing; IPS, intraparietal sulcus; JoL, judgement of learning; OCD, obsessive-compulsive disorder; PFC, prefrontal cortex; PSE, point of subjective equality; SOJ, second-order judgement; TED, temporal error detection; TMC, temporal metacognition; TMS, transcranial magnetic stimulation; TopDDM, time-adaptive opponent drift-diffusion model.
Error monitoring is arguably the most popular way of investigating metacognitive abilities. It is defined as the ability to correctly report one's own errors without relying on external feedback (Yeung & Summerfield, 2012). Thus, it requires conscious processing of errors and of the representations of intended actions, and it is operationalized as the match between perceptual classification accuracy and the subjective confidence judgement regarding the accuracy of the corresponding perceptual response (e.g., Fleming & Dolan, 2012; Fleming, Dolan, & Frith, 2012; Norman & Price, 2015). This can be most readily exemplified within the framework of the 2AFC paradigm. In 2AFC tasks, participants classify the stimulus as 'target' (or 'signal') or 'non-target' (or 'noise'). This set of choices constitutes the 'first-order classification'. When studying error monitoring, participants rate their confidence level regarding the accuracy of their first-order classification (i.e., the second-order judgement [SOJ]). The match between this first-order classification performance and the SOJ constitutes metacognitive accuracy (Fleming, Dolan, & Frith, 2012). Many studies have shown a better-than-chance match between the first- and second-order performance of humans and non-human animals (e.g., Foote & Crystal, 2007, but see below for a detailed discussion) in memory judgements (e.g., Vandenbroucke et al., 2014) and choice behaviour (e.g., Fleming & Dolan, 2012; Fleming, Huijgen, & Dolan, 2012; see also Fleming, Dolan, & Frith, 2012). However, this approach restricts the characterization of task performance to a binary measure of accuracy, which in turn limits the ecological validity of the obtained results because categorical errors lack size/magnitude and direction.

| METRIC ERROR MONITORING: A NEW APPROACH TO CONSCIOUS MONITORING
Unlike the categorical approaches to error monitoring, many estimation errors in daily life contain metric information such as direction and magnitude (e.g., overestimating or underestimating the width of a parking slot to different degrees). Only recently have researchers begun examining whether perceptual error monitoring ability extends to metric domains by capturing the direction and magnitude of errors. These studies have robustly demonstrated that humans can keep track of the magnitude (i.e., too much/many or a little/few) and direction (i.e., early/late, short/long, or more/less) of the errors in their judgements of time (e.g., Akdoğan & Balcı, 2017), line length (Duyan & Balcı, 2019) and numerosity (Duyan & Balcı, 2018; see also Yallak & Balcı, 2021). In these tasks, participants reproduced/estimated a given quantity and reported their subjective confidence regarding the proximity of their performance to the actual target (i.e., low, middle, or high). Critically, participants also reported their error direction as being shorter or longer (or fewer or more) than the target magnitude.
By applying this novel approach to interval timing, the pioneering work by Akdoğan and Balcı (2017) demonstrated that human participants can capture their error magnitude and error direction at better-than-chance levels in a temporal reproduction task (Figure 1a). These findings support the idea that the metacognitive abilities previously observed in 2AFC paradigms generalize to temporal error monitoring. These findings were replicated by Kononowicz and van Wassenhove (2019), in whose study participants were tested in a temporal production task and rated their error magnitude (and direction) on a slider scale (see also Öztel et al., 2021; Öztel & Balcı, 2023a).
In order to test the generalizability of these novel findings to other magnitude domains, Duyan and Balcı (2018) tested whether human participants are aware of their errors also in non-verbal counting or enumeration tasks. To this end, they presented participants with an auditory sequence consisting of a certain number of beeps and asked them to terminate the sequence when they thought it contained the same number of beeps as the previously experienced target sequence. Participants then reported their confidence and the direction of error (under- or over-reproduction of the target sequence), and these reports tracked the magnitude and direction of their numerical errors. However, in this study, participants might have responded based on the cumulative duration of the auditory inputs rather than their numerosity. A subsequent study by the same researchers controlled for this confound by presenting a target dot array to the participants and asking them to report how many dots there were in the array, again followed by confidence and directionality judgements (Duyan & Balcı, 2019). The results replicated the findings of the previous study conducted with a sequence of stimuli.
Motivated by the primary postulate of A Theory of Magnitude (Walsh, 2003) regarding the shared processing of different magnitudes, Duyan and Balcı (2020) further tested metric error monitoring in the spatial domain. They employed a line length reproduction task, wherein participants were asked to adjust the length of a line to match a previously observed target line length.
F I G U R E 1 Illustration of task designs and hypotheses derived from the different measurement approaches in metric error monitoring. Panel (a) depicts the metric error monitoring tasks in three different domains. Trials in all tasks typically start with the presentation of the target quantity, which is followed by reproduction (time and space) or estimation (number) of the target. In the time domain, participants reproduce a target duration with two button-press responses, where the duration elapsed between these responses constitutes the reproduced time. In the spatial task, participants can adjust the length of a given line by pressing the relevant buttons on the keyboard. In the numerical task, participants type their estimate of the number of dots in a target dot array. All reproductions/estimations are followed by confidence and direction judgements (a separate judgement for each reproduction/estimation). Panel (b) depicts different types of approaches to measuring these self-evaluations (Panel b1, interval responses; Panel b2, slider scale). Panel (c) depicts the hypothesized relationship between the reproductions (in standard deviation units) and the dependent variables derived from the different approaches (subjective confidence, direction, and composite score plotted as a function of absolute z-scored reproductions/estimations [isolated measures] or signed z-scored reproductions/estimations [composite score], respectively). The table depicts how the composite score is calculated based on different levels of confidence and error direction judgements.
Remarkably, the outcomes of this study closely replicated the findings gathered based on time reproduction.
Further supporting A Theory of Magnitude (Walsh, 2003) from the standpoint of metric awareness, the study by Yallak and Balcı (2021) revealed correlated metric error monitoring performance across the time, numerosity, and spatial domains. Crucially, error monitoring in magnitude domains was not correlated with error monitoring in perceptual decision making, which suggests a phenomenological distinction between perceptual and metric error monitoring. Together, these results dissociate metric error monitoring in magnitude domains from categorical error monitoring in 2AFC perceptual decision making. Metric error monitoring ability is now a widely replicated finding (e.g., Öztel et al., 2021; Öztel & Balcı, 2023a; see also Yallak & Balcı, 2021), which renders it a robust phenomenon.
Taken together, the results outlined above point to the different aspects of metric error monitoring as a newly discovered facet of metacognition. These explorations not only deepen our understanding of human consciousness but also extend our capacity to detect and evaluate errors in our actions beyond categorical forms.

| METRIC ERROR MONITORING: DIFFERENT OPERATIONALIZATIONS
Various classical approaches exist for measuring the error monitoring ability as assessed in 2AFC tasks. One way to investigate error monitoring is through the gamma and phi correlations between first-order performance and second-order confidence ratings (Fleming & Lau, 2014). These correlations quantify the degree of match between confidence and performance in terms of the number of high and low confidence judgements for correct and incorrect classifications, respectively. While these approaches are useful with binomially distributed categorical performance, they are subject to inflated coefficients caused by biases in the confidence ratings (Fleming & Lau, 2014).
One way to overcome this bias is the signal detection theory approach (Green & Swets, 1966), in which metacognitive accuracy is measured as the difference between the Type 2 hit rate and the Type 2 false alarm rate (Fleming & Lau, 2014). The Type 2 hit rate is defined as the proportion of accurate responses rated with high confidence, and the Type 2 false alarm rate as the proportion of inaccurate responses rated with high confidence. Another approach is the Type 2 area under the receiver operating characteristic curve (AROC2) analysis, in which the area under the curve that depicts the Type 2 hit rate as a function of the Type 2 false alarm rate is calculated to index metacognitive accuracy (for a detailed discussion, see Fleming & Lau, 2014).
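These Type 2 quantities are straightforward to compute from trial-level data. The following sketch (in Python with NumPy; the data and variable names are hypothetical, and real analyses typically also construct the full AROC2 curve across confidence criteria) estimates the Type 2 hit and false alarm rates described above:

```python
import numpy as np

def type2_rates(correct, high_conf):
    """Type 2 hit rate: P(high confidence | correct first-order response).
    Type 2 false alarm rate: P(high confidence | first-order error)."""
    correct = np.asarray(correct, dtype=bool)
    high_conf = np.asarray(high_conf, dtype=bool)
    hit_rate = high_conf[correct].mean()    # high confidence on correct trials
    fa_rate = high_conf[~correct].mean()    # high confidence on error trials
    return hit_rate, fa_rate

# Toy data: a metacognitively sensitive observer reports high confidence
# more often on correct trials than on error trials.
correct = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]
high_conf = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
hit, fa = type2_rates(correct, high_conf)
print(hit - fa)  # a positive difference indicates metacognitive accuracy
```

Sweeping the high-confidence criterion over all confidence levels and plotting the resulting (fa, hit) pairs would trace the Type 2 ROC curve whose area yields AROC2.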
The above-mentioned approaches require errors to be categorized in absolutely correct and incorrect forms and thus do not capture the informational richness of trial-by-trial metric errors (i.e., error magnitude and error directionality). One method to investigate trial-by-trial metric error monitoring is to combine the confidence (as an index of reported error magnitude) and direction judgements into a single variable. This composite score approach yields a six-level response variable (3 × 2; three confidence levels and two error direction response options) ranging from the lowest confidence 'short' ('few') responses to the lowest confidence 'long' ('many') responses. Accordingly, the reported subjective confidence level is reverse-coded so that it depicts the amount of deviation from the mean reproduction of the target. Thus, if the participant reports low confidence, it is recoded as '3' (i.e., 'subjective high deviation') for the estimated degree of deviation from the target. Finally, the sign of the deviation is given by the participant's error direction response. If the participant reports lower/shorter/fewer than the target quantity, the composite score is multiplied by −1. For instance, if the participant reported low confidence and judged to have reproduced/estimated longer than the target, the composite score will be '3' (3 for 'high deviation' and '+' for 'above target'). Similarly, if the participant reports high confidence and judges to have reproduced/estimated shorter than the target, the composite score will be '−1' (1 for 'low deviation' and '−' for 'less than target'). Thus, given a confidence rating ranging from '1' (low) to '3' (high), the composite score variable can range from −3 (low confidence, short/few) to 3 (low confidence, long/many) on an interval scale, where the absolute value of the composite score depicts the error magnitude reported by the participant and the sign depicts the error direction. Accordingly, values closer to zero depict little error magnitude and thus high confidence.
To capture the error monitoring ability in a single variable, the composite score is then regressed on the z-score-transformed estimations, so that individuals' first-order performance is evaluated relative to their own mean estimations (controlling for biases or drifts). Metric error monitoring performance is reflected in a positive linear relationship between the composite score variable and the z-scored reproductions (see Akdoğan & Balcı, 2017). Note that outlier reproductions/estimations (i.e., three standard deviations away from the mean) should be eliminated prior to the data analysis. This precaution is necessary to prevent artificial support for the metric error monitoring hypothesis. For instance, if participants accidentally terminate their reproductions early, it would be trivially easy to report low confidence and 'short/few' responses, which would artificially support the metric error monitoring hypothesis. Thus, such responses resulting from mishaps should not be included in the analysis of metric error monitoring. Another significant concern related to this method is the omission of the value zero. This limitation can be addressed by combining the two variables in a single direction (i.e., using only positive or negative codes that range from 1 to 6, as in Doenyas et al., 2019, or from −6 to −1) or by analysing the data on an ordinal scale. This being said, in our experience, different recoding and analysis approaches yield comparable results (Öztel & Balcı, 2023a).
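The composite-score pipeline described above can be sketched in a few lines. The snippet below (Python/NumPy; the simulated judgements are hypothetical stand-ins for real participant data) encodes the confidence and direction reports into composite scores, trims outliers beyond three standard deviations, and fits the linear relationship with the z-scored reproductions:

```python
import numpy as np

def composite_score(confidence, direction):
    """Recode confidence (1 = low .. 3 = high) and direction ('short'/'long')
    into a signed score: |score| tracks reported error magnitude
    (low confidence -> large subjective deviation), sign tracks direction."""
    magnitude = 4 - confidence                # reverse-code: 1->3, 2->2, 3->1
    sign = 1 if direction == "long" else -1
    return sign * magnitude

# Hypothetical data: reproductions of a 2-s target with Gaussian noise.
rng = np.random.default_rng(0)
reproductions = rng.normal(2.0, 0.3, size=200)
z = (reproductions - reproductions.mean()) / reproductions.std()
# Simulated judgements loosely tracking the true deviation from the mean:
conf = np.clip(3 - np.round(np.abs(z)), 1, 3)
direc = np.where(z > 0, "long", "short")
scores = np.array([composite_score(c, d) for c, d in zip(conf, direc)])
keep = np.abs(z) < 3                          # outlier trimming (3 SD rule)
slope = np.polyfit(z[keep], scores[keep], 1)[0]
print(slope)  # a positive slope indicates metric error monitoring
```

In a real analysis, the regression would typically be run per participant (or as a mixed-effects model), with the slope serving as the individual's metric error monitoring index.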
Another important consideration is that the composite score approach does not allow examining the isolated or peculiar features of magnitude and direction monitoring. An alternative way of overcoming this limitation entails the isolated analysis of these two components of metric error monitoring. In this case, confidence judgements would exhibit an inverse relationship with the absolute values of the z-scored estimations, and a sigmoidal function should capture the relationship between the error direction judgements and the z-score-transformed estimations.
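The two predicted component-level signatures can be illustrated with simulated data (hypothetical, for illustration only; a full analysis would fit a psychometric sigmoid to the direction judgements rather than compare terciles):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(0, 1, 300)                     # z-scored reproductions
# Hypothetical judgements: confidence falls with absolute deviation, and
# 'long' responses become more likely as z increases (sigmoidal pattern).
conf = np.clip(3 - np.abs(z) + rng.normal(0, 0.3, z.size), 1, 3)
p_long = 1 / (1 + np.exp(-3 * z))             # logistic function of z
said_long = rng.random(z.size) < p_long

# Signature 1: confidence correlates negatively with |z|.
r_conf = np.corrcoef(np.abs(z), conf)[0, 1]
# Signature 2: proportion of 'long' judgements rises across z terciles.
terciles = np.quantile(z, [1 / 3, 2 / 3])
lo = said_long[z < terciles[0]].mean()
hi = said_long[z > terciles[1]].mean()
print(r_conf, lo, hi)
```

A data set without metric error monitoring would instead show a near-zero correlation for confidence and a flat direction profile across terciles.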
A more refined alternative approach to investigating metric error monitoring is to use a continuous slider scale on which participants mark their reports on a real number line, where the middle of the line corresponds to the target magnitude. This method fine-tunes the error judgements (Öztel et al., 2021; Riemer et al., 2019), from which one can gather both error magnitude and direction judgements without treating them separately or combining them artificially. Here, while the direction judgement carries both error direction and magnitude information, the confidence judgement merely represents the subjective 'probability of being accurate' (e.g., Frith, 2021) of the reproductions.
The methods outlined above are fine-tuned ways of assessing trial-based metric error judgements, but they do not capture whether participants are aware of their overall metric biases, particularly given that they utilize z-score-transformed first-order performance. In other words, they do not capture whether participants are aware of the fact that they have an overall bias to underestimate or overestimate magnitudes. To this end, metric error monitoring can also be investigated by asking participants to report their confidence and/or error direction regarding their average performance throughout the experiment. One disadvantage of this approach is that it relies on a single-shot measure, resulting in a loss of critical variance (akin to the difference between prospective and retrospective timing tasks).
Moreover, global operationalizations have so far yielded inconsistent results (Brocas et al., 2018; Öztel & Balcı, 2023a; Riemer et al., 2019). For instance, although Brocas et al. (2018) showed that participants can keep track of their overall timing biases, this was not the case in Öztel and Balcı (2023a). These inconsistencies might also result from the possibly disruptive effects of working memory load increasing over trials (for a discussion, see Riemer et al., 2019; Öztel & Balcı, 2023a), akin to the distinction between competence and performance (Chomsky, 1965). For example, two recent studies investigated whether participants could monitor their overall temporal bias using the single-shot variable mentioned earlier. In both experiments, participants engaged in a temporal reproduction task and reported their average error direction with respect to the target duration after the test session. Riemer et al. (2019) showed that correct monitoring of overall temporal biases occurred only when trial-based feedback included error direction information. However, Öztel and Balcı (2023a) did not find the same effect of trial-based feedback on the global monitoring of temporal biases. These findings collectively suggest that monitoring overall temporal biases on a global scale might not be feasible without trial-by-trial feedback that includes directional and magnitude information, as indicated by Riemer et al. (2019). One approach to addressing this limitation of the single-shot method involves assessing overall temporal biases after an unpredictable number of temporal reproductions throughout the test session. In line with this idea, Brown and Stubbs (1988) presented participants with a tape that included four segments of different musical pieces. After listening to the whole tape, participants were asked to estimate the duration of each single segment. In this way, Brown and Stubbs (1988) could obtain multiple data points per subject from a single trial (see also Grondin & Plourde, 2007). This approach can be similarly adapted to different sets of temporal errors and their retrospective evaluations.
Note that all of these measures (except for the slider scale) treat confidence as a proxy for the error magnitude judgement. While this is a reasonable assumption at face value, an alternative approach could be the direct report of error magnitudes, in isolation from the subjectivity induced by 'confidence' judgements (e.g., 'how much did you deviate from the target?'), on a continuous slider scale where participants mark their absolute error magnitude. Such an operationalization would yield a more direct and fine-grained measure of error magnitude instead of crude mappings. A similar approach would be to report the error magnitude and direction on the same measure (e.g., 'how much and in which direction did you deviate from the target?' on a slider scale). Notably, this measure would make direct access to the idiosyncratic features of the magnitude and direction judgements difficult (as these are seemingly distinct components of metric error monitoring; Öztel & Balcı, 2022).
All of the above-mentioned methods constitute direct measures of the metric error monitoring ability. In order to test whether metric error monitoring performance is also observed with indirect ways of assessing it, Yallak and Balcı (2022) tested participants in an opt-out paradigm. Participants were asked to reproduce a target duration as accurately as possible. Upon each reproduction, points decreased exponentially with increasing magnitude of errors. The participants' task was to maximize their average performance simply by opting out of those trials in which they thought they had made an error and opting in only on those trials in which they thought their performance matched the target. The results supported the prediction of the metric error monitoring hypothesis regarding a U-shaped relation between directional error magnitudes and opt-out rates, whereby large errors (for both short and long reproductions) resulted in higher opt-out rates. This result suggests an implicit metric error monitoring ability (Yallak & Balcı, 2022) that does not require explicit judgements of error magnitude and direction. Critically, these researchers also tested participants in a task in which the opt-out option was not always available. This variant of the task aimed to better control for explicit/implicit biases in first-order judgements (FOJs) and replicated the U-shaped relationship between the opt-out rate and first-order performance.
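The predicted U-shape can be illustrated with a toy simulation (hypothetical parameters and policy; not the authors' design): if opt-out decisions are driven by a noisy internal readout of the error, opt-out rates rise at both tails of the signed-error distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
target = 2.0
reproductions = rng.normal(target, 0.3, 500)   # seconds, hypothetical
signed_err = reproductions - target
# Hypothetical opt-out policy: the larger the internally sensed error
# (a noisy readout of the true absolute error), the more likely the opt-out.
sensed = np.abs(signed_err) + rng.normal(0, 0.1, signed_err.size)
opt_out = sensed > 0.25

# Bin signed errors into quintiles; opt-out should be high at both tails.
bins = np.quantile(signed_err, [0, 0.2, 0.4, 0.6, 0.8, 1.0])
idx = np.digitize(signed_err, bins[1:-1])
rates = [opt_out[idx == k].mean() for k in range(5)]
print(rates)  # highest in the outer bins, lowest in the middle bin
```

The middle bin (reproductions near the target) shows the lowest opt-out rate, mirroring the behavioural pattern reported by Yallak and Balcı (2022).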
Bader and Wiener (2021) also examined the temporal error monitoring ability indirectly in a temporal reproduction task by asking participants to re-perform the reproduction. Participants were assigned to feedback and no-feedback groups. In the feedback group, participants received trial-by-trial feedback that did not include any directional information with respect to the target (i.e., the feedback was 'on/off the target' based on the reproduction). At the end of their reproductions (or the feedback, depending on the experimental condition), both groups were given the opportunity to redo the reproduction. The results revealed that, as in the feedback group, the no-feedback group also correctly adjusted their reproductions towards the target duration in the redo phase. Thus, this result reveals that participants did not need external feedback to improve their temporal reproductions. In line with Yallak and Balcı (2022), this result is another clear demonstration of indirect temporal error monitoring, which does not necessitate explicit verbalization of subjective confidence and error direction.
Each of the aforementioned operationalizations presents both merits and limitations. Selecting the most appropriate approach necessitates carefully considering the theoretical underpinnings and the demands imposed by the experimental design. The inherent strengths and weaknesses of each operationalization offer researchers a nuanced empirical landscape upon which to base their methodological choices.

| MODEL-BASED APPROACH TO METRIC ERROR MONITORING ABILITY
Error monitoring in the perceptual domain is mostly explained by noisy evidence accumulation models, of which the drift-diffusion model (DDM) is a prominent example. This model assumes that subjective confidence arises from the properties of evidence accumulation (i.e., the drift rate), such that high-quality evidence accumulation (i.e., a high signal-to-noise ratio, which increases the detectability of the target) predicts higher confidence (Yu et al., 2015). This approach attributes the emergence of confidence to the psychophysical characteristics of the stimulus while largely disregarding the potential contribution of representational noise. Accordingly, this approach would predict similar confidence levels for similar stimuli regardless of the given context, while overlooking change-of-mind events (for a detailed discussion, see Pleskac & Busemeyer, 2010; Yeung & Summerfield, 2012).
What is the information-processing basis of metric error monitoring? The only computational approach that aims to explain metric error monitoring performance was proposed by Akdoğan and Balcı (2017). They proposed a computational model that explains this ability with two independent internal clocks that integrate temporal information simultaneously as time-adaptive opponent-Poisson drift-diffusion processes (TopDDM; Balcı & Simen, 2016; Simen et al., 2011). According to this model, one of these TopDDM processes serves as a sensory clock, and the other serves as the motor system's clock. The order in which these two processes first cross the decision threshold determines the error direction (i.e., if the sensory TopDDM hits the decision threshold earlier than the motor TopDDM, the response is judged as 'late'). The distance between the threshold crossings of the sensory and motor TopDDMs depicts the error magnitude (Akdoğan & Balcı, 2017). Thus, this model assumes that the error magnitude and direction judgements emerge from the same source of information. Figure 2 illustrates this information-processing model along with its behavioural predictions.
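The race logic of the two-clock account can be sketched as follows. This is a rough illustration, not the authors' implementation: a simplified Gaussian random walk stands in for the opponent-Poisson accumulators, and all parameter values are arbitrary.

```python
import numpy as np

def race_two_clocks(drift_sensory, drift_motor, threshold=1.0,
                    noise=0.05, dt=0.001, rng=None):
    """Toy sketch of the two-clock idea: two noisy accumulators race to a
    shared threshold; which one crosses first gives the judged error
    direction, and the crossing-time gap gives the judged error magnitude."""
    if rng is None:
        rng = np.random.default_rng()

    def first_passage(drift):
        x, t = 0.0, 0.0
        while x < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        return t

    t_sensory = first_passage(drift_sensory)
    t_motor = first_passage(drift_motor)
    gap = t_motor - t_sensory
    # Sensory clock crossing first (positive gap) -> response judged 'late'.
    direction = "late" if gap > 0 else "early"
    return direction, abs(gap)

# A slower motor clock should mostly yield 'late' judgements, with larger
# crossing-time gaps mapping onto larger judged errors (lower confidence).
direction, magnitude = race_two_clocks(1.0, 0.8,
                                       rng=np.random.default_rng(42))
print(direction, round(magnitude, 3))
```

Because both judgements are read off the same pair of crossing times, the sketch also makes concrete the model's assumption that direction and magnitude share a single information source.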
A critical prediction of this model is that, because trial-by-trial metric error judgements rely on a relative comparison, participants cannot monitor induced or endogenous temporal biases unless the sensory and motor clocks can be differentially modulated. We recently tested this prediction by experimentally inducing timing biases (Öztel & Balcı, 2020). Specifically, participants were shown two reference intervals and then asked to categorize six linearly spaced target durations ranging from 1 to 3.5 s as closer to the short or the long reference duration. Importantly, the test durations were signalled by a walking stickman animation at one of three speeds, which had previously been shown to strongly affect the speed of subjective time (Karşılar et al., 2018). Participants reported their confidence level regarding the accuracy of their categorization judgements. The results demonstrated a significant shift in the psychometric functions, implying that durations presented with high-speed motion were perceived as longer (i.e., an earlier point of subjective equality [PSE]). Interestingly, the confidence levels closely followed the shifts in PSE, which implies an inability to keep track of the experimentally induced biases in time perception. Thus, these results align with the predictions of Akdoğan and Balcı's (2017) model of temporal error monitoring (Figure 2). One study by Brocas et al. (2018) demonstrated that participants could monitor their endogenous temporal biases, as reflected in their adjustment behaviour in accordance with the bias they held. While seemingly contradictory, this result is not directly comparable with that obtained by Öztel and Balcı (2023a) because of the differences in the types of bias (i.e., endogenous in Brocas et al., 2018, and exogenous in Öztel & Balcı, 2023a).

| ERROR DIRECTION AND MAGNITUDE JUDGEMENTS AS TWO DISTINCT COMPONENTS
As discussed, recent work on metric error monitoring as another aspect of consciousness reveals a dual-component structure composed of the magnitude (as indexed by subjective confidence) and the directionality judgements. Given the assumption that the magnitude and directionality judgements stem from the same source (Akdoğan & Balcı, 2017), the metric error monitoring model would predict that these two indices of error monitoring have comparable sensitivities to similar influences. For example, Öztel and Balcı (2023b) tested whether the temporal error monitoring ability could be disrupted by exogenous working memory loads, as documented in perceptual metacognition (Maniscalco & Lau, 2015). Participants were tested in a temporal reproduction task, during which they performed mental alphabetization at different difficulty levels. Contrary to Maniscalco and Lau's (2015) finding of disrupted metacognitive ability, Öztel and Balcı (2023b) found intact temporal error monitoring. Critically, this finding applied to both indices (i.e., the confidence and directionality judgements), demonstrating that the two indices of metric error monitoring have similar sensitivities to exogenous influences. Thus, this result aligns with the prediction of the temporal error monitoring model that the two indices of temporal error monitoring (confidence and directional judgements) would be similarly subject to the same external influence (Akdoğan & Balcı, 2017).

F I G U R E 2 Illustration of the temporal error monitoring model proposed by Akdoğan and Balcı (2017). The error monitoring process is initiated by temporal evidence accumulation via two independent clocks (one motor and one sensory clock). The times at which these two clocks hit the timing threshold are compared to determine the error direction. For instance, if the sensory clock hits the threshold prior to the motor clock, a 'late' error judgement is made. Once the slower of the two clocks hits the threshold, the next-stage decision process begins from the earlier threshold-hitting time to determine the magnitude of the error, which leads to the emergence of subjective confidence (thick magenta trajectories as a proxy for deviation). Accordingly, the distance between the two threshold-hitting times determines the magnitude of the error, where large distances depict large error magnitudes and thus decreased confidence.
Contrary to Akdoğan and Balcı (2017), one recent study demonstrated that the two components of the metric error monitoring ability have distinct sensitivities to different endogenous factors, such as error agency and the belief of error agency (Öztel & Balcı, 2022): in six separate experiments, we consistently demonstrated that error magnitude monitoring was sensitive to beliefs regarding error agency (i.e., believing to have made the error or not), such that only the errors that were believed to be self-made were accurately monitored. On the other hand, the errors that were not self-made (i.e., committed by another participant or simulated) were more precisely classified as shorter or longer than the target duration (i.e., error directionality judgements). Together, these findings highlight the different sensitivities of the two components of metric error monitoring to endogenous factors (belief of agency). This finding thus reveals the idiosyncratic features of the metric error components, which should not be overlooked in future research.
The apparent distinction between error magnitude (as indexed with subjective confidence) and error direction might stem from phenomenological differences between the two components.
The monitoring of error magnitude, as operationalized with confidence, can induce a more subjective experience, which renders it insensitive to errors that are believed to stem from a source other than the self. Moreover, subjective confidence can render the magnitude calculation noisier than the discrete direction comparison. On the other hand, the categorization of errors as short or long might be treated as an FOJ by participants, which could be exempt from metacognitive noise (Maniscalco & Lau, 2012).
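This proposed asymmetry can be sketched in a toy simulation (an illustration under assumed parameters, not the model of Maniscalco & Lau, 2012): injecting 'metacognitive' noise only into the confidence readout leaves the discrete direction judgement untouched while making confidence less diagnostic of the actual error size.

```python
import random

random.seed(7)

def simulate(meta_noise_sd, n=20000):
    dir_correct = 0
    conf_small, conf_large = [], []
    for _ in range(n):
        error = random.gauss(0, 0.3)             # objective timing error
        readout = error + random.gauss(0, 0.1)   # shared first-order readout noise
        # Direction: a discrete, FOJ-like comparison on the readout itself
        if (readout > 0) == (error > 0):
            dir_correct += 1
        # Confidence: same readout, but corrupted by extra metacognitive noise
        meta = abs(random.gauss(0, meta_noise_sd))
        confidence = 1 / (1 + abs(readout) + meta)
        (conf_small if abs(error) < 0.3 else conf_large).append(confidence)
    gap = (sum(conf_small) / len(conf_small)) - (sum(conf_large) / len(conf_large))
    return dir_correct / n, gap

acc0, gap0 = simulate(meta_noise_sd=0.0)   # no metacognitive noise
acc1, gap1 = simulate(meta_noise_sd=0.5)   # confidence channel corrupted
print(abs(acc0 - acc1) < 0.02)  # direction judgements unaffected
print(gap1 < gap0)              # confidence becomes less diagnostic of error size
```

The design choice is that only the confidence computation passes through the extra noise stage, which is one way of cashing out the claim that the short/long categorization, treated as an FOJ, could be exempt from metacognitive noise.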
Given the novelty of temporal error monitoring as a research domain, there is so far limited empirical support for the dissociation of the discrete metric error monitoring components, as it rests on a single study. Nonetheless, the findings of this study exhibit consistent repeatability across multiple experiments. Still, further replication of these findings across other metric domains, such as number and space, is needed to test the generalizability and robustness of the identified distinction between these components. This result both calls for and provides an opportunity for refining modelling approaches to account for these recent findings.
Taken together, recent research expands the general scope of metacognition as a conscious state beyond categorical domains (such as perception and memory) with the unique attributes of magnitude and direction monitoring. This contemporary approach to the study of metacognitive processing demonstrates that this capability opens an informationally much richer area for future research, with closer correspondence to the processing of errors as they occur in real life.

| ANIMAL RESEARCH ON ERROR MONITORING ABILITY
Whether or not animals have consciousness has long been an unresolved controversy. One way to assess basic forms of consciousness in animals is to test their metacognitive abilities. This aspect of consciousness is mainly investigated with error monitoring tasks that are adapted for animal subjects such that they can indicate their level of subjective confidence in various ways. In one pioneering work, Kornell et al. (2007) demonstrated that rhesus macaques (Macaca mulatta) could keep track of their uncertainties, as reflected in their information-seeking behaviour, which generalized across different tasks. Interestingly, Foote and Crystal (2007) demonstrated that rats could keep track of their timing uncertainty. They trained rats on a duration discrimination task (i.e., if the given tone duration, among eight possible durations, was short, press one lever; if long, press the other lever to receive a reward). After hearing the tone, rats could choose to take the test for the opportunity to earn a large reward (if the response was correct) or decline the test to receive a small but guaranteed reward. Results demonstrated that rats' decline responses were highest for the intermediate durations, which the authors interpreted as an uncertainty monitoring ability.
Many studies investigating error monitoring ability in non-human animals use a similar approach by combining easy and difficult trials (for a similar discussion, see Hampton, 2009; Smith et al., 2010). This approach relies on the assumption that, while performing easy trials with no struggle, the animal will seek a hint in difficult trials. This kind of behavioural pattern can be regarded as 'low confidence'. However, results obtained from these studies cannot be readily interpreted as reflecting an uncertainty monitoring ability because the opt-out responses could be learned via differential reinforcement rather than emerging spontaneously. To eliminate this confound, Smith et al. (2010) trained monkeys in four different perceptual tasks whose presentation order was randomized across trials. In all four tasks, subjects used the uncertainty option most in difficult trials, even without a reward provided, demonstrating the generalizability and flexibility of the uncertainty monitoring ability. Templer et al. (2017) also addressed this possibility in a delayed matching-to-sample (DMTS) task. They tested rats in a metamemory task in which the rats were required to remember the place of the sample odour. The rats could either choose to take or decline the test or be forced to take the test. Results demonstrated that rats used the decline option optimally in the test trials, such that they opted out more as the memorization difficulty increased. Furthermore, rats were more accurate when they 'chose' to perform than when they were forced to perform. Critically, this finding also held in a series of generalization tasks in which memory demands were controlled (i.e., by varying the delay between the sample and matching phases, presenting multiple cue odours, or presenting no cue odour at all in the matching phase).
In a similar experiment, Yuki and Okanoya (2017) tested rats in a delayed matching-to-position (DMTP) test. Rats were given a cue or a random delay (no cue), which was followed by a choice to take the test or to opt out. If the rat chose to take the test, either two or six of the choice options were cued for the rat to respond to. If the subject responded correctly, the reward was maximized; otherwise, it was zeroed. If the rat opted out of the test, the correct location was cued and the rat received a small but guaranteed reward. Memory accuracy was significantly diminished for the six-option trials when the rats were forced to take the test. On the other hand, opt-out rates were significantly higher when the position was not cued, which points to a metacognitive monitoring ability. Such performance was only weakly observed in pigeons (Iwasaki et al., 2019).
In another experiment, Kirk et al. (2014) tested rats in a maze in which subjects had to press a lever to obtain a cue for the location of the larger reward. In the first experiment, the authors tested the rats in a T-maze in which one of the arms contained a larger, and thus more preferred, food reward. After the lever press, the light of the arm that contained the food turned on to cue the correct location. Rats continued to press the lever to obtain the cue even after the immediate reward was discontinued. However, rats discontinued the lever-pressing behaviour when the reward was always located in the same arm. This result thus points to the idea that the rats were indeed pressing the lever to obtain a cue regarding the location of the reward. Interestingly, lever-press responses increased when rats were tested in an eight-arm maze, in which the informative value of lever pressing was four times greater, again pointing to a metacognitive ability.
None of these studies address whether subjects keep track of their metric errors. One recent study by Kononowicz et al. (2022) addressed this gap in the literature. Rats were trained in a temporal reproduction task. After each reproduction, two ports were available to claim a reward. If the rat made a small error with respect to the target duration, choosing the correct port (port A) resulted in two reward pellets; otherwise, no reward was given. If the error was large, choosing the other port (port B) resulted in one reward pellet; otherwise, no reward was given. Results demonstrated that rats could correctly choose the ports (above chance level) in a way that matched their error magnitude. These results thus point to an ability to report timing error magnitude in rats.
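The reward contingency of this task can be sketched as follows (the target duration, error threshold, and noise values are illustrative assumptions, not those of Kononowicz et al., 2022): an agent that matches its port choice to its monitored error magnitude earns more reward than one choosing at random, which is why above-chance port choice is diagnostic of error magnitude monitoring.

```python
import random

random.seed(5)

TARGET = 3.2       # s, target duration (value assumed for illustration)
THRESHOLD = 0.4    # s, assumed boundary between 'small' and 'large' errors

def payoff(produced, port):
    """Port A pays 2 pellets for small errors, port B pays 1 pellet for
    large errors; the mismatching port pays nothing."""
    small = abs(produced - TARGET) < THRESHOLD
    if port == 'A':
        return 2 if small else 0
    return 1 if not small else 0

def choose_port(produced, monitors_error):
    if not monitors_error:
        return random.choice('AB')  # agnostic agent: random port
    # Monitoring agent: noisy readout of its own error magnitude
    perceived = abs(produced - TARGET) + random.gauss(0, 0.15)
    return 'A' if perceived < THRESHOLD else 'B'

def mean_reward(monitors_error, n=10000):
    total = 0
    for _ in range(n):
        produced = random.gauss(TARGET, 0.5)  # noisy temporal reproduction
        total += payoff(produced, choose_port(produced, monitors_error))
    return total / n

reward_monitoring = mean_reward(True)
reward_random = mean_reward(False)
print(reward_monitoring > reward_random)  # monitoring the error magnitude pays off
```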
While pointing to the very fundamental question of whether non-human animals can also capture the magnitude of their errors, this result still falls short of demonstrating error direction monitoring in animals, which remains an open question (for a detailed discussion, see Balcı, 2022). In any case, these early promising findings provide the necessary behavioural tools to investigate the neural basis of metric error monitoring in non-human animals, ideally by combining correlational and manipulative approaches.

| NEURAL UNDERPINNINGS OF METACOGNITION AND METRIC ERROR MONITORING
The main brain regions pointed to as having critical roles in metacognition are the dorsal, anterior, and rostral parts of the prefrontal cortex (PFC). For example, enhanced activity in the rostrolateral (e.g., Fleming & Dolan, 2012) and left dorsolateral PFC (dLPFC; Fleming & Dolan, 2012; Harty et al., 2014; Magno et al., 2006; Rahnev et al., 2016), along with the lateral anterior frontal cortex (Morales et al., 2018), during metacognitive evaluations has been documented in functional magnetic resonance imaging (fMRI) studies. Moreover, a strong relationship between grey matter volume in the anterior PFC (aPFC) and metacognitive performance in perceptual decision making has been documented (McCurdy et al., 2013; see also Fleming & Lau, 2014). Parallel results were reported by Baird et al. (2013; see also Rahnev et al., 2016). These results provide correlational support for the PFC's contribution to metacognitive processes.
These correlational findings are coupled with causal support. For example, targeting the dLPFC bilaterally with transcranial magnetic stimulation (TMS), Rounis et al. (2010) found that inhibition of this area reduced metacognitive accuracy in perceptual decision making while first-order performance remained intact. Another study found lower confidence judgements as a result of TMS inhibition of the dLPFC (Shekhar & Rahnev, 2018). A clinical study by Fleming et al. (2014) demonstrated that aPFC lesions resulted in a specific disruption of metacognition in perceptual decision making settings. Together, these studies point to the critical involvement of the dLPFC and aPFC in metacognitive processes, utilizing both correlational and causal designs (see also Fleming, Huijgen, & Dolan, 2012).
The documented neural underpinnings of metric error monitoring in the current literature are relatively limited. Notably, there have been pertinent insights from studies employing electroencephalogram (EEG) recordings during temporal error monitoring performance. In one of these studies, Kononowicz et al. (2019) tested participants in a temporal production task. Participants were asked to produce a target duration (FOJ). After each production, participants were asked to report how much, and in which direction, they thought their production deviated from the target on a slider scale (SOJ). The behavioural results reproduced Akdoğan and Balcı's (2017) findings. This study further demonstrated that changes in beta power upon the initiation of the temporal production predicted the temporal production. Given that beta oscillations are known to be responsible for network inhibition, especially for motor actions (e.g., Engel & Fries, 2010; Palva & Palva, 2012; Wang, 2010; Whittington et al., 2000; see also Fischer et al., 2018; Kelly et al., 2021; Murphy et al., 2016), this finding might relate to response inhibition during time production (Kononowicz et al., 2019). Accordingly, beta oscillations were also related to the produced durations, with higher beta power observed for longer productions. This result is a clear demonstration of how temporal estimations that require a motor response are neurally encoded via network inhibition guided by beta oscillations (Kononowicz et al., 2019; see also van Wassenhove, 2023). Critically, when the SOJ was correct, beta power better predicted the FOJ. Finally, the Euclidean distance between beta network trajectories, as depicted by the temporal productions with respect to the target, was predictive of the SOJ, with greater separation between these trajectories corresponding to more precise metacognitive judgements. Notably, source localization analysis suggested the precuneus as the potential origin of the temporal error monitoring ability (Kononowicz et al., 2019; Kononowicz & van Wassenhove, 2019), a region that is also implicated in error monitoring within memory domains (e.g., McCurdy et al., 2013).
Another study by the same research group tested two key hypotheses regarding the information processing bases of temporal error monitoring (Kononowicz & van Wassenhove, 2019). The first hypothesis, 'temporal error detection (TED)', describes temporal error monitoring as the latency between the motor response and the intended response upon the termination of a temporal reproduction. According to this hypothesis, temporal error monitoring relies on an online process that compares the intended and executed action until the reproduction is terminated by a second motor response. Accordingly, a temporal error detection process would yield a V-shaped relationship between neural activity and temporal errors (i.e., maximum activity at the largest temporal errors). The second hypothesis, 'temporal metacognition (TMC)', on the other hand, describes temporal error monitoring on the basis of a readout mechanism that predicts a linear relationship between temporal errors and neural activity. In line with the TMC hypothesis, results revealed a negative linear relationship between alpha power (which is known to be related to attentional processes, e.g., Klimesch et al., 1998; see also Kononowicz et al., 2019) and temporal errors (as well as self-evaluations; Kononowicz & van Wassenhove, 2019). Moreover, there was a relationship between beta power and alpha power, which points to a readout of the temporal representations (as encoded in beta power) by attentional processes (alpha power) for the emergence of temporal error monitoring (Kononowicz & van Wassenhove, 2019). Finally, in line with Kononowicz et al. (2019), the relationship between alpha power and self-evaluations exhibited a topographical resemblance to the precuneus region. Taken together, these studies shed light on the neural processes underlying temporal error monitoring by highlighting the role of alpha and beta oscillations, and of the precuneus as the suggested brain region underlying temporal awareness. However, given the limited spatial resolution of EEG and the absence of relevant fMRI studies, results on neural topography should be interpreted with extra caution.
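The contrasting predictions of the two hypotheses can be summarized in a short sketch (the slopes and intercepts are arbitrary illustrations, not fitted values): TED predicts activity that is a V-shaped function of the signed temporal error, whereas TMC predicts a monotonic (here negative) linear readout.

```python
# Predicted neural activity as a function of the signed temporal error
# under each hypothesis (slopes and intercepts are arbitrary):
def ted_activity(error):
    return abs(error)         # TED: V-shaped, maximal at the largest errors

def tmc_activity(error):
    return 1.0 - 0.5 * error  # TMC: linear (negative) readout of the error

errors = [-0.8, -0.4, 0.0, 0.4, 0.8]
ted = [ted_activity(e) for e in errors]
tmc = [tmc_activity(e) for e in errors]

print(ted[0] == ted[-1] and min(ted) == ted[2])  # symmetric V around zero error
print(all(a > b for a, b in zip(tmc, tmc[1:])))  # strictly monotonic in the error
```

The V-shape is symmetric in the error sign (only the magnitude matters), while the linear readout distinguishes over- from under-production, which is what allows the observed alpha-power relationship to arbitrate between the two accounts.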
While numerous studies emphasize shared variance among the metric domains, it is important to acknowledge their unshared variance as well. Notably, the variance unique to temporal error monitoring seems to find its neural representation in the precuneus (see Kononowicz et al., 2019; Kononowicz & van Wassenhove, 2019). This proposition merits distinct attention in view of widely accepted theories advocating a domain-general processing mechanism for metric information located primarily within the intraparietal sulcus (IPS); Walsh's (2003) A Theory of Magnitude provides a notable illustration of this theoretical perspective. Furthermore, the behavioural patterns observed in the context of metric error monitoring, as demonstrated in the study by Yallak and Balcı (2021), serve to emphasize this line of inquiry. Namely, does the precuneus potentially serve as the neural substrate that underpins metric error monitoring as a unified process, separate from the domain of categorical error monitoring? This question prompts a more profound exploration of the complex neural foundations of error monitoring across diverse domains.

| CONCLUSION AND FUTURE DIRECTIONS
Emerging as a newly discovered facet of consciousness, metric error monitoring not only captures the subtle complexities of metacognitive processing but also broadens the applicability of research findings by aligning better with real-world scenarios. This novel dimension brings the advantage of enhanced generalizability, bridging the gap between theoretical investigations and practical contexts. An essential factor in this expanded scope is the recent increase in comparative approaches, which have extended the focus of error monitoring inquiries beyond categorical decisions. These studies have shed light on a remarkable ability shared by humans and other animals: the capacity to keep track of error metrics on a trial-by-trial basis. Thus, the landscape of error monitoring studies has evolved rapidly, leading to a more comprehensive understanding of the related cognitive processes and their intricate interactions with awareness.
An existing gap in the metric error monitoring literature concerns how this ability is affected in different neuropsychiatric, neurodegenerative, and neurodevelopmental disorders. While the disruptive effects on metacognitive processing of many conditions, such as obsessive-compulsive disorder (OCD; e.g., Grützmann et al., 2016), schizophrenia (e.g., Alain et al., 2002), Alzheimer's disease (e.g., Balouch & Rusted, 2014), Parkinson's disease (e.g., Falkenstein et al., 2006; Ito & Kitagawa, 2006; White et al., 2016), major depression (e.g., Georgiadi et al., 2011), autism spectrum disorder (ASD; e.g., Grainger et al., 2014), sleep deprivation (e.g., Tsai et al., 2005), and substance abuse (Wasmuth et al., 2015), are widely documented in the literature, their effects on metric error monitoring have not yet been widely investigated. One reason is that the lack of knowledge regarding the neural substrates of metric error monitoring makes it difficult to formulate hypotheses regarding the disruptive effects of different neurodegenerative, neurodevelopmental, and neuropsychiatric disorders on this form of metacognition. To date, only one study has shown disrupted error monitoring, in both the categorical and metric forms, in high-functioning ASD patients compared with typically developing children (Doenyas et al., 2019).
Another potentially informative neuropsychiatric condition is major depression. Depression has been widely documented to create temporal distortions, presumably due to diminished dopamine levels (e.g., such that time flows slowly, as discussed in Thönes & Oberfeld, 2015). A distorted sense of time is also documented in schizophrenic patients, presumably as a result of hyperdopaminergic activity (for a detailed discussion, see Bonnot et al., 2011; Ueda et al., 2018). However, while humans cannot monitor stimulus-induced biases (Öztel et al., 2021), it is unknown whether physiologically induced temporal biases can be monitored (as in the case of major depression and schizophrenia). This gap in the literature needs to be addressed by future studies. Additionally, disrupted perceptual error monitoring in schizophrenia (e.g., Alain et al., 2002) and oversensitivity to error commission in major depression (e.g., Malejko et al., 2021) are documented in the literature; these effects remain to be tested in metric domains.