Utility of Feedback Has a Greater Impact on Learning than Ease of Decoding

While feedback is a key facilitator of learning, researchers have yet to determine the ideal feedback process for optimal performance in learners. The current study investigates the combined effects of ease of decoding and utility of feedback during learning. Accuracy and rate of learning were recorded alongside changes to the feedback-related negativity (FRN), an event-related potential (ERP) elicited by feedback stimuli. This study investigates the FRN within the context of future-focused directive feedback (DF), in addition to the past-focused evaluative feedback (EF) typically seen in the neuroscience literature. Results indicate a main effect of utility together with an interaction with ease of decoding on the accuracy data, but only the main effect of utility on learning rate. DF produced an FRN, like EF, which was larger during high-utility feedback, specifically following negative EF or when the feedback was hard-to-decode. Implications and future research directions are discussed.

In contrast, feedback messages that are difficult to understand can lead to frustration and are perceived to be less effective in facilitating learning (e.g., Jonsson, 2013). This frustration can induce a state of learned helplessness (Teodorescu & Erev, 2014), whereby students become disenfranchised with their feedback due to its perceived difficulty. As well as adding to cognitive load, hard-to-decode feedback makes a limited contribution to reducing learners' uncertainty (Shute, 2008).
In terms of feedback utility, cognitive models also inform our understanding of the likely effectiveness of different types of feedback. Kulhavy and Stock (1989) distinguished between the functions of verification and elaboration in feedback. Verification feedback informs the learner whether their response was incorrect or correct, without further explanation. In contrast, elaborated feedback provides guidance, for example by indicating why a particular response was correct or by providing recommendations to improve performance. In a comprehensive review of the formative feedback literature, Shute (2008) concludes that verification feedback in isolation has limited utility for learning, whereas elaborative feedback supports learners to self-correct and advance their understanding. Different types of elaborative feedback exist on a spectrum from lower to higher utility, depending on the degree of explanation and direction they provide.
However, the FRN has only been recorded using feedback regarding past performance, termed evaluative feedback (EF) in educational sciences. EF is similar in function to verification feedback. In contrast, directive feedback (DF), a form of elaborative feedback, focuses on future performance. DF can vary in ease of decoding and utility, but typically provides greater utility than EF (Carless, 2006; Hattie & Timperley, 2007). The FRN is typically regarded as measuring the amount or complexity of feedback information (Du et al., 2018). Therefore, how easy feedback is to decode, and how easy it is to utilize, are expected to influence this signal, with both hard-to-decode and high-utility feedback engaging more cognitive processes than their counterparts.
The present study aimed to test whether the utility and ease of decoding of DF affect learning and the neural mechanisms of feedback processing. We hypothesized that: (1) easy-to-decode feedback better facilitates learning than hard-to-decode feedback; (2) high-utility feedback better facilitates learning than low-utility feedback; (3) DF will elicit an FRN signal like EF; (4) the peak-to-peak FRN will be larger for high-utility and hard-to-decode feedback, due to greater cognitive demands; and (5) the valence of EF will influence the FRN produced by subsequent DF, due to differential levels of informative value.

Participants
Data were collected from 27 volunteers recruited through social media snowballing and through a participant recruitment system (SONA). One participant's data were excluded because of chance performance in the learning tasks.
The final sample consisted of 26 students (20 female, aged 18-26, M = 20.65 years), who reported normal or corrected-to-normal vision, no contraindications for EEG, and no color-blindness. Participants gave written informed consent before the experiment. Undergraduate psychology students (N = 21) received a course incentive in exchange for participation, while other participants received no compensation.

Materials
The task was an adaptation of a category learning task (Knowlton, Squire, & Gluck, 1994). Participants aimed to ascertain which flavor of ice cream various potato head figures (henceforth called "the character") preferred. This setup allowed for the variation of three different features (hat, eyes, or shoes), with three variations each (e.g., black, white, or blue shoes), that were systematically combined to create the 27 character images used in this experiment. The variations of a single feature dictated which flavor of ice cream (vanilla, chocolate, or strawberry) the character would prefer. For example, when the shoe color indicated the preferred flavor, each shoe color mapped directly onto one ice cream flavor. Participants were asked to first deduce which feature, and then which variation of that feature, predicted the preferred ice cream flavor, based on the EF and DF they received. EF consisted of a stylized red frowning face for incorrect responses, a stylized green smiling face for correct responses, and the words "No response" if the participant did not respond within the time given.
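The factorial structure of the stimulus set can be illustrated with a short sketch. It assumes hypothetical labels for the hat and eye variations (only the shoe colors are named in the text) and a hypothetical assignment of variations to flavors; it is not the authors' stimulus-generation procedure.

```python
from itertools import product

# Feature names come from the text. Only the shoe colors are given there;
# the hat and eye labels are placeholders, not the real stimuli.
FEATURES = {
    "hat": ["hat_a", "hat_b", "hat_c"],
    "eyes": ["eyes_a", "eyes_b", "eyes_c"],
    "shoes": ["black", "white", "blue"],
}
FLAVORS = ["vanilla", "chocolate", "strawberry"]


def build_characters():
    """Return all 3 x 3 x 3 = 27 feature combinations as dicts."""
    names = list(FEATURES)
    return [dict(zip(names, combo)) for combo in product(*FEATURES.values())]


def preferred_flavor(character, relevant_feature):
    """Map the variation of the relevant feature onto one flavor.

    The specific variation-to-flavor assignment is hypothetical; the text
    only states that each variation maps directly onto one flavor.
    """
    index = FEATURES[relevant_feature].index(character[relevant_feature])
    return FLAVORS[index]


characters = build_characters()
counts = {flavor: 0 for flavor in FLAVORS}
for character in characters:
    counts[preferred_flavor(character, "shoes")] += 1

print(len(characters), counts)
# 27 {'vanilla': 9, 'chocolate': 9, 'strawberry': 9}
```

Because each variation of the relevant feature occurs in 9 of the 27 combinations, every flavor is the correct answer equally often, whichever feature is relevant.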
DF consisted of a cartoon hat, eyes, and shoes on a white background (see Figure 1). Colored bars were used to construct the various forms of DF. High-utility screens contained a green bar behind the feature relevant for flavor preference, while low-utility screens contained a red bar behind an irrelevant feature. The easy-to-decode screens included only a single green or red bar, while the hard-to-decode screens included two additional colored bars to increase perceptual load (Lavie, Hirst, De Fockert, & Viding, 2004). Therefore, the easy-to-decode with high-utility (EH) condition contained a single green bar, whereas the easy-to-decode with low-utility (EL) condition contained a single red bar. The hard-to-decode with high-utility (HH) condition contained purple and pink bars alongside the green bar, while the hard-to-decode with low-utility (HL) condition displayed a red bar alongside yellow and blue bars. The noninformative bar colors were chosen to match the informative bar in terms of brightness and intensity, while being distinct from red or green.
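The four DF screens can be summarized as a small data structure. The sketch below takes the bar colors from the description above, but makes two labeled assumptions: that the noninformative bars sit behind the remaining two features, and that the irrelevant feature marked in low-utility screens is chosen at random. The names `DFCondition` and `bar_layout` are illustrative only.

```python
import random
from dataclasses import dataclass

FEATURES = ("hat", "eyes", "shoes")


@dataclass(frozen=True)
class DFCondition:
    code: str
    high_utility: bool
    informative_bar: str    # green = relevant feature, red = irrelevant feature
    distractor_bars: tuple  # empty for easy-to-decode screens


CONDITIONS = {
    "EH": DFCondition("EH", True, "green", ()),
    "EL": DFCondition("EL", False, "red", ()),
    "HH": DFCondition("HH", True, "green", ("purple", "pink")),
    "HL": DFCondition("HL", False, "red", ("yellow", "blue")),
}


def bar_layout(condition, relevant_feature, rng):
    """Return {feature: bar color} for one DF screen.

    Assumptions (not stated in the text): distractor bars are placed behind
    the features that do not carry the informative bar, and the irrelevant
    feature marked in low-utility screens is picked at random.
    """
    if condition.high_utility:
        target = relevant_feature
    else:
        target = rng.choice([f for f in FEATURES if f != relevant_feature])
    layout = {target: condition.informative_bar}
    others = [f for f in FEATURES if f != target]
    for feature, color in zip(others, condition.distractor_bars):
        layout[feature] = color
    return layout


rng = random.Random(1)
for code, condition in CONDITIONS.items():
    print(code, bar_layout(condition, "hat", rng))
```

Under these assumptions, a high-utility screen always marks the relevant feature in green, whereas a low-utility screen only rules out one of the two irrelevant features.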

Procedure
Trials began with a crosshair for 500 ms, followed by the character stimulus for 2,500 ms. Participants were asked to respond during this 2,500 ms, after which they were presented with the EF screen for 500 ms, and finally the DF screen for 2,000 ms. Instructions required participants to respond with "1," "2," or "3" on the number pad of the keyboard to designate the preferred ice cream flavor. The correct response key was counterbalanced across participants. On-screen instructions informed participants that they would receive two feedback screens: one informing them whether their choice was correct (EF), and one that would help them uncover the relevant feature for that block (DF). Instructions noted that green feedback bars indicated a relevant feature, while red bars highlighted an irrelevant feature, but that other, noninformative colors could be present alongside these cues. After finishing these instructions, participants were shown a diagram of a trial (Figure 1) including EH DF to familiarize them with the stimuli.
The experiment was presented using E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA) and consisted of three runs of four blocks, one per DF condition (EH, EL, HH, HL), with a self-directed break after each block. Each block contained 27 trials, each presenting a different character. Each flavor was correct in nine trials per block, resulting in a 33% probability of guessing correctly. The first run was presented in a fixed order of supposedly increasing difficulty (EH, EL, HH, HL); subsequent runs proceeded in a pseudorandom order, with each condition presented once before the first condition was repeated. The experimental task took an average of 30 min to complete, and the whole study, including EEG setup and debriefing, took no longer than 2 hr. Participant accuracy was recorded for each trial to comprise the behavioral data in the analysis. Learning curves were created after the study using a rolling three-point average. These values were then fitted by an exponential rise to maximum model to estimate the rate of learning. See Supporting Information for model formulae.
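The learning-curve steps can be sketched as follows. Because the model formulae are given only in the Supporting Information, this sketch assumes one common three-parameter form of the exponential rise to maximum, y(t) = y0 + a(1 - exp(-b t)), a trailing (non-centered) three-point window, and a simple grid search over the rate b. All of these are assumptions and may differ from the authors' fitting procedure.

```python
import math


def rolling_mean(values, window=3):
    """Three-point rolling average; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def exp_rise(t, y0, a, b):
    """Assumed model form: y0 + a * (1 - exp(-b * t))."""
    return y0 + a * (1.0 - math.exp(-b * t))


def fit_exp_rise(ts, ys, rates=None):
    """Grid-search the rate b; solve y0 and a by least squares for each b."""
    if rates is None:
        rates = [0.01 * k for k in range(1, 301)]  # candidate b values
    n = len(ts)
    best = None
    for b in rates:
        xs = [1.0 - math.exp(-b * t) for t in ts]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = sxy / sxx
        y0 = my - a * mx
        sse = sum((y - (y0 + a * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, y0, a, b)
    return best[1], best[2], best[3]


# Synthetic, noise-free curve starting at chance (0.33); values are made up.
trials = list(range(1, 26))
curve = [exp_rise(t, 0.33, 0.50, 0.20) for t in trials]
y0, a, b = fit_exp_rise(trials, curve)
print(round(y0, 3), round(a, 3), round(b, 3))
```

In this parameterization, b governs how quickly accuracy approaches its asymptote y0 + a, so a larger b corresponds to a steeper learning curve.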

EEG Recording and Preprocessing
EEG data were recorded from 30 Ag/AgCl electrodes mounted on an elastic cap (Easycap) and positioned according to the extended international 10-20 system, with online reference at FCz and ground at FPz. Data were amplified from DC to 70 Hz at a sampling rate of 500 Hz using a BrainAmp MR (Brain Products GmbH, Germany). To control for eye movements, the EOG was recorded from the suborbital ridge of the left eye. Impedance of all electrodes was kept below 10 kΩ. Data were processed offline using Brain Vision Analyser. After the data were re-referenced to linked mastoids so that FCz could be analyzed, a band-pass filter of 0.5 Hz to 30 Hz was applied. Artifacts resulting from eye blinks were removed using an automatic ocular correction independent components analysis (Jung et al., 2000), and remaining artifacts were automatically excluded from further analyses using a 50 μV/ms gradient and a 100 μV/200 ms maximal amplitude change criterion. Data were segmented from 100 ms before to 700 ms after feedback (EF and DF) onset. All segments were baseline corrected from −100 to 0 ms before averaging. The FRN was quantified for each participant as a peak-to-peak amplitude by calculating the difference between the most negative amplitude from 230 to 330 ms post feedback presentation and the amplitude of the preceding positive peak in a time window from 170 to 230 ms (Ferdinand, 2019; Holroyd et al., 2004; Nieuwenhuis et al., 2004). Each dataset was manually checked for accuracy of peak detection before data were extracted for all feedback conditions separately. For all conditions and participants, the peaks were accurately detected in the respective time windows. After artifact rejection, an average of 37.80 (SD = 3.10) trials remained in each condition, an appropriate number for FRN analysis (Marco-Pallares, Cucurell, Münte, Strien, & Rodriguez-Fornells, 2011).
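The baseline correction and peak-to-peak quantification described above can be illustrated with a minimal sketch. It assumes an averaged FCz waveform stored as a list of microvolt values sampled at 500 Hz from −100 to 700 ms, and it reports the measure as positive peak minus negative peak (the sign convention is an assumption). The function names are illustrative; this is not the Brain Vision Analyser procedure itself.

```python
import math

EPOCH_START_MS = -100
SRATE_HZ = 500  # one sample every 2 ms


def ms_to_index(time_ms):
    """Convert a latency relative to feedback onset into a sample index."""
    return round((time_ms - EPOCH_START_MS) * SRATE_HZ / 1000)


def baseline_correct(erp_uv, start_ms=-100, end_ms=0):
    """Subtract the mean of the pre-feedback interval from every sample."""
    window = erp_uv[ms_to_index(start_ms):ms_to_index(end_ms) + 1]
    baseline = sum(window) / len(window)
    return [v - baseline for v in erp_uv]


def frn_peak_to_peak(erp_uv):
    """Positive peak (170-230 ms) minus most negative value (230-330 ms)."""
    positive = max(erp_uv[ms_to_index(170):ms_to_index(230) + 1])
    negative = min(erp_uv[ms_to_index(230):ms_to_index(330) + 1])
    return positive - negative


def bump(t_ms, center_ms, width_ms, amplitude_uv):
    """Gaussian deflection used only to build a synthetic waveform."""
    return amplitude_uv * math.exp(-((t_ms - center_ms) / width_ms) ** 2)


# Synthetic waveform: a positive deflection near 200 ms followed by a
# negative deflection near 270 ms. Amplitudes are made up for illustration.
times = [EPOCH_START_MS + 2 * i for i in range(401)]
erp = [bump(t, 200, 20, 4.0) + bump(t, 270, 25, -3.0) for t in times]
print(round(frn_peak_to_peak(baseline_correct(erp)), 2))
```

Measuring the negativity relative to the preceding positive peak, rather than relative to the pre-stimulus baseline, reduces the influence of overlapping slow components on the FRN estimate.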
Next, learning curves were analyzed to assess within-block learning (see Table 1 and Figure 2). A 2 × 2 repeated measures ANOVA revealed a significant effect of utility on learning rate (F(1, 25) = 26.98, p < .001, ηp² = .52), with high-utility feedback eliciting steeper learning curves. Neither an effect of decoding (F(1, 25) = 0.24, p = .63, ηp² = .01) nor an interaction between the two factors (F(1, 25) = 0.56, p = .46, ηp² = .02) was observed. Finally, the first run was presented in a fixed order, while the remaining eight blocks were presented in a pseudorandomized order. To account for possible order effects, the accuracy data were re-analyzed after excluding the first run. Utility (F(1, 25) = 31.59, p < .001, ηp² = .56) and the interaction between utility and decoding (F(1, 25) = 8.09, p = .009, ηp² = .24) remained significant, while decoding remained nonsignificant (F(1, 25) = 0.57, p = .46, ηp² = .02). Hence, the effects of utility and of the interaction of both factors on accuracy were independent of order effects, as the direction of the effect persisted even when the first four blocks were removed.
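As a consistency check, the partial eta squared values reported above can be recovered from the F statistics and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the six tests in this paragraph; the dictionary labels are ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in the text; every test has df = (1, 25).
reported_f = {
    "learning rate: utility": 26.98,
    "learning rate: decoding": 0.24,
    "learning rate: interaction": 0.56,
    "accuracy (runs 2-3): utility": 31.59,
    "accuracy (runs 2-3): interaction": 8.09,
    "accuracy (runs 2-3): decoding": 0.57,
}

effect_sizes = {label: round(partial_eta_squared(f, 1, 25), 2)
                for label, f in reported_f.items()}
for label, value in effect_sizes.items():
    print(f"{label}: {value:.2f}")
# learning rate: utility: 0.52
# learning rate: decoding: 0.01
# learning rate: interaction: 0.02
# accuracy (runs 2-3): utility: 0.56
# accuracy (runs 2-3): interaction: 0.24
# accuracy (runs 2-3): decoding: 0.02
```

All six recomputed values agree with the reported effect sizes to two decimal places.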

DISCUSSION
The current study aimed to investigate how the utility and ease of decoding of DF affected learning and the neural mechanisms of feedback processing. In agreement with previous findings (Butler et al., 2013), high-utility feedback consistently enhanced overall accuracy and the speed of learning. However, contrary to our predictions, ease of decoding did not. Nonetheless, the two factors interacted in the accuracy data, but not in the learning rate data. This suggests that there is a large utility effect for easy-to-decode information, but that when information is hard-to-decode, utility is less pertinent to learning. This interaction indicates that decoding could act as a "gatekeeper" for the effects of utility, rather than exerting an influence on learning in its own right. These findings could be interpreted in line with attentional load theory (Lavie et al., 2004). According to this theory, high perceptual load in the hard-to-decode conditions would limit the attentional processing of irrelevant information, consequently reducing fixation on low-utility information.
Therefore, our findings support calls for feedback to be both easy to understand and highly applicable (Shute, 2008; Winstone, Nash, Parker, & Rowntree, 2016; Winstone, Nash, Rowntree, & Menezes, 2016), as this seems to be the optimum combination for learning speed and outcome. Hard-to-decode feedback reduced the differentiation between high- and low-utility feedback on learning outcome. Such feedback could be associated with an increase in learned helplessness (Teodorescu & Erev, 2014) and a barrier to engagement (Jonsson, 2013; Winstone et al., 2017). By contrast, easy-to-decode, low-utility feedback yielded low accuracy and learning rate, suggesting that when feedback is clear but noninformative, students may disengage because they are unsure of how to improve in future.
The present study also examined the electrophysiological correlates of processing DF contingent on EF. Replicating previous findings (Holroyd & Coles, 2002; Nieuwenhuis et al., 2004), negative EF elicited a larger-amplitude FRN than positive EF. Furthermore, the larger FRN to DF, and especially to high-utility feedback, after negative compared to positive EF suggests that DF is more informative after an unexpected and unsatisfying result. It also suggests that participants approached the current task as a two-stage, hierarchical process. Participants first determined the relevant feature using the DF, before matching the variations of that feature to the ice cream preference. Following negative EF, participants seem to have engaged more with DF to ensure that they picked the correct feature for their decision. Thus, high-utility DF elicited an FRN signal, which indicates that reinforcement of the relevant feature is a mandatory first step in solving the task. This informational view of the current results is in line with a recent hierarchical reinforcement learning theory (Holroyd & Yeung, 2012), which states that the FRN reflects the reinforcement of higher-level behavioral plans and the sequences of specific behaviors directed toward these plans.
One potential limitation of the methodology is whether the perceptual load (Lavie et al., 2004) used in the current study to simulate the level of decoding difficulty is comparable to the difficulty of decoding complex academic terms in educational settings. If the current operationalization was not an accurate proxy for decoding complex feedback, this could explain why the present results diverge from previous reviews indicating that this factor affects learning (Jonsson & Panadero, 2018). Alternatively, the effects of decoding on learning may have been confounded in previous studies by the utility present in the feedback provided. Further studies comparing different types of feedback, such as common feedback terms or phrases, would address this question. Despite a significant interaction of decoding and utility, the effect size of the main effect of decoding is rather small. Therefore, a replication is required to examine whether decoding has no individual impact upon learning measures.
Finally, future studies should test whether cognitive factors known to influence the FRN elicited by EF, such as perceived fairness (Riepl, Mussel, Osinsky, & Hewig, 2016), have similar effects upon the FRN elicited by DF in this study. Similarly, a more in-depth analysis of the neural signature could reveal further similarities or differences between the processes underlying EF and DF. We also did not analyze any ERPs beyond the FRN. Future research should consider the P300, a signal also sensitive to various aspects of feedback (San Martín, 2012), such as the expectancy of feedback (Ferdinand, Becker, Kray, & Gehring, 2016). The current study was unable to investigate this aspect, as all DF conditions were identical in terms of probability.
In conclusion, the present study demonstrated that while utility produced a significant main effect on both learning and FRN amplitude, decoding acted as a "gatekeeper" for the effect of utility on accuracy, and both valence and decoding of feedback acted as "gatekeepers" for the effect of utility on FRN amplitude. To our knowledge, this is the first experimental study to disentangle the simultaneous effects of ease of decoding and utility of feedback on learning, while also uncovering the neural signals associated with DF and providing a means for future studies to consider the nuances of DF.

Fig. 1. A sample trial where the hat is the relevant feature, displaying positive evaluative feedback and the hard-to-decode with high-utility directive feedback screen.

Fig. 2. The average percentage of correct responses on each trial separated by directive feedback condition.

Fig. 3. (a) Feedback-related negativity signal (260-280 ms) to positive (dashed) and negative (solid) evaluative feedback recorded at the FCz electrode. (b) Topography for the feedback-related negativity signal, baseline corrected to the peak of the P2, for positive (left) and negative (right) evaluative feedback.

Fig. 4. (a) Feedback-related negativity signal (240-290 ms) to the four directive feedback conditions at the FCz electrode: easy-to-decode with high utility (EH), easy-to-decode with low utility (EL), hard-to-decode with high utility (HH), and hard-to-decode with low utility (HL). (b) Topography of the feedback-related negativity signal, baseline corrected to the peak of the P2, for the EH (top left), EL (top right), HH (bottom left), and HL (bottom right) conditions.

Table 1
Descriptive Statistics of the Overall Accuracy Data, Accuracy for Final Eight Blocks, and Learning Rate (Gradient of Learning Curve) for Each Directive Feedback Condition

Table 2
Mean and Standard Deviation of Peak-to-Peak Difference of the Feedback-Related Negativity Signal at FCz Across All Directive Feedback Conditions Following Either Positive or Negative Evaluative Feedback