Performance of typical and superior face recognizers on a novel interactive face matching procedure

Unfamiliar simultaneous face matching is error prone. Reducing incorrect identification decisions will positively benefit forensic and security contexts. The absence of view-independent information in static images likely contributes to the difficulty of unfamiliar face matching. We tested whether a novel interactive viewing procedure that provides the user with 3D structural information as they rotate a facial image to different orientations would improve face matching accuracy. We tested the performance of ‘typical’ (Experiment 1) and ‘superior’ (Experiment 2) face recognizers, comparing their performance using high-quality (Experiment 3) and pixelated (Experiment 4) Facebook profile images. In each trial, participants responded whether two images featured the same person with one of these images being either a static face, a video providing orientation information, or an interactive image. Taken together, the results show that fluid orientation information and interactivity prompt shifts in criterion and support matching performance. Because typical and superior face recognizers both benefited from the structural information provided by the novel viewing procedures, our results point to qualitatively similar reliance on pictorial encoding in these groups. This also suggests that interactive viewing tools can be valuable in assisting face matching in high-performing practitioner groups.

identity) are the most common type of error, with failure rates of 40-60% in field tests (Davis & Valentine, 2009; Kemp, Towell, & Pike, 1997). Error rates are high even among passport-issuing officers and do not decrease with additional years of experience or professional training (White, Kemp, Jenkins, Matheson, & Burton, 2014). Investigating ways of reducing errors can have significant benefits to the accuracy of applied tasks such as security screening and police investigations. In this paper, we evaluate methods for improving simultaneous one-to-one face matching that enable viewers to make best use of variations in viewpoint information available to them. We present a novel procedure in which the comparison image can be manoeuvred into different orientations at the discretion of the viewer.

Differences in viewpoint
In face matching, it might be necessary to try to reconcile two images of faces that vary in terms of orientation. If the face is unfamiliar, the viewer will have no knowledge of the person's 3D facial structure. In face memory tasks, differences in orientation between study and test undermine recognition accuracy (Bruce, 1982; Colloff, Seale-Carlisle, et al., 2020), but knowledge of 3D facial structure mitigates the effect of viewpoint dependence (Hill, Schyns, & Akamatsu, 1997; Longmore, Liu, & Young, 2008). In face matching, where there is limited memory load, some evidence suggests that performance suffers less across differences in viewpoint than it does in face memory tasks (Estudillo & Bindemann, 2014). However, even relatively minor differences in viewpoint can create problems for unfamiliar face matching (Bruce et al., 1999; Hancock, Bruce, & Burton, 2000), particularly in more difficult matching tasks (Bruce et al., 1999).
Based on these results, it might be expected that providing participants with both frontal and profile facial views would improve matching performance. Surprisingly though, Kramer and Reynolds (2018) found that accuracy did not differ across three conditions in which participants matched two pairs of frontal images, two pairs of profile images, or two pairs of images featuring one frontal and one profile view. The authors explain the lack of benefit in the latter condition by proposing that there may have been no mental integration of the frontal and profile views. Put differently, the participants did not use the two images to build a 3D view-independent representation of the face, perhaps because the orientations were too disparate. However, as people are able to extract information across multiple frontal images of the same face to support the construction of stable representations (Menon, Kemp, & White, 2018; Menon, White, & Kemp, 2015; White et al., 2014), showing a face moving fluidly from side to side may facilitate the building of a view-independent representation.

The benefit of movement
The results of various studies attest to the benefit of fluid movement for face perception. Pike, Kemp, Towell, and Phillips (1997) found that rigid head rotations improved recognition performance, arguing that such movement provides 3D structural information. However, there are various ways in which a face can move, and much of the literature has focused on non-rigid movement, such as smiling, frowning, or speaking as cues to identity (e.g., Knappmeyer, Thornton, & Bülthoff, 2003; Pilz, Thornton, & Bülthoff, 2006; Smith, Dunn, Baguley, & Stacey, 2016). For example, effects of movement observed when recognizing familiar faces (Lander & Bruce, 2000; Lander, Christie, & Bruce, 1999) are explained in terms of the ability to access idiosyncratic non-rigid movement stored in memory (Lander & Chuang, 2005). Whilst Thornton and Kourtzi (2002) tested the effect of non-rigid changes in expression on unfamiliar sequential face matching, we are not aware of any studies that have explored the effect of rigid rotation movement in the context of simultaneous face matching. Given the disruptive effect of viewpoint dependence (Bruce et al., 1999), this is an important question.

The interactive procedure
Standard face matching tasks in operational contexts involve passive mental comparisons rather than active engagement with images. A procedure in which users can interact with one face in a pair to be matched, manoeuvring it fluidly to different viewpoints along a vertical axis, may support matching performance. The education literature is replete with examples of task engagement improving learning outcomes (Freeman et al., 2014). This can be explained by increased attentiveness and depth of encoding (Craik, 2002; Craik & Lockhart, 1972), which are also beneficial to face processing (Bower & Karlin, 1974; Liu, Ward, & Markall, 2007; see also Palermo & Rhodes, 2007). Interactivity should increase the depth of encoding, and rotation should facilitate the building of a 3D view-independent representation, providing structural information that is unavailable in static snapshots. This may enable operators to familiarize themselves with the face, and to gain knowledge of how invariant features range in appearance across different viewpoints of the face (Lander et al., 1999; Pike et al., 1997). Such a procedure will not only confer some of the benefits of familiar face processing but will also enable the operator to manoeuvre comparison faces into the same viewpoint, reducing within-person variability across images. The procedure has been successfully employed in face memory tasks, with higher discrimination accuracy observed in an interactive line-up compared to a line-up composed of static images of faces, which is commonly used by US police (Colloff, Flowe, et al., 2020; Colloff, Seale-Carlisle, et al., 2020).

Typical and super-recognizers
In recent years, there has been an increasing focus on improving face identification accuracy in applied settings. Given the wide range of individual differences in unfamiliar face perception and recognition ability in both novices (for reviews, see Lander, Bruce, & Bindemann, 2018; Noyes, Phillips, & O'Toole, 2017) and practitioners (White, Towler, & Kemp, 2021), one of the most promising solutions is to select individuals on the basis of ability. There is a practical need to test the benefits of novel procedures in both typical and 'super-recognizers', who are likely to use these solutions in professional settings (e.g., Davis, Lander, Evans, & Jansari, 2016; Davis, Maigut, & Forrest, 2019; Robertson, Noyes, Dowsett, Jenkins, & Burton, 2016).
There are also theoretically important reasons to establish whether there are qualitative differences in face processing between typical and super-recognizers. Research focusing on the other-ethnicity bias provides evidence that typical- and super-recognizer performance does not differ in a qualitative way, and both groups are subject to the same influences. For example, recognition memory in both groups was better for own- than other-ethnicity faces (Bate et al., 2019; Robertson, Black, Chamberlain, Megreya, & Davis, 2019). Other studies have observed a heightened inversion effect in super-recognizers (Russell, Duchaine, & Nakayama, 2009), and it has been suggested that they rely more on holistic processing (Bobak, Bennetts, Parris, Jansari, & Bate, 2016). There is, however, inconsistent support for this conclusion, with some super-recognizers exhibiting enhanced holistic processing and others exhibiting the opposite pattern of performance (Belanova, Davis, & Thompson, 2019). Differences in structural encoding provide an alternative explanation for differences in face processing ability. In Bobak, Hancock, and Bate's (2016) one-to-many face matching study, target and array photographs varied according to viewpoint. Super-recognizers were more accurate than controls. One possible explanation provided is that super-recognizers are better at structural encoding strategies that help them to construct a view-independent representation. In contrast, controls may rely more on less helpful pictorial encoding strategies.
The existing literature provides only mixed evidence that typical recognizers and super-recognizers process faces in a qualitatively different way (see Noyes et al., 2017). Testing both types of recognizer using the interactive system speaks directly to this question. As yet, the hypothesis that super-recognizers are better at structural encoding has not been fully tested. However, if the hypothesis is supported, the ease with which super-recognizers extract structural information from static images would likely limit the usefulness of additional orientation information provided by fluid rotation. It might also mean that interactivity does not improve performance for super-recognizers, who do not need to focus on familiarizing themselves with the way that faces vary across different viewpoints. In contrast to typical recognizers, super-recognizers may gather structural information automatically, without needing to have their attention focused on it by a procedure.

The relationship between confidence and accuracy
The relationship between confidence and accuracy has been investigated in face recognition (e.g., Brewer & Wells, 2006; Sauer & Brewer, 2015; Wixted & Wells, 2017), but only one previous face matching study has systematically analysed the relationship between confidence and accuracy (Stephens, Semmler, & Sauer, 2017). Confidence ratings have been recorded in a minority of face-matching studies, with results showing that whilst super-recognizers might be more confident than controls (Bobak, Hancock, et al., 2016; Davis et al., 2016), even in typical recognizers, confidence has the potential to be diagnostic of accuracy (Stephens et al., 2017; White et al., 2014). If confidence predicts accuracy, confidence should be taken into account in applied settings.

The current study
To investigate possible methods of improving face matching accuracy, we tested how performance varied according to interactivity and levels of orientation information in both 'typical' face recognizers (Experiment 1) and 'superior' face recognizers (Experiment 2). Consistent with previous research (e.g., Belanova, Davis, & Thompson, 2018), groups were defined based on scores on the 102-trial standardized Cambridge Face Memory Test: Extended (Russell et al., 2009). Superior face recognizers achieved scores of at least 93 out of 102 (91%), expected to be achievable by roughly 2% of the population (Belanova et al., 2018; Bobak, Pampoulov, & Bate, 2016). Typical face recognizers scored below this threshold. Participants compared a static image to either a single static image (frontal condition), a series of static images of the face at different orientations (orientations condition), a video showing the face moving from side to side (moving condition), or an interactive image which could be manoeuvred into different orientations using the computer mouse (interactive condition). In Experiments 3 and 4, we directly compared the performance of typical and superior recognizers in the frontal and interactive conditions.
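The grouping rule described above can be sketched as a small function. This is only an illustration of the stated cut-off (93 or more out of 102 for superior recognizers); the function and constant names are hypothetical and are not taken from any software used in the study.

```python
# Cut-off stated in the text: superior recognizers scored at least 93 of 102.
SUPERIOR_CUTOFF = 93
CFMT_MAX = 102  # number of trials in the CFMT+


def classify_recognizer(cfmt_score: int) -> str:
    """Return 'superior' or 'typical' for a CFMT+ total score (0 to 102)."""
    if not 0 <= cfmt_score <= CFMT_MAX:
        raise ValueError("CFMT+ scores must lie between 0 and 102")
    return "superior" if cfmt_score >= SUPERIOR_CUTOFF else "typical"


# The cut-off expressed as a percentage of the maximum score.
cutoff_percent = round(100 * SUPERIOR_CUTOFF / CFMT_MAX)
```

With this rule, a score of 92 is classed as typical and a score of 93 as superior, and the cut-off corresponds to the 91% quoted in the text.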
We predicted that typical recognizers would benefit from the availability of orientation information, because it should facilitate the building of a view-invariant representation. We also predicted that typical recognizers would benefit from interactivity because it should direct their attention to the way in which faces vary across viewpoints. If superior recognizers are better at extracting features that are invariant to viewpoint from single images, we would not expect orientation information or interactivity to be as beneficial.

Experiment 1: Typical face recognizers
This experiment examined the effect of multiple viewpoints and viewer interaction on face matching. We were also interested in the effects of these stimulus conditions on the confidence-accuracy relationship.

Design
This was a 4 × 2 mixed factorial design. The between-subjects factor was the comparison image type (frontal, orientations, moving, interactive). The within-subjects factor was identity (same or different). The dependent variables were matching accuracy and self-rated confidence.

Participants
Participants who had previously completed the Cambridge Face Memory Test: Extended (CFMT+) on www.superrecognisers.com, and scored 92 or less, were invited to participate via email. All participants had agreed to be contacted about subsequent experiments. A total of 310 participants completed the experiment. In the interactive condition, 26 participants were excluded because they did not move the comparison image in any of the trials. These data were never analysed. The final sample consisted of 284 participants (119 male, 165 female), with an age range of 18-67 years (M = 33.7, SD = 10.8). Their mean CFMT+ score was 76.3 (SD = 9.4). Mean CFMT+ scores in other samples (not excluding extreme scores) tend to vary between around 70 and 75 (Bobak, Pampoulov, et al., 2016; Russell, Chatterjee, & Nakayama, 2012). Ethical approval for the experiment was granted by the local Research Ethics Committee.
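The exclusion rule for the interactive condition (dropping anyone who never moved the comparison image on any trial) could be applied along the following lines. This is a minimal sketch: the record layout and the field names `participant`, `condition`, and `moved` are assumptions made for illustration, not the format of the study's data files.

```python
def retained_participants(trials):
    """Return sorted IDs of participants kept after the exclusion rule.

    `trials` is an iterable of dicts with the hypothetical keys
    'participant', 'condition', and 'moved' (True if the image was moved).
    """
    moved_any = {}
    condition_of = {}
    for trial in trials:
        pid = trial["participant"]
        condition_of[pid] = trial["condition"]
        # Remember whether this participant moved the image at least once.
        moved_any[pid] = moved_any.get(pid, False) or bool(trial["moved"])
    return sorted(
        pid
        for pid, condition in condition_of.items()
        # Only interactive-condition participants can be excluded.
        if condition != "interactive" or moved_any[pid]
    )


# Hypothetical example records (not data from the study).
example_trials = [
    {"participant": "p1", "condition": "interactive", "moved": False},
    {"participant": "p1", "condition": "interactive", "moved": True},
    {"participant": "p2", "condition": "interactive", "moved": False},
    {"participant": "p3", "condition": "frontal", "moved": False},
]
kept = retained_participants(example_trials)
```

In the example, `p2` is dropped because no movement was recorded on any trial, whereas `p3` is retained because the rule applies only to the interactive condition.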

Apparatus and materials
The stimuli were taken from the UNSW Unfamiliar Face and Voice Database (White, Burton, & Kemp, 2016). For each of the 233 people in this corpus, there is a high-quality head and shoulders video of their head turning from 90° left to 90° right, as well as a set of Facebook facial images (M = 12.03 images, SD = 1.93 images). We selected 94 Caucasian adults (58 female, 36 male) from the database. They had an age range of 17-32 years (M = 19.48, SD = 2.20). For each person, we used the video and two of the Facebook images (Facebook 1 and Facebook 2). The Facebook images were selected according to the following criteria: the images should provide a clear view of the person's face, be only a head and shoulders shot, and feature no other individuals. The faces showed a variety of different facial expressions and head orientations. However, the majority were facing towards the camera, with only slight deviations from the frontal orientation.
Each matching trial was constructed using a Facebook image and the video (comparison image). Same identity trials featured two different images of the same person. For different identity trials, the foils were selected on the basis of a multidimensional scaling analysis as part of a previous study (White et al., 2016). The experiment consisted of 94 trials. Each identity featured in both a same identity trial and a different identity trial. So that the same image never appeared twice, Facebook 1 was used for same identity trials, and Facebook 2 was used for different identity trials. All images presented in the experiment were the same height (300 pixels) and focal distance.
Each of the four conditions involved presenting different visual information for the comparison image. In the frontal condition, only the front of the face was shown. In the orientations condition, still images of the face were depicted sequentially from five viewpoints (frontal, left and right three quarter, and left and right profile), as shown in Figure 1. Each viewpoint was shown for a total of 500 ms. The face appeared to turn from one side to the other and then back again as the five different viewpoints were shown in sequence. In the moving condition, the participant saw a 4-s video clip that showed the face move fluidly from 0° to 180° (i.e., rotating from 0° through to 180°). In the interactive condition, the user could move the face from 0° to 180° using a computer mouse and pause the face at any angle they wished for any length of time desired. The programme recorded whether participants in the interactive condition moved the faces or not.
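The presentation order in the orientations condition can be sketched as a repeating there-and-back sweep through the five viewpoints at 500 ms each. The starting side, the absence of gaps between frames, and all names below are assumptions made for illustration; the text specifies only the five viewpoints, the 500 ms duration, and the side-to-side-and-back appearance.

```python
from itertools import cycle, islice

# Five viewpoints named in the text; starting at the left profile is an assumption.
VIEWPOINTS = [
    "left profile",
    "left three-quarter",
    "frontal",
    "right three-quarter",
    "right profile",
]
FRAME_MS = 500  # each viewpoint is shown for 500 ms


def orientation_schedule(n_frames):
    """Return (onset_ms, viewpoint) pairs for the first n_frames still images."""
    # One sweep across and back, without showing an end point twice in a row.
    sweep = VIEWPOINTS + VIEWPOINTS[-2:0:-1]
    frames = islice(cycle(sweep), n_frames)
    return [(i * FRAME_MS, view) for i, view in enumerate(frames)]


schedule = orientation_schedule(9)
```

Under these assumptions, one full sweep takes eight frames (4,000 ms), after which the sequence restarts at the left profile.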
The participants completed the experiment online. The website was disabled on mobile phones/tablets. All participants completed the experiment on a desktop/laptop computer.

Procedure
In the invitation email, the participants were provided with a unique ID to use for the experiment. They gave permission that after the deadline for withdrawal had passed, we would then be able to match up their anonymized scores with their CFMT+ scores.
When participants clicked on the link to complete the study, they were randomly allocated to one of the four conditions. Participants completed 94 trials, which were presented in a random order. In each trial, the Facebook image was always shown on the left, and the comparison image was shown on the right. Below the images, participants were asked, 'Are these the same people?'. They clicked either same or different to register their response. They were also asked, 'How confident are you in the accuracy of your response from 0% to 100%, with 0% being not confident at all, and 100% being absolutely confident?'. They selected from a drop-down menu of 11 possible responses (0, 10, 20, etc.). No time pressure was imposed. The faces remained visible until participants clicked 'Next' to proceed to the next trial. In the orientations and moving conditions, the right-hand faces continued to move from side to side and back again for the duration of the trial.

Results
The data for the last (94th) trial in the frontal, orientations, and moving conditions did not save due to a programming error, so only data from the first 93 trials were analysed. Here, we report the results in brief. Supplementary information and analyses are presented in Appendix S1.

Accuracy
Data were analysed using multilevel logistic regression with accurate matches scored as 1 and inaccurate matches as 0 in a 4 (image type: frontal, orientations, moving, interactive) × 2 (identity: same or different) factorial design. This analysis treated participants and the two face stimuli sets as fully crossed random factors using the R package lme4 (Bates, Maechler, Bolker, & Walker, 2015; R Core Team, 2018). These results are shown in Table 1.
The main effect of image type was significant, and there was an interaction between identity and image type. Figure 2 aids interpretation of this main effect and interaction, showing the means and 95% confidence intervals for accuracy in each of the eight conditions. Overall accuracy in Experiment 1 was 90.1%, 95% CI [87.7, 92.1]. Overall accuracy in the frontal condition was 89.0%, 95% CI [86.1, 91.4]; in the orientations condition, it was 89.9%, 95% CI [87.1, 92.2]; in the moving condition, it was 91.2%, 95% CI [88.8, 93.1]; and in the interactive condition, it was 91.7%, 95% CI [89.1, 93.6]. Pairwise tests with a Hochberg correction (Hochberg, 1988) indicated that the frontal condition had lower average accuracy than the moving and interactive conditions (both p < .05), with the other pairwise comparisons non-significant.
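For readers unfamiliar with the Hochberg (1988) correction applied to the pairwise tests, the step-up adjustment can be sketched as follows. The function name and the example p-values are hypothetical; they are not the p-values from this experiment, and the sketch does not reproduce the software used for the reported analysis.

```python
def hochberg_adjust(p_values):
    """Return Hochberg step-up adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices ordered from the largest raw p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # The k-th largest p-value (k = rank + 1) is multiplied by k.
        candidate = min(1.0, (rank + 1) * p_values[idx])
        # Adjusted values may never exceed that of any larger raw p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
example_adjusted = hochberg_adjust([0.01, 0.04, 0.03])
```

For the hypothetical inputs 0.01, 0.04, and 0.03, the adjusted values are 0.03, 0.04, and 0.04, so all three comparisons would remain below .05 in this made-up case.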

Multilevel signal detection analysis
Figure 2 suggests qualitatively different performance for the interactive condition relative to the non-interactive frontal, orientations and moving conditions. Namely, participants appear to be biased towards making 'same' responses, but this bias is reduced or reversed in the interactive condition. To investigate this possibility, we fitted a signal detection theory model as a multilevel probit regression for these data. In this model, we treat response (same or different) as the outcome and use identity and image type as predictors using lme4 with participant and the different face sets (i.e., comparison and Facebook images) as random factors (e.g., see Wright, Horry, & Skagerberg, 2009). Table 2 summarizes the criterion and sensitivity (d′) estimates for each condition (obtained by transforming the probit regression coefficients). This approach also allowed us to estimate separate random effects for criterion and d′ (reported in Appendix S1, Table A1). These show a clear pattern of differences both in criterion and in d′. The d′ pattern largely follows that observed for accuracy shown in Figure 2, with a slightly more pronounced difference in sensitivity between the frontal and orientations conditions than the moving and interactive conditions. There is also an indication that the criterion shifts between the non-interactive and interactive conditions, with a higher estimate reflecting a more conservative decision standard.
Pairwise tests of the differences in criterion, with p adjusted using the Hochberg correction, indicate that the interactive condition had a higher threshold for responding 'same' than the other three conditions (all p < .001) with no other differences statistically significant (p > .05). Thus, typical participants in the interactive condition were more conservative in deciding matches, being more biased towards making 'different' responses than in the non-interactive conditions. Additionally, the moving condition tended to have higher d′ scores than either the frontal (adjusted one-sided p = .030) or orientations conditions (p = .108), as did the interactive condition (p = .024 and p = .064, respectively). A post-hoc contrast comparing the average of the moving and interactive conditions with the static conditions also supported this interpretation, d′ difference = 0.301, [0.094, 0.511]. The moving and interactive conditions did not differ from each other (p > .05).
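To make the two signal detection quantities concrete, the sketch below computes sensitivity (d′) and criterion from a single hit rate and false alarm rate using the classic equal-variance formulas. This is a simplified, single-level illustration and not the multilevel probit model fitted here; it assumes that a hit is a 'same' response on a same identity trial and a false alarm is a 'same' response on a different identity trial. The function name and example rates are hypothetical.

```python
from statistics import NormalDist


def sdt_estimates(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Rates must lie strictly between 0 and 1, otherwise the inverse normal
    CDF is undefined and NormalDist.inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf  # probit (inverse standard normal CDF)
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa
    # Positive values indicate a conservative standard (fewer 'same' responses).
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Hypothetical rates for two observers with different response tendencies.
liberal = sdt_estimates(0.95, 0.20)
conservative = sdt_estimates(0.80, 0.05)
```

In this made-up example both observers have the same d′, because the two pairs of rates are mirror images, but the second observer has a positive criterion, which is the sense in which a higher estimate reflects a more conservative decision standard.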

Confidence
The means and 95% CIs for each of the conditions are shown in Figure 3.

The relationship between confidence and accuracy
We ran separate analyses for the four comparison image conditions using the ordinal package in R (Christensen, 2011). Self-rated confidence was the dependent variable, and accuracy (% correct) was the predictor. Two models were compared: one included only intercepts, and the other added accuracy as a predictor. Accuracy predicted confidence in all four conditions: frontal (b = 1.1223, SE = 0.062, G² = 330.33, p < .

Discussion
Overall accuracy was high, exceeding 85% in each condition. There was a main effect of image type, suggesting that typical recognizers benefit from orientation information. This is likely to be because fluid orientation information supports the building of a view-invariant representation, making matching more accurate (Bruce et al., 1999; Hancock et al., 2000; Kramer & Reynolds, 2018). Indeed, performance in both the moving and interactive conditions was more accurate than the frontal condition. The pattern of performance in the frontal, orientations, and moving conditions is consistent with previous face matching literature showing that false alarms are the most common type of error (Davis & Valentine, 2009; Kemp et al., 1997). However, there was an interaction between identity and image type. In the interactive condition, accuracy was higher on different identity trials. The multilevel signal detection analysis revealed that typical recognizers were more likely to respond 'different identity' in the interactive condition, suggesting that interactivity may increase the salience of differences between facial images.
Confidence predicted accuracy in all the comparison image conditions, supporting previous findings that confidence is diagnostic of accuracy in face matching (Stephens et al., 2017). From an applied point of view, this is reassuring because identifications made with high confidence can have the greatest weight in criminal proceedings (Brewer & Burke, 2002; Cutler, Penrod, & Stuve, 1988; Lindsay, Wells, & Rumpel, 1981).

Experiment 2: Superior face recognizers
In Experiment 2, we tested superior face recognizers to investigate whether orientation information and interactivity affect face matching performance. If superior recognizers are particularly good at extracting structural information from static images, we would not expect either orientation information or interactivity to boost performance. As in Experiment 1, we were also interested in the nature of the relationship between confidence and accuracy.

Method
Apart from the following exceptions, the method was identical to Experiment 1.

Participants
Participants who had previously completed the Cambridge Face Memory Test: Extended (CFMT+) on www.superrecognisers.com, and scored 93 or more, were invited to participate via email. A total of 57 participants completed the experiment. In the interactive condition, nine participants were excluded because they did not move the comparison image in any of the trials. These data were never analysed. The final sample consisted of 48 participants (25 male, 23 female), with an age range of 18-68 years (M = 34.7, SD = 10). Their mean CFMT+ score was 95.2 (SD = 1.7).

Results
As in Experiment 1, the data were not saved for the last trial due to a programming error in the frontal, orientations, and moving conditions. Supplementary information and analyses are available in Appendix S2.

Accuracy
Face matching accuracy was analysed using the same method as Experiment 1. Table 3 shows the likelihood chi-square statistic (G²) and p-value associated with comparing individual effects (i.e., comparing a model without the effect to one including all effects of the same order).
The main effect of identity was significant, and there was an interaction between identity and image type. Figure 4 aids interpretation of the main effect of identity and the interaction between identity and image type, showing the means and 95% confidence intervals for accuracy in each of the eight conditions.
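The likelihood chi-square statistic used for these model comparisons is twice the difference in log-likelihood between two nested models. A minimal sketch for the simplest case, where the fuller model adds a single parameter (1 degree of freedom), is given below. The function name and the log-likelihood values are hypothetical, and effects with more than one degree of freedom (such as a four-level image type factor) would need the general chi-square distribution rather than the 1-df shortcut used here.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Return (G2, p) for nested models that differ by one parameter."""
    g2 = 2.0 * (loglik_full - loglik_reduced)
    # For 1 df, the chi-square upper tail is P(X > g2) = erfc(sqrt(g2 / 2)).
    p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2.0))
    return g2, p_value


# Hypothetical log-likelihoods for a reduced and a fuller model.
g2, p_value = likelihood_ratio_test_1df(-100.0, -98.0)
```

In this made-up case the statistic is 4.0 and the p-value is roughly .046, so adding the extra parameter would be judged a significant improvement at the .05 level.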

Multilevel signal detection analysis
Figure 4 suggests qualitatively different performance for the interactive and moving conditions relative to the frontal and orientations conditions. As in Experiment 1, this may partly reflect changes in bias and we therefore fitted a signal detection theory model as a multilevel probit regression for these data. We set up the model in the same way as in Experiment 1, treating response (same or different) as the outcome, identity and image type as predictors, and participant and the different face sets as random factors (and again obtained the estimates using brms because of difficulty estimating the Facebook image variance). Table 4 summarizes the criterion and d′ estimates for each condition. These show a clear pattern of differences in criterion but less so for d′. Despite the appearance of differences in criterion or d′ between conditions, none of these differences reach statistical significance (all p > .05).

Confidence
The means and 95% CIs for each of the conditions are shown in Figure 5.

The relationship between confidence and accuracy
The relationship between confidence and accuracy was analysed using the same method as Experiment 1. Accuracy predicted confidence in all four conditions: frontal

Discussion
As in Experiment 1, overall accuracy was high (>90%) in each condition. There was a main effect of identity, with participants responding more accurately on different identity trials. This is opposite to the pattern observed for typical face recognizers, who were more accurate on same identity trials in the frontal, orientations, and moving conditions. The results fit with Bobak, Dowsett, and Bate (2016), who found that super-recognizers tended to be more conservative than controls. There was no main effect of image type, which may be because superior recognizers are better at structural encoding, and so unlike typical face recognizers, do not benefit as much from the additional orientation information (Bobak, Hancock, et al., 2016). However, there was an interaction between identity and image type: the difference between accuracy on same identity and different identity trials was reduced in the moving and interactive conditions. Whilst the condition means suggest that superior recognizers benefit from fluid movement in the sense that they are less likely to respond conservatively in these conditions, the multilevel signal detection analysis did not reveal any significant differences in criterion across conditions. This could be because overall high performance in Experiment 2 impacts on the ability to detect criterion shifts, or because overall there is less data in comparison with Experiment 1. Broadly speaking, the confidence-accuracy analyses replicate Experiment 1. There was a relationship between confidence and accuracy in each of the comparison image conditions. The superior recognizers exhibit numerically higher confidence than typical recognizers, which mirrors the pattern of accuracy.

Experiment 3: A comparison of typical and superior face recognizers
The results of Experiments 1 and 2 suggest that interactivity has the potential to shift patterns of performance in both typical and superior face recognizers, and may be extremely valuable in settings where it is important to avoid false positive matching decisions (Experiment 1). In Experiments 3 and 4, we compare the novel interactive procedure to the procedure associated with photo-ID, that is, matching to a frontal image.
The number of superior recognizers tested in Experiment 2 exceeds that of much previous research (Noyes et al., 2017). However, a proportion of participants in the interactive condition were excluded, which risked this condition being underpowered. In Experiment 3, we recruited a greater number of superior recognizers, comparing performance against typical recognizers in order to test whether the two groups process faces in qualitatively different ways, and exhibit different patterns of performance across frontal and interactive conditions. Experiments 3 and 4 were pre-registered (AsPredicted #30321).

Method
Apart from the following exceptions, the method was identical to Experiments 1 and 2.

Design
This was a 2 × 2 × 2 mixed factorial design. The between-subjects factors were comparison image type (frontal or interactive) and recognizer (typical or superior). The within-subjects factor was identity (same or different). The dependent variables were accuracy, interactivity, and self-rated confidence.

Participants
Participants who had previously completed the CFMT+ on www.superrecognisers.com were invited to participate via email. We did not send invitations to people who had previously taken part in Experiments 1 or 2. A total of 218 participants completed the experiment. In the interactive condition, 58 participants were excluded because they did not move the comparison image in any of the trials. These data were never analysed. The final sample consisted of 160 participants (104 female, 53 male, three preferring not to say) with an age range of 18-73 years (M = 43.9, SD = 19.6). There were 89 typicals (CFMT+ score: M = 71.6, SD = 12.6) and 71 superiors (CFMT+ score: M = 96.9, SD = 2.25).

Results
Supplementary information is presented in Appendix S3.

Accuracy
Table 5 shows the likelihood chi-square statistic (G²) and p-value associated with comparing individual effects (i.e., comparing a model without the effect to one including all effects of the same order).
The main effects of image type and recognizer were significant. There was a two-way interaction between identity and recognizer, and a three-way interaction between identity, image type and recognizer. Figure 6

Interactivity
Following programming improvements, we were able to record and analyse whether or not participants interacted on each trial in the interactive condition. Table 6 summarizes the likelihood tests for the 2 × 2 factorial analysis of interactivity. There was a two-way interaction between identity and recognizer, showing that whilst typical recognizers interacted as much on same identity trials (58.7% [56.04, 61.63]) as different identity trials (58.3% [55.63, 60.97]), superior recognizers interacted more on same identity trials (57.8% [54.85, 60.75]) than different identity trials (51.9% [48.92, 54.88]).
We tested whether average accuracy for each trial in the frontal condition (Experiment 3) predicted interactivity in the interactive condition. Average accuracy in the frontal condition provided a metric that was uncontaminated by whether participants interacted. Both typical and superior recognizers were more likely to interact on difficult trials (typical: b = −2.545, SE = 0.637, G² = 14.94, p < .001; superior: b = −3.411, SE = 0.552, G² = 33.73, p < .001).

Multilevel signal detection analysis
As in Experiments 1 and 2, multilevel probit regression was used to fit a signal detection model for responses to same versus different targets. Table 7 shows the estimates of criterion and d′ for the frontal and interactive conditions by group.
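For readers less familiar with the two signal detection quantities, the following is a minimal sketch of how criterion and d′ are conventionally computed from response counts. It is not the multilevel probit model used here: it treats a single set of pooled counts, the function name `sdt_estimates` and the counts are hypothetical, and the log-linear correction is an assumption added to avoid infinite z-scores. A same identity trial is treated as the signal, so a higher criterion indicates a more conservative tendency to respond 'different'.

```python
from statistics import NormalDist


def sdt_estimates(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from pooled response counts.

    A 'hit' is a "same" response on a same identity trial; a 'false alarm'
    is a "same" response on a different identity trial.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Assumed log-linear correction: keeps both rates strictly inside (0, 1).
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts for illustration only (not data from this study).
d_prime, criterion = sdt_estimates(hits=30, misses=20,
                                   false_alarms=2, correct_rejections=48)
print(f"d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```

In this illustration the observer rarely responds 'same' in error, so the criterion is positive (conservative). The multilevel approach additionally models variation across participants and trials, which this sketch omits.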
The effects of condition and group and their interaction on criterion and d′ were tested using contrasts. Criterion was higher on average for superior recognizers than typical

Confidence
The means and 95% CI for each of the conditions are shown in Figure 7.

The relationship between confidence and accuracy
The relationship between confidence and accuracy was analysed using the same method as Experiments 1 and 2. Accuracy predicted confidence in both of the image conditions for typical recognizers: frontal (b = 1.202, SE = 0.068, G² = 314.32, p < .001), and

Discussion
In line with the results of Experiments 1 and 2, accuracy was high, exceeding 80% in all conditions. The overall pattern of results replicates and clarifies our previous findings.
There was a main effect of recognizer. As expected, the superior recognizers responded more accurately than typical recognizers, and being more conservative, superiors were particularly accurate on different identity trials (Bobak, Dowsett, et al., 2016). There was also a main effect of image condition, with higher accuracy for interactive images. The results of Experiment 3 show that superior recognizers do benefit from interactivity. The failure to detect an overall benefit of interactivity in Experiment 2 is therefore unlikely to have been due to ceiling effects as the same stimuli were used across both experiments.
Changes in performance across image conditions were not due to changes in criterion; the pattern of d′ estimates reflects higher sensitivity in the interactive condition. The detection advantage did not vary by group, indicating that the three-way interaction in terms of accuracy is a product of change in bias. As in Experiment 1, typical recognizers respond more conservatively in the interactive condition.
The pattern of responses for typical recognizers was the same as in Experiment 1, despite differences in the mean CFMT+ score. In Experiment 1, the mean CFMT+ score (76.34) was relatively high. In Experiment 3, the mean score of 71.56 sat at the bottom of the range (around 70-75) observed in other studies (Bobak, Pampoulov, et al., 2016; Russell et al., 2012).
The confidence-accuracy results replicate Experiments 1 and 2, showing a strong relationship in each condition.
However, having observed high levels of overall accuracy in Experiments 1, 2, and 3, we cannot rule out the possibility that ceiling effects mask the true magnitude of the interactivity effect, particularly as participants tended to interact more on difficult trials. Experiment 4 addressed this issue.

Experiment 4: A comparison of typical and superior recognizers matching pixelated images
In Experiment 4, the testing conditions were designed to reflect potential challenges encountered in forensic contexts. The police often use low resolution (pixelated) images from CCTV footage to identify suspects, comparing these against a database of high-quality images. Pixelation reliably reduces accuracy in unfamiliar face matching (Bindemann, Attard, Leach, & Johnston, 2013; Ritchie et al., 2018). In this experiment, the Facebook image was degraded by pixelation, so we expected accuracy to be lower than in Experiments 1, 2, and 3. We also expected superiors to outperform typicals, and for performance to be most accurate in the interactive condition.

Method
Apart from the following exceptions, the method was identical to Experiment 3.

Participants
Participants who had previously completed the CFMT+ on www.superrecognisers.com were invited to participate via email. We did not send invitations to people who had previously taken part in Experiments 1, 2, or 3. A total of 253 participants completed the experiment. In the interactive condition, 56 participants were excluded because they did not move the comparison image in any of the trials. These data were never analysed. The final sample consisted of 197 participants (128 females, 68 males, 1 prefer not to say) with an age range of 21-86 years, M = 43.3, SD = 16.1. There were 104 superiors (CFMT+ score: M = 96.5, SD = 2.6) and 93 typicals (CFMT+ score: M = 75.8, SD = 12.1).

Apparatus and materials
The Facebook images were pixelated using the Mosaic function in Adobe Photoshop 2020, which converts pixels into weighted averages. Each 6 × 6 pixel square in the image was transformed into a sub-sampled block of equal luminance. Before pixelating, each image had a horizontal resolution of 300 pixels. After pixelating, each image had a horizontal resolution of 50 pixels.
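The block-averaging step can be illustrated with a short sketch. This is not Photoshop's Mosaic implementation, whose exact weighting we do not reproduce; it assumes a plain unweighted mean over each 6 × 6 square of a greyscale image, and the function name `pixelate` and the test image are hypothetical. It shows why a 300-pixel-wide image ends up with 50 distinct blocks across its width.

```python
def pixelate(image, block=6):
    """Replace every block x block square with its mean luminance.

    `image` is a list of rows of greyscale values; its height and width
    are assumed to be exact multiples of `block`.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, top + block)
            cols = range(left, left + block)
            cells = [image[y][x] for y in rows for x in cols]
            mean = sum(cells) / len(cells)  # unweighted mean (an assumption)
            for y in rows:
                for x in cols:
                    out[y][x] = mean
    return out


# Hypothetical test image: a 12-row horizontal gradient, 300 pixels wide.
image = [[float(x) for x in range(300)] for _ in range(12)]
pixelated = pixelate(image)

# Each run of six columns now shares one value, giving 300 / 6 = 50 blocks.
distinct_columns = len({pixelated[0][x] for x in range(300)})
print(distinct_columns)
```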

Results
Supplementary information is presented in Appendix S4.

Accuracy
Table 8 shows the likelihood chi-square statistic (G²) and p-value associated with comparing individual effects (i.e., comparing a model without the effect to one including all effects of the same order).
The main effects of image type and recognizer were significant. There was a two-way interaction between identity and image type, and a two-way interaction between identity and recognizer. Figure 8 aids interpretation of these effects, showing the means and 95% confidence intervals for accuracy in each of the eight conditions.

Interactivity
Table 9 shows the likelihood chi-square statistic (G²) and p-value associated with comparing individual effects.
As in Experiment 3, we tested whether average accuracy for each trial in the frontal condition (Experiment 4) predicted interactivity in the interactive condition. Typical recognizers were no more likely to interact on more difficult trials (b = 0.061, SE = 0.796, G² = 0.01, p = 0.940), nor did we detect an effect for superior recognizers (b = −1.443, SE = 1.112, G² = 1.68, p = 0.194).
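As a hedged illustration of how a G² value maps onto its p-value, the sketch below refers G² to a chi-square distribution. It assumes each slope test has one degree of freedom (the text does not state the degrees of freedom explicitly), and the function name `g2_p_value_df1` is hypothetical. Because the reported G² values are rounded, the recovered p-values match the reported ones only approximately.

```python
import math


def g2_p_value_df1(g2):
    """Upper-tail p-value of a chi-square statistic with 1 df (assumed).

    For 1 df, P(chi2 > g2) = P(|Z| > sqrt(g2)) = erfc(sqrt(g2 / 2)).
    """
    return math.erfc(math.sqrt(g2 / 2.0))


# G² values as reported for Experiment 4 (rounded to two decimals).
p_typical = g2_p_value_df1(0.01)
p_superior = g2_p_value_df1(1.68)
print(f"typical: p = {p_typical:.3f}; superior: p = {p_superior:.3f}")
```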

Multilevel signal detection analysis
As in the earlier experiments, a multilevel probit regression was used to obtain estimates of criterion and d′ for each condition. Table 10 shows the estimates of criterion and d′ for the frontal and interactive conditions by group.
There was a main effect of criterion which, although lower overall than in Experiment 3, remained higher for superior recognizers than typical recognizers, c diff = 0.423 [0.263, 0.580]. However, there was little evidence of a difference in criterion between conditions,

Confidence
The means and 95% CIs for each of the conditions are shown in Figure 9.

The relationship between confidence and accuracy
The relationship between confidence and accuracy was analysed using the same method as previous experiments. Accuracy predicted confidence in both of the image conditions

Discussion
In Experiment 4, one image in each pair was pixelated in order to eliminate ceiling effects. Average accuracy was lower than in Experiment 3. For typical recognizers, the error rate was around 20%, sitting in the middle of the range typically observed in laboratory-based experiments (Bruce et al., 1999; Megreya & Burton, 2006, 2008). As in Experiment 3, superiors were more accurate than typical recognizers, and they responded more conservatively, performing particularly accurately on different identity trials.
Overall accuracy was higher in the interactive condition than the frontal condition. Whilst ceiling effects may have operated in the previous experiments, the results of Experiment 4 provide reassurance that this did not mask the magnitude of the interactivity effect. Based on the pattern of d′ estimates, it appears that degrading the stimuli and making the task harder reduced the benefit afforded by the interactive condition. The effects are smaller in magnitude, rather than larger.
The two-way interaction between identity and image type reflects an overall advantage for different identity trials, with the advantage less pronounced in the interactive condition. In Experiments 1 and 3, typical recognizers were more likely to respond 'same' when both images were static and high quality. In Experiment 4, pixelating one image likely magnified apparent differences between faces. As a result, same identity trials became more difficult, and accuracy was higher on different identity trials. This pattern is consistent with the results of previous face matching studies. Bindemann et al. (2013) observed a more dramatic drop in performance on same identity trials compared to different identity trials when one image was pixelated. The data we present in Experiment 4 suggest that interactivity may mitigate this effect, supporting performance on same identity trials for both typicals and superiors.
In Experiment 3, participants interacted more on difficult trials. In Experiment 4, both groups of recognizer interacted more on different identity trials despite same identity trials being more difficult. Pixelating one image may have disrupted their assessment of difficulty, preventing optimal use of the system.
As expected, there was a relationship between confidence and accuracy in all conditions. However, the relationship was not as strong as when both images were high quality (Experiments 1, 2, and 3). Indeed, the relationship was weaker for typicals compared to superior recognizers.

General discussion
Across four experiments, we have presented strong evidence that fluid orientation information and interactivity boost face matching performance. They support performance across the spectrum of face recognition ability, and across different image qualities. The findings have important security implications, underlining the forensic utility of interactivity for identity verification. Any significant difference, even if the effect sizes are small, has the potential to be meaningful in an applied context. A single fraudulently obtained passport provides the opportunity to open bank accounts, take out loans, or apply for mortgages. Indeed, criminals using fraudulently obtained travel documents are likely to have convictions for serious crimes (Harper, 2016).

Typical and super-recognizers
The findings are important from a theoretical point of view, contributing to the debate about possible differences in the way typical and super-recognizers process faces (e.g., Bate et al., 2019; Bobak, Bennetts, et al., 2016; Bobak, Hancock, et al., 2016; Bobak, Parris, Gregory, Bennetts, & Bate, 2017; Robertson et al., 2019; Russell et al., 2009). Our data do not fully support the hypothesis put forward by Bobak, Hancock, et al. (2016) that super-recognizers are better than typical recognizers at structural encoding and can construct view-independent representations from static images. Both types of recognizer benefitted from additional viewpoint information provided by the interactive image, suggesting at least some reliance on pictorial encoding strategies. Our findings support those of Bate et al. (2019), who argue that super-recognizers simply sit at the extreme of the face recognition spectrum.
This does not mean that typical and super-recognizers exhibit identical patterns of performance. When comparing two high-quality images, typical and superior recognizers differ in terms of criterion placement (Experiments 1, 2, and 3; see also Bobak, Dowsett, et al., 2016). By default (i.e., without the benefit of fluid orientation information or interactivity), it would seem that typical recognizers focus on between-image similarities and look for evidence that the two faces depict the same person, whereas superior recognizers focus on between-image differences. This would explain higher accuracy on same identity trials for typical recognizers (Experiments 1 and 3), and higher overall accuracy on different identity trials for superior recognizers (Experiments 2, 3, and 4). With a greater amount of facial information available and the ability to self-select which information to use, interactivity seems to shift the focus for typical recognizers and highlights differences, making them both more accurate, and more conservative.
Crucially though, it cannot be argued that the value of interactivity mainly lies in highlighting differences between images. In Experiment 4, the degraded image quality affected both types of recognizer similarly by increasing the salience of differences and resulting in higher accuracy on different compared to same identity trials. Interactivity mitigated this effect to some extent for both typicals and superiors, driving same identity performance up towards different identity performance.

Using the interactive procedure for identity verification
Super-recognizers are known to outperform typical recognizers on face matching tasks (Belanova et al., 2018; Bobak, Dowsett, et al., 2016; Bobak, Hancock, et al., 2016), and their skills are sought after in forensic contexts. An innovation that boosts superior recognizer performance when comparing high-quality (Experiment 3) and mismatched quality (Experiment 4) images is therefore important. We are not aware of other studies that have provided specific evidence that super-recognizer performance can be optimized in such a way. Ritchie et al. (2018) present a method of overcoming the deleterious effect of pixelation by creating an average of several poor-quality images to be compared to a high-quality image, but they do not compare performance across typical and super-recognizers. The success of Ritchie et al.'s (2018) method in the field depends on there being several poor-quality images available. Whilst both interactivity and averaging likely work by increasing the amount of visual information available and enabling the operator to reduce the contribution of within-person variability as a source of error (Jenkins, White, Van Montfort, & Burton, 2011; Ritchie et al., 2018), one benefit of interactivity is that it can be used when the police only possess a single poor-quality image of the suspect to be compared to the interactive face.
A further benefit of the procedure applies to typical recognizers. In Experiments 1 and 3, interactivity supported the performance of typical recognizers on different identity trials. The utility of interactivity is underlined when we consider that most ID verification tasks involve same identity trials. Accurate performance on different identity trials is therefore crucial for preventing identity fraud.

The participant sample
It is important to address points about the samples used in this study. Firstly, the participants were invited to take part via www.superrecognisers.com and are likely to have been highly motivated. They took part in initial studies because of their interest in super-recognition, and agreed to be contacted about future studies. We cannot rule out the possibility that typical and superior recognizers differed more in terms of motivation than natural ability (see Noyes et al., 2017). On the other hand, it has been shown that differences in incentive-based motivation between groups do not affect scores (Bobak, Dowsett, et al., 2016).
All participants had previously received their results on the CFMT+, Glasgow Face Matching Task, and a short-term face memory test. Whilst they were not explicitly told whether they were super-recognizers, they were told whether their scores fell within the top 5, 10, 25, or 50% of participants. We do not believe that this affected the results because in the frontal condition (Experiments 1, 2, and 3) both groups behaved in a way that was consistent with previous literature. Superior recognizers were both more accurate and more conservative than typical recognizers (Bobak, Dowsett, et al., 2016), and typical recognizers were more likely to commit errors on different identity trials (Davis & Valentine, 2009; Kemp et al., 1997).
The superior recognizers in this study were people scoring 93 or more on the CFMT+. Whilst we acknowledge that Bobak, Pampoulov, et al. (2016) recommend that a score of 95 should be used as a cut-off for super-recognition, Belanova et al. (2018) have found no difference in outcomes on a series of tests when comparing participants who scored 93/94 to those scoring over 95. We have referred to our participants as 'superior recognizers' rather than super-recognizers because the latter term tends to be reserved for people who have undertaken a series of neuropsychological tests. Nevertheless, the mean CFMT+ scores in Experiments 2, 3, and 4 range between 95.19 and 96.50, and so are similar to the means in previous super-recognizer studies (95.7, Bobak, Hancock, et al., 2016; 97.7, Bobak, Dowsett, et al., 2016). Related to the above points about the CFMT+ is the potential for measurement error when trying to capture general face recognition and identification ability using existing standardized tests. Such tests do not always predict performance on less standardized tests (Balsdon, Summersby, Kemp, & White, 2018), and there are calls for existing screening protocols for super-recognition to be expanded (Bate et al., 2018). However, whilst the CFMT+ is unlikely to enable us to perfectly distinguish typical from super-recognizers, individual differences in ability, test-specific strategies, and within-person differences in attention (e.g., distraction, fatigue) may play a role in explaining at least some of this measurement error. For our purposes, we are confident that the CFMT+ provides a satisfactory way of discriminating between groups of typical and superior face recognizers (Bobak, Pampoulov, et al., 2016).

Future directions
Our results underline the importance of 3D view-independent representations in face matching. As the Facebook images were profile images, the vast majority show people facing towards the camera, only slightly (if at all) offset from centre. The results may have revealed a bigger benefit for the moving/interactive conditions if the Facebook images had varied more in terms of orientation. We would expect performance in the frontal condition to particularly suffer (Bruce et al., 1999; Hancock et al., 2000), but performance in the interactive condition to benefit, boosted by the participants' ability to minimize within-person variability across images and to carefully compare faces at the same orientation.
We cannot be sure how these laboratory-based findings might translate into specific applied contexts. Whilst the effect may be attenuated in the field, it is equally possible that it might be amplified owing to higher levels of motivation (Moore & Johnston, 2013), and knowledge of incorrect response implications. Future research should test the procedure in the field, and across the full range of image types encountered in forensic and security settings (e.g., greyscale, blurred, or partially occluded faces).

Conclusion
In this paper, we tested typical and superior recognizers using a novel interactive face matching procedure. In contrast to standard (i.e., static frontal) one-to-one face matching tasks, the procedure provides fluid orientation information, and the opportunity to interact with the comparison facial image by manoeuvring it into different orientations. This easy-to-implement procedure has a range of applied benefits: It optimizes the performance of both typical and superior recognizers, and has the potential to highlight both similarities and differences between facial images. The results support the hypothesis that typical and superior face recognizers process faces in qualitatively similar ways: Reliance on pictorial encoding when viewing static images helps to explain the benefit of the interactive procedure.

Figure 1. A selection of frames extracted from the studio video. Note. The images show side, ¾ and frontal orientations on both left and right sides.

Figure 2. Face matching accuracy for frontal, orientations, moving and interactive conditions, Experiment 1. Note. Error bars show 95% CIs for the condition means.

Figure 3. Self-rated confidence following face matching decisions, Experiment 1. Note. Error bars show 95% CIs for the condition means (calculated from the SE).

Figure 4. Face matching accuracy for frontal, orientations, moving and interactive conditions, Experiment 2. Note. Error bars show 95% CIs for the condition means.

Figure 5. Self-rated confidence following face matching decisions, Experiment 2. Note. Error bars show 95% CIs for the condition means (calculated from the SE).
Figure 6 aids interpretation of the accuracy effects in Experiment 3, showing the means and 95% confidence intervals for accuracy in each of the eight conditions. Overall accuracy in Experiment 3 was 90.6% [88.1, 92.6]. For typical recognizers in the frontal condition, it was 85.8% [82.1, 88.8], and in the interactive condition, it was 89.3% [85.8, 92.0]. For superior recognizers in the frontal condition, it was 93.4% [91.3, 95.0], and in the interactive condition, it was 94.9% [93.0, 96.4].

Figure 6. Face matching accuracy for typicals and superiors in the frontal and interactive conditions, Experiment 3. Note. Error bars show 95% CIs for the condition means.

Figure 7. Self-rated confidence following face matching decisions, Experiment 3. Note. Error bars show 95% CIs for the condition means (calculated from the SE).

Figure 8. Face matching accuracy for typicals and superiors in the frontal and interactive conditions, Experiment 4. Note. Error bars show 95% CIs for the condition means.

Figure 9. Self-rated confidence following face matching decisions, Experiment 4. Note. Error bars show 95% CIs for the condition means (calculated from the SE).

Table 1. Summary of likelihood tests for the 2 × 4 factorial analysis, Experiment 1

Table 2. Multilevel signal detection analysis: estimates of criterion and d-prime (d′), Experiment 1

Table 3. Summary of likelihood tests for the 2 × 4 factorial analysis, Experiment 2

Table 4. Multilevel signal detection analysis: estimates of criterion and d-prime (d′), Experiment 2

Table 6. Summary of likelihood tests for the 2 × 2 factorial analysis of interactivity, Experiment 3

There was also a difference in criterion between image conditions, c diff = 0.224 [0.024, 0.426], but no interaction between group and image condition, c diff = 0.242 [−0.698, 0.239]. For d′, superior recognizers were better at discriminating matches from non-matches than the typical group, d′ diff = 0.997 [0.742, 1.257]. The interactive condition also had higher d′ scores than the frontal condition,

Table 7. Multilevel signal detection analysis: estimates of criterion and d-prime (d′), Experiment 3

Table 9. Summary of likelihood tests for the 2 × 2 factorial analysis of interactivity, Experiment 4

c diff = 0.035 [−0.120, 0.194], or of an interaction between group and condition, c diff = 0.074 [−0.297, 0.451]. For d′ there were main effects of group and condition. Superior recognizers were better at detecting matches versus non-matches than the typical group, d′ diff = 0.676 [0.499, 0.848]. The interactive condition also had higher d′ scores than the frontal condition, d′ diff = 0.192 [0.021, 0.369], and there was also tentative evidence of a greater effect of the interactive condition for the superior recognizers, d′ diff = 0.174 [−0.172, 0.513]. It is worth noting that the advantage for superior recognizers and for the interactive condition shows the same general pattern as Experiment 3, but with smaller effect sizes.

Table 10. Multilevel signal detection analysis: estimates of criterion and d-prime (d′), Experiment 4