Let's talk about faces: Identifying faces from verbal descriptions

Abstract

Face descriptions inform real‐world identification decisions, for example when eyewitnesses describe criminal perpetrators. However, it is unclear how effective face descriptions are for identification. Here, we examined the accuracy of face identification from verbal descriptions, and how individual differences in face perception relate to producing and using descriptions for identification. In Study 1, participants completed a face communication task in pairs. Each participant saw a single face, and via verbal communication only, the pair decided if they were viewing the same person or different people. Dyads achieved 72% accuracy, compared to 81% when participants completed the task individually by matching face pairs side‐by‐side. Performance on the face communication and perceptual matching tasks was uncorrelated, perhaps due to low measurement reliability of the face communication task. In subsequent studies, we examined the abilities of face ‘describers’ (Study 2) and ‘identifiers’ (Study 3) separately. We found that ‘super‐recognizers’ – people with extremely high perceptual face identification abilities – outperformed controls in both studies. Overall, these results show that people can successfully describe faces for identification. Preliminary evidence suggests that this ability – and the ability to use facial descriptions for identification – has some association with perceptual face identification skill.


INTRODUCTION
Describing faces and identifying faces from descriptions are important tasks in social interactions and applied settings. We may describe someone's facial appearance to a friend when trying to ascertain if we are talking about the same person (e.g., 'she has green eyes and a big smile'). Face descriptions can also be important in criminal investigations. For example, a witness may be asked to describe an offender to police to construct a likeness using a composite system or sketch, or a description may be transmitted to officers in the field via radio.
Despite being an important and relatively common task, little is known about how people describe faces for identification purposes. There have been some rare attempts to examine free descriptions of faces in studies regarding social attributions made to faces (e.g., Oosterhof & Todorov, 2008), but studies on the description of face identity have mostly been concerned with the content of facial descriptions made by eyewitnesses. Overwhelmingly, eyewitnesses report very few details about offenders' faces and their descriptions of facial information tend to be highly error-prone, indicating facial descriptions are of poor quality for recognition purposes (Sporer, 1992; Van Koppen & Lochun, 1997). However, the sparseness and inaccuracies in eyewitness facial descriptions may be due to memory constraints, as the nature of the task involves a delay between initial viewing of the face, generating a description, and completing a subsequent face recognition test (Lindsay et al., 2011). The eyewitness literature is therefore limited in offering insights into whether the process of generating facial descriptions for identification, independent of memory demands, is error-prone in and of itself. In addition, research in the domain of eyewitness recognition does not address how useful face descriptions are in circumstances when the person making the identification is a different person from the initial describer (Meissner et al., 2007). We therefore know little about how effectively individuals can communicate facial information to others.
In contrast, there is an abundance of research on people's ability to identify faces from perceptual information where there is no requirement to describe the face. Unfamiliar face-matching decisions, for example when matching a photograph of a suspect to CCTV footage, are highly error-prone, even when both images are available for perceptual comparison (Bruce et al., 2001; Burton & Jenkins, 2011; Hancock et al., 2000). The error rate in face identification decisions observed in purely perceptual conditions constrains the upper limit of accuracy we could expect to see in situations that involve both perceptual and verbal demands.
As might be expected given the relatively poor performance in tests of perceptual face identification, the few studies that have examined people's ability to identify faces from descriptions suggest this is a very difficult task. In one study, 'identifiers' were given facial descriptions generated by participants in a previous study and asked to identify which face, out of five choices, the description pertained to (Fallshore & Schooler, 1995). Relying solely on this facial description, identifier participants selected the correct face at accuracy levels barely above chance (27%). More recently, the utility of face descriptions for identification purposes was investigated in a live matching context (Kramer & Gous, 2020). Here, 'describers' generated descriptions of faces in real-time to 'identifiers', who used the description to identify the relevant person from a 10-person line-up. When matching a facial description to a different image than that viewed by the describer, identifiers again achieved barely above chance accuracy (23%). These studies offer preliminary evidence that individuals can relay the identity of a face via description, albeit with strikingly low levels of accuracy.
However, all existing studies in the domain of face communication have constrained the exchange of information between individuals to be unidirectional, from describer to identifier (Fallshore & Schooler, 1995; Kramer & Gous, 2020). This prevents reciprocal dialogue between the describer and identifier, for example where an identifier might seek clarification about aspects of the description which were unclear or ambiguous. This is a key limitation in our understanding of face communication because in the real world many face identification decisions involve reciprocal verbal communication in real-time (e.g., when police radio a suspect description to another officer in the field, who is tasked with apprehending the suspect based on the description).
Although the nature of face communication ability is not well understood, based on other domains, there are reasons to suspect that those with greater perceptual face expertise will also exhibit semantic expertise when describing faces. For example, recent research has demonstrated that individuals who are literate have better object recognition abilities, including for faces, than those who are illiterate, and this effect appears to be driven specifically by learning to read (Van Paridon et al., 2021). These results suggest that the ability to comprehend and express oneself through written language assists with fine-tuning object recognition skills, although it is unclear to what extent domain-specific literacy for objects of recognition contributes to these effects. Other evidence for a relationship between perceptual and verbal expertise is found in wine tasting, where through training, individuals can enhance both their verbal/conceptual knowledge about wine as well as their olfactory detection thresholds and discrimination (Block & Beckett, 1990; Spence & Wang, 2019). Wine experts are also more accurate at matching wines to written descriptions than novices (Hughson & Boakes, 2009; Solomon, 1990). This evidence suggests that those with greater perceptual expertise also tend to exhibit semantic expertise within that same domain. However, wine descriptions are often not literal and instead rely on analogy (e.g., flamboyant, toasty, velvety), while face descriptions can be literal (e.g., 'blue eyes'). Critically, the goals of subjective (e.g., analogy-based) and objective (fact-based) description are very different: the former is focused on creating an impression or vision, while the latter is focused on accuracy (Connelly, 2012). Thus, although tasting and describing wines are skills that appear to be closely related, it is not clear whether our perceptual expertise with faces is so closely tied to verbal communication.
Here we test how individual differences in perceptual ability with faces are related to the ability to describe and interpret verbal descriptions of faces. In recent years, it has become clear that there are large differences in people's ability to identify faces. While some people struggle to recognize their closest friends and relatives (prosopagnosia; Duchaine & Nakayama, 2006), others can easily recognize the most trivial of acquaintances even years later (super-recognizers; Russell et al., 2009). These groups appear to represent the ends of a distribution of ability that varies dimensionally in the population, and this variation is known to be heritable, stable over time and largely unaffected by training/experience (Balsdon et al., 2018; DeGutis et al., 2014; Towler et al., 2019, 2020; Wilmer et al., 2010). However, the cognitive parameters of individual differences in face identification and the extent to which they generalize beyond the primary visual system have been little studied.
Whether individual differences in face identification ability extend to verbal abilities in describing faces is not known; however, verbal communication does appear to be associated with identification accuracy in some circumstances. For example, communication between individuals working together to make identity decisions has been found to enhance performance in perceptual face-matching (Dowsett & Burton, 2015). However, benefits to the accuracy of joint decisions were only conferred by people with high levels of face-matching ability to those with lower ability. This finding suggests that individuals with strong abilities in matching faces can communicate the basis for their decisions effectively and raises the possibility that they are also better able to describe faces. However, the content of participant discussions leading to this communication benefit is not well understood (Ritchie et al., 2022). Additionally, facial forensic examiners are known to outperform standard participant groups in face identification ability, and unlike other high-performers in face identification (i.e., super-recognizers), they are required to support face identification decisions with verbal justifications and specialized face vocabulary (White et al., 2015). However, the aetiology of forensic examiners' perceptual expertise and its relationship to their verbal expertise for faces is not well understood. Consequently, understanding the relationship between individual differences in perceptual and verbal expertise for faces is important for theoretical and practical reasons.
In three studies, we measured the accuracy of identification decisions based on facial descriptions. We aimed to improve understanding of the factors influencing this accuracy; in particular, how description generation and comprehension accuracy are associated with an individual's perceptual face identification abilities. In our first study, dyads of participants communicated about faces in real-time to make perceptual matching decisions, where each participant viewed one image of the image pair. This allowed us to investigate people's ability to identify faces when using natural interactive dialogue. However, this approach did not enable us to examine the separate contributions of face 'describers' (i.e., generating descriptions) and face 'identifiers' (i.e., identifying based on descriptions). In two subsequent studies, we therefore investigated the contribution of individual differences in perceptual ability for faces towards verbal-based identification accuracy in face 'describers' (Study 2) and 'identifiers' (Study 2 & 3) separately.

STUDY 1
In Study 1, we examined three initial questions. First, we measured the accuracy with which dyads of participants could make face-matching decisions by utilizing natural two-way conversation to exchange verbal information about faces. Dyad performance was compared to individual performance on the same task in standard perceptual discrimination conditions. Second, we examined whether individual differences in perceptual matching ability were associated with the accuracy of dyads in verbally based identification decisions by measuring the correlation across perceptual and verbal versions of the task, and an existing perceptual test with strong psychometric properties. Third, we examined the content of participants' descriptions of facial information to capture the way that people describe faces for the purpose of identification and to test whether certain types of description are associated with higher accuracy.

Participants
A total of 102 first-year University of New South Wales (UNSW) Psychology students (M age = 19.67 years, 68.63% female, 63.73% Caucasian) participated in the study in exchange for course credit. We advertised two openings per timeslot, so that participants were automatically paired with another person in their timeslot. All participants were native English speakers and had normal or corrected-to-normal vision. We required participants in all three studies reported here to speak English as their first language because evidence suggests non-native speakers are more likely to make linguistic errors, particularly syntax-related, and have different interpretations of colloquial speech patterns (e.g., idioms, slang) when communicating in their non-native language (Mäntylä, 2004; Marina & Snuviškiene, 2005). All studies reported here received ethics approval from UNSW.

Glasgow Face-Matching Test (GFMT)
The GFMT (Burton et al., 2010) is a standardized measure of face-matching ability and consists of 40 pairs of studio quality black and white photos of Caucasian faces (male and female), in front-on view. Based on prior testing (Towler et al., 2014), the standard 40-item version of the GFMT was sub-divided into two, 20-item sub-tests of equal difficulty, each with 10 matching pairs (two different images of the same person) and 10 non-matching pairs (two images of different people). For each dyad of participants, one of the sub-tests was randomly selected to be used as a measure of face-matching ability ('Perceptual GFMT'; see Figure 1a), and the other was used to measure how well participants can communicate about faces to make a joint identification decision ('Verbal GFMT'; see Figure 1b). The order of items in both versions was pseudo-randomized to minimize order effects.
The Perceptual GFMT was administered individually to each participant in the same format as the original GFMT. On each trial, each participant viewed two facial images simultaneously and decided whether the images were of the same person or two different people. For the Verbal GFMT, each member of the dyad was shown only one of the two images from the trial of the GFMT. Participants could not see their partner's image. Through discussion only, the dyad sought to determine whether the two images were of the same person or two different people, and one of the participants recorded their collective response.

Expertise in facial comparison test (EFCT)
The EFCT (White et al., 2015) is a simultaneous face-matching test that was designed to mimic the type of task forensic face examiners perform in the course of their work. The EFCT was used to obtain an additional measure of face identification ability. Two versions of the EFCT, known to be equivalent in difficulty based on the original test development, were used as a pre-test and post-test (i.e., before and after the Verbal GFMT) of individuals' face-matching accuracy. We included a pre- and post-test of face identification ability because prior research observed an improvement in performance when participants collaborated on face identity decisions in pairs, but only for individuals who were poor at a baseline test of perceptual matching (Dowsett & Burton, 2015). We therefore wanted to examine if the experience of having to describe faces to each other would be sufficient to elicit improvement in a subsequent perceptual matching task.
In each EFCT version, across 84 matching trials, participants decided whether two simultaneously presented images depicted the same person or different people, and both images remained on screen until a response was made (see Figure 1c). Accuracy was measured using area under the receiver operating characteristic curve (AUC). The order of use of the two versions of the EFCT was kept constant within participant dyads (i.e., Participant A and Participant B completed the same versions as each other at pre-test and at post-test) but counterbalanced between dyads.
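For binary same/different decisions, AUC reduces to the mean of the hit rate (match trials answered 'same') and the correct-rejection rate (non-match trials answered 'different'). A minimal sketch of this scoring, using hypothetical trial data rather than the authors' analysis code:

```python
def binary_auc(truth, responses):
    """AUC for binary match/non-match decisions: the mean of the hit
    rate and the correct-rejection rate (a single point on the ROC curve)."""
    hits = sum(1 for t, r in zip(truth, responses) if t == 1 and r == 1)
    correct_rejections = sum(1 for t, r in zip(truth, responses) if t == 0 and r == 0)
    n_match = sum(truth)
    n_nonmatch = len(truth) - n_match
    return (hits / n_match + correct_rejections / n_nonmatch) / 2

# Hypothetical EFCT trials: 1 = same person ("match"), 0 = different people
truth = [1, 1, 1, 0, 0, 0, 1, 0]
responses = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_auc(truth, responses))  # 0.75
```

Because this averages sensitivity across the two trial types, it is unaffected by an overall bias towards 'same' or 'different' responses.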
Procedure

Participant A and Participant B were each seated at a computer with a screen separating them to prevent non-verbal communication. Participants first worked independently at their own computers to complete the EFCT pre-test and the Perceptual GFMT. Once both participants had completed the Perceptual GFMT, the experimenter provided instructions for the Verbal GFMT. Participants were told to read each trial number aloud to ensure they were comparing the correct images, and then to take it in turns to describe their images to each other so they could reach a joint identification decision. They were told that their conversations would be audio-recorded, to allow for coding of their verbal responses. After completing the 20 trials of the Verbal GFMT, participants worked independently to complete the post-test EFCT.

FIGURE 1 (a) Perceptual GFMT: example trial. In this example, the two images are of the same person. (b) Verbal GFMT: visual representation of the study procedure. Participant A and Participant B were seated opposite each other, separated by a fabric screen, so that they could hear but not see each other. On their respective computer screens, each participant saw an image of a face. In this example, the two images are of the same person. (c) EFCT: example trial. In this example, the two images are of the same person.
Most participants took 45-60 min to complete all tasks. Three dyads did not complete the EFCT post-test due to time constraints (a maximum of 60 min was allowed for the study). Due to technical difficulties, one dyad was not recorded while undertaking the Verbal GFMT. These four dyads were retained in the dataset but excluded from any analysis affected by their missing data. All statistical analyses reported here were completed in IBM SPSS Statistics Software.

Perceptual and verbal face identification accuracy
As the first step in our analysis, we wanted to examine the accuracy with which participants can make face-matching decisions based on collaborative verbal communication, and to compare this level of performance to that achieved in the standard GFMT when participants could visually inspect and compare the pair of images. Accuracy on the Verbal GFMT and Perceptual GFMT are shown in Figure 2. Participants' overall accuracy on the Perceptual GFMT and the tendency for higher accuracy in match trials of this test are consistent with published normative data (Burton et al., 2010).
Mean accuracy on the Verbal GFMT (M = 72.5%, SD = 10.9) was substantially above chance, but significantly lower than mean accuracy on the Perceptual GFMT (M = 81.3%, SD = 9.).

FIGURE 2 Dyads' performance on the Verbal GFMT and Perceptual GFMT for all trials, match trials only, and non-match trials only. The solid line on each violin plot represents the median accuracy. The area between the dotted lines on each violin plot represents the interquartile range.

To examine if communicating about face identity with another person impacted accuracy on subsequent perceptual judgements, we compared performance on the EFCT subtests taken before and after the Verbal GFMT. Because prior work has shown that interventions can differentially affect high- and low-performing participants (Dowsett & Burton, 2015; Towler et al., 2021; White et al., 2015), we examined here whether the difference in performance on the EFCTs completed before and after the Verbal GFMT was affected by participants' face identification ability as measured by an independent test (i.e., the Perceptual GFMT). Participants were separated into low and high performers on the Perceptual GFMT based on a median split. A two-way ANOVA with factors Perceptual GFMT Score (low, high) and EFCT Test Phase (pre, post) showed a significant main effect of EFCT Test Phase, F(1, 81) = 4.08, p = .047, η² = .008. This reflected a fall in accuracy for the EFCT completed after the Verbal GFMT, with performance on the EFCT at pre-test (M AUC = .83, SD = .06) significantly higher than performance at post-test (M AUC = .81, SD = .05). However, the interaction was not significant, F(1, 81) = .94, p = .335, η² = .002, suggesting both high and low performers experienced an accuracy decrement on the EFCT completed after the Verbal GFMT.
Thus, we found no evidence that verbally communicating about faces leads to an improvement in perceptual face-matching performance, and some evidence that it impaired accuracy.

Individual differences in perceptual and verbal face identification
We next asked whether individual participants' perceptual matching ability was predictive of the Verbal GFMT score achieved by the dyad that the participant contributed to. All correlational analyses are reported using Spearman's rho (rs). As shown in Figure 3, we found large variation in performance for both perceptual and verbal versions of the GFMT. Despite these large differences, we found no significant relationship between performance on the Verbal GFMT and mean performance of the pair on the Perceptual GFMT, rs(49) = .115, p = .420, or on the EFCT, rs(49) = .108, p = .450. We found a similar pattern of results when examining the relationship between dyads' Verbal GFMT scores and Perceptual GFMT scores for the best and worst-performing dyad members separately (see Supporting Information).

Content of facial descriptions for identification purposes
We also examined the content of participants' facial descriptions. Independent raters listened to Verbal GFMT audio recordings to code which facial features were mentioned in conversation. For each Verbal GFMT trial completed by each dyad, raters made a binary judgement as to whether facial features were mentioned or not (features coded for were: eyebrows, eyes, mouth, hair/hairline, nose, facial marks, face shape, forehead, ears, chin, facial hair, cheeks, and jawline). Initial inter-rater agreement (calculated using the percentage of absolute agreement; Altman, 1990; Chaturvedi & Shweta, 2015) for all recordings was above 85%, and all disagreements were resolved by an independent third rater. The results are presented in Figure 4. For each trial, participants discussed an average of 6.18 different facial features (SD = 1.06). Of these mentions, 50% were internal features (eyes, eyebrows, nose, mouth, cheeks), 42% were external features (forehead, chin, jawline, face shape, facial hair, hair, ears), and 8% were features that could not clearly be classified as internal or external (e.g., facial marks such as scars and blemishes). Participants discussed internal features (M = 61.9%, SD = 16.8) on significantly more trials than they did external features (M = 36.7%, SD = 16.5), t(49) = 5.52, p < .001, Cohen's d = 1.51 [95% CI: 18.591, 31.809]. The eyebrows were the most discussed internal feature (M %trials mentioned = 82.4, SD = 1.50), and hair was the most discussed external feature (M %trials mentioned = 71.6, SD = 2.15).
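Percentage of absolute agreement is simply the share of coding decisions on which two raters match; a minimal sketch with hypothetical feature-mention codes (not the authors' coding data):

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of absolute agreement between two binary coders."""
    agree = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * agree / len(rater_a)

# Hypothetical codes for one feature across 10 trials (1 = mentioned)
rater_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(percent_agreement(rater_a, rater_b))  # 90.0
```

Unlike chance-corrected indices such as Cohen's kappa, this measure does not adjust for agreement expected by chance, which is why disagreements were additionally resolved by a third rater.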
In addition to discussion of discrete facial features, most participants (N = 36, 72%) used holistic descriptors (e.g., comments about attractiveness, celebrity likeness, personality, vocation). Participants who included holistic descriptors in their discussions mentioned them on 17.36% of trials, on average. There was no statistically significant difference in accuracy on the Verbal GFMT overall, for match trials only, or for mismatch trials only, between participants who used holistic descriptors and those who did not.

We also wanted to understand which quantitative or qualitative aspects of the process of communicating about faces were driving the large variation in Verbal GFMT performance (see Figure 3). We examined factors which might be associated with face communication effectiveness as measured by the Verbal GFMT score, including time spent on the Verbal GFMT, number of features discussed, use of holistic descriptors, and number of trials where individual facial features were discussed. Although the number of trials involving descriptions of facial marks was moderately inversely related to overall accuracy (rs[48] = −.367, p = .009) and the number of trials involving descriptions of ears was moderately inversely correlated with match trial accuracy (rs[48] = −.308, p = .030), these correlations were not significant after correction for multiple comparisons.

Discussion
Results of Study 1 suggest that people can communicate about the appearance of a face to another person reasonably effectively, enabling 72% accuracy on our newly developed Verbal GFMT. This represents a 10% reduction in accuracy when compared to individual participants viewing both images on a computer screen simultaneously. Interestingly, this accuracy reduction was driven by poorer Verbal GFMT performance in non-match pairs only, pointing to a strong bias to make 'match' responses when people describe faces to one another. The causes of this are unclear, but may reflect confirmation bias (Nickerson, 1998): that is, when one participant described the appearance of a feature on their image, their partner was more likely to agree that their image also contained a feature of similar appearance. We additionally found evidence that verbally communicating about faces led to a modest, yet statistically significant, reduction in subsequent perceptual face-matching performance. Although it is possible that this is simply task fatigue, the reduction may suggest that verbalization of face information leads to an impairment in subsequent simultaneous discrimination of faces. Of note, some studies of eyewitness memory have found that describing faces from memory leads to subsequent poorer recognition of those same faces, and this effect is larger for individuals who have higher perceptual expertise for faces ('verbal overshadowing effect'; Ryan & Schooler, 1998). The question of whether the task of describing faces causes an impairment in subsequent simultaneous discrimination of different faces, and whether this is related to an individual's perceptual expertise for faces, requires further investigation.
We also observed large variation in face communication performance, which was not explained by perceptual face identification ability, the discrete facial features discussed by participants, or overall comments about facial appearance. However, it was not clear whether the variation in Verbal GFMT performance was due to individual differences per se, or test unreliability (White & Burton, 2022). We conducted post-hoc analyses to examine the internal reliability of measures used to assess perceptual face identification ability and verbal face skill and found both tests were below accepted psychometric thresholds (Perceptual GFMT: α = .621, Verbal GFMT: α = .316). Consequently, based on the data from Study 1 alone, we cannot rule out an association between verbal and perceptual face identification accuracy, and so we follow up this question in Studies 2 and 3.
While Study 1 examined a relatively naturalistic and interactive style of communication about faces, this design did not allow us to examine the independent contributions of the people generating descriptions (henceforth 'describers') and the people that interpreted descriptions (henceforth 'identifiers'). Therefore, in Studies 2 and 3 we isolated these contributions. Additionally, given the reliability concerns identified in Study 1, in Studies 2 and 3 we also make use of an extreme-groups design, comparing super-recognizers (individuals with exceptional face identification skill) to 'control' individuals with normative levels of face identification skill. An extreme-groups design provided more statistical power to detect an effect of face identification skill on face communication ability, if one exists (Feldt, 1961). This is consistent with the approach used in many other studies of individual differences in face perception (see White & Burton, 2022 for a review) and with evidence that super-recognizers' superior ability exists on a continuum of ability with typical viewers (Dunn et al., 2022; Noyes et al., 2017).

STUDY 2
In Study 2, we examined whether the functional quality of the face descriptions generated by 'describers' was predicted by their face identification ability. Although we did not find a relationship between face communication accuracy and face identification ability in Study 1, this may have been due to the relatively poor psychometric properties of measures of verbal and perceptual ability, and by the fact that contributions of the individuals sending and receiving verbal information were too heavily intertwined.
To provide a more powerful test of whether verbal and perceptual face abilities are associated, we recruited a group of 'super-recognizers' that had been verified as having extremely high levels of perceptual face identification ability based on rigorous prior testing. In Phase 1, we asked super-recognizers and control participants to describe faces so that people would later be able to recognize them. In Phase 2, we then provided a new set of participants with facial descriptions from Phase 1 and compared their ability to identify faces from descriptions generated by super-recognizers and control participants. The study was pre-registered (https://aspredicted.org/VRZ_END).

Participants
In Phase 1, 'describer' participants were either 'super-recognizers' (N = 16; M age = 39.8 years, 75% female, 75% Caucasian) or 'controls' (N = 20; M age = 54.4 years, 60% female, 100% Caucasian). Super-recognizers were individuals who in prior testing performed greater than 1.7 standard deviations above the mean on each of three standardized face identification tasks (Cambridge Face Memory Test+ [Russell et al., 2009], Glasgow Face-Matching Test [Burton et al., 2010], UNSW Face Test [Dunn et al., 2020]). Control participants were individuals who in prior testing performed within one standard deviation of the mean on each of the same standardized face identification tasks (see Table 1). To incentivize participants to generate high-quality descriptions, we awarded $50, $30, and $20 Amazon vouchers to the participants whose descriptions led to the first, second and third highest identification accuracy in Phase 2 (open to both super-recognizers and controls). A power analysis indicated our sample size of 36 participants was sufficient to give 80% power (see Supporting Information for details).
In Phase 2, 'identifier' participants were 298 MTurkers from the US and 148 first-year UNSW Psychology students. MTurkers completed the study in exchange for monetary compensation (US$1.50) while UNSW students completed the study in exchange for course credit. After exclusions for suspected bot performance, study ineligibility, or unusually quick study completion (see Supporting Information for details of exclusions; N MTurkers = 24, N UNSW Students = 12), the final dataset comprised 410 identifier participants (MTurkers: M age = 35.0 years, 39.9% female, 63% Caucasian; UNSW students: M age = 19.6 years, 73.1% female, 52.2% Asian). All included participants in Phase 1 and Phase 2 were native English speakers who had normal or corrected-to-normal vision.

Face description task (Phase 1)
TABLE 1 UNSW Face Test, mean (z score); GFMT, mean (z score); CFMT+, mean (z score) (Burton et al., 2010; Dunn et al., 2020; Russell et al., 2009)

Images in the face description task were taken from the Glasgow Unfamiliar Face Database (GUFD). After excluding faces included in the 40-item GFMT (Burton et al., 2010), we selected 10 male, Caucasian faces, aged between 18 and 28 years old. These 'target' faces were selected so that the set varied in eye colour, hair colour, and build. During completion of the task, describer participants were shown each of the 10 target faces and asked to 'describe the face in sufficient detail such that someone else could identify this person solely based on your description' (see Figure 5a). Our primary dependent variable was description accuracy, indexed by the number of identifiers in Phase 2 who correctly identified the target face from the description, divided by the total number of identifiers in Phase 2 who were given the description.
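The description-accuracy index can be computed per description as a simple proportion; a sketch with hypothetical identifier responses (1 = correct line-up choice, not data from the study):

```python
def description_accuracy(identifications):
    """Proportion of Phase 2 identifiers who chose the correct target
    from the line-up when given this description."""
    return sum(identifications) / len(identifications)

# Hypothetical: 10 identifiers received this description
picks = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(description_accuracy(picks))  # 0.8
```

Because each description was shown to at least 10 identifiers, the index averages over idiosyncrasies of any single identifier and can be compared across describers.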

Identification-from-description task (Phase 2)
This task was created using target images from Phase 1, and additional images selected from the GUFD to be used as line-up distractors. For each 'target' identity, the experimenter selected three new faces from the GUFD that were not included in the 40-item GFMT (Burton et al., 2010) to serve as distractors, matching basic demographics (e.g., gender, age, build) and choosing faces that appeared similar to the target on at least two facial features (e.g., hair colour, eye colour, build, presence of facial hair). The three selected 'distractor' images were then placed in an array along with a new image of the target face that had not been used to generate the Phase 1 descriptions, to create target-present line-ups for each target face. The fairness of these line-ups was pilot tested by showing each to a small group of volunteers who were asked to pick the most distinctive face from the line-up. Line-ups were considered fair if, on average, the target identity was not rated as the most distinctive face in the line-up. During completion of the task, identifier participants were given 10 descriptions of faces provided by Phase 1 participants. The allocation of descriptions to participants was randomized, subject to the constraint that each description from Phase 1 was allocated to at least 10 different identifiers (after data exclusions). For each of the 10 descriptions, identifiers were presented with an array of four faces. On each trial they were instructed to read the description and identify the 'target' individual (see Figure 5b). They were additionally prompted with the instruction 'Note that the descriptions are of each person's face, not necessarily exactly how they appear in these photos'. Although our primary dependent variable was description accuracy, we were additionally interested in the overall accuracy of identifiers, indexed by the percentage of correct identifications in the Identification-from-Description Task.

FIGURE 5 (a) Sample trial from the Face Description Task. On each trial, 'describer' participants were presented with a face image and asked to 'Please describe this face. You should describe the face in sufficient detail such that someone else could identify this person solely based on your description'. (b) Sample trial from the Identification-From-Description Task. On each trial, 'identifier' participants were presented with one face description written by a describer in Phase 1 and the line-up which included the target photo in a random position. Participants were asked 'Who does the description belong to?'. An example description for this trial is 'Male, young, pale skin, dyed/bleached dark/light shaggy hair styled to stick up. Blue eyes, medium brown eyebrows. Fairly wide gap between straight eyebrows. Oval face, rounded chin. Plump bottom lip, mouth wide. Tuft of hair under lip, embryonic dark moustache, beard. High forehead'. The individual pictured in the top left is the correct answer.

Glasgow face-matching task (GFMT; Phase 2)
This experiment was primarily designed to examine the influence of perceptual face identification ability on verbal description accuracy (i.e., describer ability). However, there was also an opportunity to examine individual differences in identifier ability, and so we included the standard 40-item GFMT (Burton et al., 2010) as a measure of face identification ability in the Australian subset of participants.

Procedure
Both Phase 1 and Phase 2 were completed online. In Phase 2, UNSW student identifier participants also completed the GFMT after the Identification-from-Description Task. All participants provided information on their highest level of educational attainment and current job occupation as an index for their verbal literacy abilities (Gustafsson, 2016). The average time taken to complete the study was 84.5 min for describers and 15 min for identifiers.
In post-hoc analysis, we found that super-recognizers used approximately 200 more total words on average when describing the 10 target identities (M = 658.8, SD = 404.7) than controls (M = 441.0, SD = 194.6), t(20.51) = 1.98, p = .062, Glass' delta = .54 [95% CI: −11.574, 447.174] (NB: we have provided effect size in Glass' delta here given the non-equal variance between groups, but for comparison Cohen's d = .69). Given the discrepancy in average total words used per group, we compared super-recognizer and control descriptions separately for short (<50 words) and long (>50 words) descriptions, based on a median split of description length. For short descriptions, identifiers were on average 7.1% more accurate when given a super-recognizer's description (M = 65.0, SD = 21.0) compared to when they were given a control participant's description (M = 58.2, SD = 22.4), and this difference was statistically significant, t(199) = 2.11, p = .036, Cohen's d = .31 [95% CI: 2.542, 11.058]. In contrast, when considering only longer descriptions (>50 words), there was no difference in identifier accuracy when they were given super-recognizer descriptions (M = 65.0, SD = 16.2) and control descriptions (M = 62.2, SD = 18.9), t(155) = 0.989, p = .324, Cohen's d = .16 [95% CI: −1.109, 6.709]. We also examined whether there were qualitative differences between super-recognizer and control descriptions in terms of facial features discussed; however, no meaningful differences were observed (see Supporting Information).
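For readers checking the word-count effect sizes above, the following is a hypothetical sketch computed from the reported summary values. Two assumptions are ours, not the authors': the reported Glass' delta of .54 is reproduced when standardizing by the super-recognizer group's SD (conventionally Glass' delta uses the control group's SD), and the reported Cohen's d of .69 matches the unpooled average-variance form (the paper does not state its exact formula).

```python
import math

def glass_delta(m1, m2, sd_ref):
    # Standardized mean difference using one group's SD as the
    # reference, used here because group variances were unequal.
    return (m1 - m2) / sd_ref

def cohens_d_avg(m1, sd1, m2, sd2):
    # Standardize by the root of the average of the two variances
    # (one common unpooled variant; an assumption on our part).
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

# Total words: super-recognizers M=658.8, SD=404.7; controls M=441.0, SD=194.6
print(round(glass_delta(658.8, 441.0, 404.7), 2))          # 0.54
print(round(cohens_d_avg(658.8, 404.7, 441.0, 194.6), 2))  # 0.69
```

Both calls reproduce the values reported in the text, but only under the stated assumptions about which SD enters each denominator.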
As evident from visual inspection of Figure 6, there was large variation in the accuracy with which individual participants identified faces from descriptions. While some of this is likely to stem from the unique set of 10 descriptions provided to each participant, we also tested whether performance was associated with the perceptual face identification ability of the identifier participants. There was a weak positive correlation between identifier performance on the Identification-From-Description Task and their GFMT scores, r_s(132) = .233, p = .007.

Discussion
Study 2 is consistent with Study 1 in showing that people can identify individuals from facial descriptions with reasonable accuracy. Identifiers achieved 62% accuracy on average despite having to use their allocated descriptions to select targets from an array of four similar-looking faces, which included a different image of the target from that used to generate the description. This was a more challenging task than the pairwise matching in Study 1.
We additionally found that participants could, on average, more accurately recognize faces from descriptions written by super-recognizers than those written by control participants. In contrast to Study 1, this suggests that the ability to describe faces is at least partially related to an individual's perceptual face identification ability. This super-recognizer description advantage was most evident for concise descriptions, suggesting that it was driven by differences in the quality of descriptions, rather than their length. One possibility is that super-recognizers are better attuned to what perceptual information is most likely to support identification of an individual and can tailor their facial descriptions accordingly, even when giving concise descriptions. Indeed, previous work has shown that super-recognizers demonstrate enhanced ability to extract facial information relative to normative controls, such as making more accurate identifications with short image exposures (White et al., 2015) or when faces are disguised (Davis & Tamonytė, 2017). Our work provides preliminary evidence that super-recognizers can not only extract facial information more effectively but are also more effective at verbally transmitting the most relevant identifying information to someone else.¹ We also observed large variation in our identifier participants' ability to use facial descriptions for identification, and this variation was associated with participants' face identity processing ability as measured by the GFMT. However, this analysis was not the main motivation of the study, and so we designed a third study to specifically examine the relationship between perceptual face identification ability and people's ability to identify faces from written descriptions.

FIGURE 6 Identifier accuracy is shown on the Y-axis as a function of description type, that is, whether descriptions were generated by super-recognizer (SR) or control participants. The three panels show identifier accuracy for all descriptions, short descriptions (<50 words), and long descriptions (>50 words). The solid line on each violin plot represents the median accuracy; the area between the dotted lines represents the interquartile range.

STUDY 3
In Study 2, we found preliminary evidence for an association between face identification ability and the ability of 'identifiers' to pick faces from a line-up based on a description. In Study 3, we sought to further investigate this relationship and also examine the relationship between description quality and identifier ability. To test these questions, we recruited a new cohort of super-recognizers and normative controls who were tasked with identifying faces from facial descriptions, where such descriptions were either 'good' or 'bad' (as determined by how often the descriptions led to correct identification of the target in Study 2).

Participants
Thirty-six controls and 36 super-recognizers completed the study. None had participated in the previous studies. Four control participants and six super-recognizer participants were excluded because they were not native English speakers. Thus, the final dataset comprised 32 controls (M age = 53.3 years, 84.4% female, 90.6% Caucasian) and 30 super-recognizers (M age = 39.3 years, 70% female, 73.3% Caucasian).

Design & procedure
The study was a 2 × 2 between-subjects design, with factors of Group (super-recognizer, control) and Description Quality (good, bad). Description quality was manipulated by selecting the three best and three worst descriptions for each target face from Study 2 (i.e., the descriptions which gave rise to the highest or lowest identification rates respectively; see Supporting Information for descriptions). Participants were randomly allocated to a description quality condition (Good: N SR = 13, N Control = 17; Bad: N SR = 17, N Control = 15).
Participants completed the Identification-From-Description Task (see the Materials section of Study 2 for detailed information about this task). On each trial, participants were randomly allocated one face description that was congruent with their description quality condition (i.e., good or bad). They were additionally prompted with the instruction 'Note that the descriptions are of each person's face, not necessarily exactly how they appear in these photos'. The order of trials was randomized for each participant.
The results show that the ability to use facial descriptions for identification is driven in large part by the quality of the description provided. However, they also indicate that super-recognizers are better able to make identifications based on facial descriptions than individuals with normative levels of face identification ability. Further to Study 2, this suggests super-recognizers may be better able to extract diagnostic perceptual information from facial descriptions to make identifications, in addition to being better at selecting and describing the identifying features of a face.²

GENERAL DISCUSSION
We conducted three experiments to better understand how people communicate about faces and the utility of facial descriptions for identification purposes. Across these studies, we consistently showed that individuals can communicate about faces with reasonable accuracy, regardless of whether such communication occurred bidirectionally in a live matching context (Study 1) or unidirectionally when isolating the ability to produce facial descriptions from the ability to use them for identification (Study 2 and Study 3). We also consistently showed evidence of large variation in face communication efficacy, which was associated with the face identification abilities of the 'describer' (Study 2) and the 'identifier' (Study 3). This association was subtle, likely reflecting that face communication also relies on other abilities, for example verbal skill.
Prior work in this area shows above-chance identification performance based on verbal descriptions (Fallshore & Schooler, 1995; Kramer & Gous, 2020). Study 1 improved understanding of face communication by showing that difficulties in describing faces for identification purposes are not ameliorated by allowing naturalistic communication. This study also showed that the decrease in performance in face-matching when using verbal descriptions, compared to perceptual matching, was entirely driven by an increase in false positive errors. Although the reason for this bias is not clear, it is nonetheless practically important, suggesting for example that when searching for persons of interest in criminal investigations, descriptions could elicit many spurious leads.

² Similar to Study 1 and Study 2, we completed post-hoc internal reliability analyses. Internal reliability for identifiers was α = .815. This appears to be driven by a marginally higher internal reliability statistic for controls (α = .832) than for super-recognisers (α = .794). We return to this finding in the General Discussion.

FIGURE 7 Mean identification accuracy (% correct) on the Identification-from-Description Task as a function of identifier group (super-recognizer [SR] or control). The two panels show identification accuracy for bad descriptions and good descriptions. The solid line on each violin plot represents the median accuracy; the area between the dotted lines represents the interquartile range.
To our knowledge, this is the first set of studies to explore how individual differences in face perception relate to face communication. Prior work has focused on the accuracy of face communication (Fallshore & Schooler, 1995; Kramer & Gous, 2020). Here, we found that face identification skill contributed to variance in face communication performance. When examining the face processing abilities of super-recognizers, we found a sizeable performance advantage in both describing faces and using face descriptions for the purpose of identification, as compared to individuals with normative levels of face identification skill. This result therefore broadens the scope of super-recognizers' abilities. While some studies have found evidence of supra-modal person identification abilities that extend to voice recognition (Jenkins et al., 2021), general object processing (Bobak et al., 2016) and fingerprint matching (Towler et al., 2021), our result is the first to show that super-recognizers demonstrate enhanced ability beyond purely perceptual tasks. Future research may wish to explore how the full spectrum of individual differences in face perception relates to face communication.
We also aimed to better understand what aspects of facial descriptions were most effective for identification. In Study 1, we found no association between the facial features discussed by participants, nor overall comments about facial appearance, and face communication accuracy. Similarly, in Study 2, we found no obvious differences in the content of super-recognizer and control descriptions despite differences in the mean accuracy of their descriptions. This extends work by Ritchie et al. (2022) who found no relationship between facial features discussed and the accuracy of dyads on a face-matching task when both images were in view simultaneously. In the context of face communication specifically, Kramer and Gous (2020) found that discussion of the eyes, nose, mouth, hair, ears and face shape were not associated with identification accuracy. That accuracy was similarly not associated with the discussion of other facial features in our study (e.g., facial hair, jawline, cheeks, facial marks) shows that the inclusion/exclusion of particular facial features in face descriptions is not what makes them effective for identification purposes. This raises the question of what makes for an effective face description.
One possible explanation is that the content of facial descriptions is less important than the communication context in which they are exchanged, including the interpersonal dynamics unique to each dyad in Study 1 (e.g., rapport, familiarity, conversational approach, propensity for information seeking vs. information giving). Indeed, in many other areas of communication, interpersonal factors are predictive of performance, including for negotiation outcomes, goal achievement, and social outcomes including likeability and attractiveness (Curhan & Pentland, 2007; Leung & Bond, 2001; Martin & Dowson, 2009). In the context of face communication, one potential interpersonal contributor to accuracy is the degree of familiarity between describers and identifiers. In our experiments, participants were unfamiliar (i.e., had no personal relationship) with their study partner (Study 1) or with the person they were generating face descriptions for or receiving descriptions from (Study 2 and Study 3). However, familiarity with the other person involved in face communication may confer benefits to task performance, including a nuanced understanding of each other's vocabulary, particularly for use of ambiguous terms. In support of this idea, evidence suggests that linguistic similarities predict friendship forming, and that friends, 'new' couples, and married couples exhibit linguistic convergence over time (Anolli & Balconi, 2005; Brinberg & Ram, 2021; Kovacs & Kleinbaum, 2020). Consequently, future research could examine whether the nature of the relationship between interlocutors in face communication affects the usefulness of descriptions for identification.
Another possibility is that qualitative aspects of face descriptions are more important to face communication accuracy than quantitative properties (e.g., description length, number of features discussed). For example, in creative writing, vivid descriptions tend to evoke richer mental imagery of described stimuli than list-based descriptors or those that are more precise (Jajdelska et al., 2010). It has also been argued that fictional character descriptions which facilitate the reader's connection with the sensory experience of the protagonist (e.g., based on metaphor) are more likely to elicit an emotional reaction and compel an individual to continue reading, as compared with narrative summary (Ingermanson & Economy, 2009). However, whether vivid or emotionally salient descriptions are more accurate for identification purposes is unknown. Consequently, future research could further explore the qualitative aspects distinguishing good and poor face descriptions, which in turn may inform avenues for training in face communication.
The goal of the present work was to explore the accuracy of face communication and examine its relationship to individual differences in face identification. Given the exploratory nature of our work, we did not seek to develop a psychometric test optimized for the measurement of individual differences in face communication. Nonetheless, the reliability of measures is an important consideration in the interpretation of results. In post-hoc analysis, we found relatively good internal reliability of identifiers (α = .621 and α = .815 for identifiers in Study 2 and Study 3 respectively) while for real-time face communicators and describers internal reliability was below accepted psychometric thresholds (Study 1: α = .316 for the Verbal GFMT; Study 2: α = .134 for describers). Consequently, it is unclear if variation in performance on face description measures reflects individual differences or test unreliability.
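The internal reliability statistic reported throughout (α) is Cronbach's alpha. As a generic sketch of how such a coefficient is computed from trial-level accuracy data (the toy matrix below is invented for illustration, not the study's responses):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a participants x items score matrix."""
    k = len(scores[0])  # number of items (here, trials)

    def var(xs):        # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Toy 0/1 accuracy matrix: 4 participants x 3 trials
print(cronbach_alpha([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))  # ~0.75
```

Low alpha, as observed for the describer measures, indicates that trial-level scores do not covary consistently across participants, which is why variation on those measures cannot be confidently attributed to stable individual differences.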
The fact that we see higher internal reliability for identifiers than describers suggests that the way people use descriptions for identification may be more consistent than the way people generate descriptions. It might also suggest that researchers' ability to measure an individual's face description ability, using the novel approach we describe here, is confounded by idiosyncratic properties of the person using that description for identification (and/or the faces being described), rather than intrinsic properties of the description or describer. Future research aiming to develop psychometric tests for face communication could therefore aim to redress this difficulty by using more diverse stimuli, and also perhaps by constraining the descriptions more than the free description method used here (e.g., with a rating scale procedure).
Another important consideration in the interpretation of our results is that in the present set of studies all images used were highly controlled in that they were studio quality, front-facing photos taken minutes apart. Although investigating performance under controlled image conditions is a useful first step in understanding face communication, such conditions may not reflect the real-world nature of face communication. For example, when identifying someone from a description in the real world, the described individual may differ substantially from the image which the description is based on due to factors such as ageing, disguise, or routine appearance modification (e.g., getting a haircut, applying make-up). In purely perceptual tasks, identification accuracy falls when task difficulty is increased by increasing image variation (Noyes & Jenkins, 2019;White et al., 2015). Consequently, future studies could examine the decrement in face communication caused by more naturalistic variation in images.
In sum, our findings show that individuals can communicate about faces and that there is large variation in this ability. Additionally, perceptual face identification skill partly explains why some facial descriptions are better than others, as well as why certain individuals are more accurate at using facial descriptions for identification. While many prior studies in face identification have investigated the effect of perceptual task demands on performance (Hancock et al., 2000), we have shown here that verbal processes are another important contributor to accuracy, which has implications for a number of applied tasks. We recommend that legal and forensic stakeholders temper decision-making based on evidence that requires verbal communication of perceptual face information, as deriving face identity from face descriptions generates an especially high rate of false positives relative to standard identity processing. Moving forward, to obtain a full understanding of the complexities in face identification, it will be important for research to consider not only the impact of perceptual demands but also verbal processes on performance.