Approaches to the study of human mate preferences commonly involve judgements of facial photographs and assume that these judgements provide a reasonable reflection of how individuals would be perceived in real encounters. However, three recent studies have each reported non-significant correlations between judgements using photos (static images) and those using videos (dynamic images). These results have led previous authors to conclude that static and dynamic faces are judged according to different evaluative standards and that this may call into question the validity of findings from experiments using static images. However, the extent of the discrepancy in judgements between image formats remains unknown, and may be influenced by different experimental designs. Here, we tested the effects of several experimental design factors on the strength of correlations between image presentation formats. Using both male and female targets, we compared observed static–dynamic judgement correlations when (1) judgements were made by the same or different raters, or (2) by raters of the same- or opposite-sex to the targets, and (3) when dynamic stimuli were collected under different contextual scenarios. For (1) and (2), we also measured correlations when order of presentation of static and dynamic stimuli was alternated. Our results suggest that each design feature has independent effects on the strength of static–dynamic correlations. Correlations were stronger when static and dynamic stimuli were rated by the same raters. They were weakest for judgements of males by females, when based on seeing photos before videos. This interaction with sex is consistent with previous studies, indicating that females are especially responsive to male dynamic cues. 
However, in contrast to previous findings and in all cases, static–dynamic correlations were strongly and significantly positive, indicating that judgements based on static images provide an accurate representation of someone’s attractiveness during prolonged encounters.
More recently, there has been growing recognition that neutral photographs (static images) portray only part of the information that is used when forming mate preference judgements. While structural aspects of faces are likely to explain a large proportion of the variance in real-life judgements, a significant proportion will also be accounted for by variation in expressiveness, gaze, perceived personality and other cues which are available during actual encounters and in dynamic images such as videos (Grammer 1990; Grammer et al. 2000; Gangestad et al. 2004; Morrison et al. 2007; Conway et al. 2008; Penton-Voak & Chang 2008). Indeed, the use of videos as a more ethological, and ecologically valid, approach has been recognised as an important and emerging dimension to the study of human mate preferences (Gangestad & Scheyd 2005; Penton-Voak & Chang 2008; Roberts 2008).
Against this background, interpreting the relative importance of structural and movement cues is a critical issue. In other words, it is important to estimate the extent to which judgements made under static and dynamic presentations either do or do not reflect each other. In particular, if judgements of static images do not predict judgements using dynamic images, this would call into question the validity of a significant amount of existing research using photographs alone and radically challenge the foundations on which the research field is based. However, there have been surprisingly few direct tests of the relationship between judgements based on static and dynamic images (Penton-Voak & Chang 2008) and the results of the studies conducted so far are mixed (see Table 1). Rubenstein (2005) found that judgements of female images under the two conditions were not strongly, and not significantly, correlated. He concluded that static and dynamic faces are judged according to different evaluative standards, in which the latter are particularly influenced by emotional expression. In contrast, Lander (2008) found significant static–dynamic correlations for judgements of female images, but not for images of males. Extending Rubenstein’s work, Penton-Voak & Chang (2008) incorporated two emotional expression conditions (positive and negative) into their design. They reported significant correlations between static and dynamic presentations of female images, similar to Lander (2008), and these occurred irrespective of emotional expression. However, they found positive but non-significant correlations between presentations of male images, in either emotional condition. Finally, Roberts et al. (2009) found a significant and positive correlation using female judgements of male images.
Table 1. Summary of the methodology and results of studies investigating correlations between attractiveness ratings when images are presented in static and dynamic formats
Stimulus sex: M, male; F, female. Rater sex: MF, all raters; the term M,F,MF indicates separate analyses using either M raters, F raters or MF raters. Context denotes the task given to targets when recording dynamic images (Cue, cue card; Holiday, description of holiday plans; Introd., introductory video to member of opposite-sex). Design denotes whether the same or different raters judged static and dynamic images (BS, between subjects; WS, within subjects). Correlation denotes the study findings (ns, non-significant correlation; +, significant positive correlation).
In view of these mixed results, we aimed to provide a further test of this interesting and important issue. We reasoned that the extent to which perceptions either do or do not differ under static/dynamic presentation may depend on methodological issues that require further exploration (summarised in Table 1). For example, both Rubenstein and Lander used a between-subjects design: rather than comparing static/dynamic judgements of female faces by the same raters, different sets of raters judged either condition (to some extent the same was true in Penton-Voak & Chang’s study, where raters saw all faces but only one quarter of them in each of the four conditions). While using different raters is a reasonable step to ensure independence of ratings under two different conditions, well-documented individual differences in preferences (e.g. Little & Perrett 2002) could have, at least partially, obscured the relationship between the two measures. Furthermore, in Rubenstein’s study, ratings were averaged across raters of both sexes; hence, ratings could have been confounded by the same-sex judgements made by females compared with the opposite-sex judgements made by males. Third, the context under which dynamic stimuli are captured might be influential for rater perceptions. Both Rubenstein and Lander video-recorded targets while reading from a cue card, while those in Penton-Voak & Chang’s study were filmed both under a neutral context similar to using a cue card (reciting a series of numbers or pictures) and also while describing plans for a holiday where the stimuli were more naturalistic. In this vein, Roberts et al. (2009) asked men to film themselves in a scenario in which they were asked to introduce themselves as though they were meeting an attractive woman. Finally, Penton-Voak & Chang (2008) and Roberts et al. (2009) presented videos without sound, controlling for variability in semantic content but also withholding from raters any vocal cues of mate quality which are known to influence attractiveness judgements (Feinberg et al. 2005a,b; Saxton et al. 2006). The same was true of Lander’s (2008) study (pers. comm. to TKS) but it is unclear from Rubenstein (2005) whether or not dynamic stimuli were played with sound; if they were, vocal cues would have been available and might have influenced the judgements of dynamic cues in ways not possible under static presentation.
Here, we examined these issues by collecting image sets and ratings to compare static–dynamic correlations when judgements are made by the same or different sets of raters, by raters of the same- or opposite-sex to the targets, and when (silent) dynamic stimuli are collected under different scenarios.
The study was approved by the University of Liverpool’s Committee on Research Ethics. Twenty male (mean age = 26.1) and 20 female (mean age = 22.2) undergraduates were photographed (Canon Powershot) standing, with a neutral expression, looking straight at the camera, in front of a plain background and in a windowless room lit with standard fluorescent lighting. Photographs were cropped just above the top of the head and to just below the waist, and normalised for horizontal height (Psychomorph; Tiddeman et al. 2001) to standardise presented image size. Images were resampled to 400 × 480 pixels (resolution 72 dpi). Seated participants were then video-recorded while they introduced themselves as they might to someone in a bar (‘mate choice context’). Video clips were subsequently processed (Adobe After Effects 7.0) and edited to a duration of 20 s, cropped to dimensions of 400 × 480 square pixels and encoded as 25fps QuickTime movies using the MPEG-4 codec. For comparison of the context under which videos were recorded, an independent set of 20 males also participated; these were treated in exactly the same way as above, but during filming they described their most recent holiday (non-mate choice context). In both video contexts, videos were presented without sound to withhold semantic content and vocal cues from the raters, thus ensuring that any potential differences in judgements were solely because of visual cues.
For the main study, in which videos were recorded for the mate choice context, photographs and videos were rated by 96 raters in total. These comprised two independent sets of 24 males and 24 females; one set saw photos first (mean age: males = 24.4, females = 22.0), the other saw videos first (mean age: males = 22.5, females = 22.1). Ratings were carried out using a 7-point scale anchored by the descriptors ‘very unattractive’ and ‘very attractive’. In the non-mate choice context, another independent set of 10 females provided ratings.
Mean ratings per target were analysed using Pearson’s correlations. First, between-subjects correlations of static and dynamic image ratings were calculated for male and female stimuli separately, using either all raters (male and female combined) or only male or only female raters. To obtain these between-subjects ratings in a way that controlled for exposure to the images, we used static image ratings from those raters who saw static images first and dynamic image ratings from raters who saw dynamic images first. Corresponding estimates of the variance (r²) in dynamic ratings explained by static ratings were computed through bivariate linear regression.
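The per-target analysis described above can be illustrated with a minimal sketch. The rating matrices below are invented purely for illustration (three hypothetical raters, four hypothetical targets), and the function names are ours rather than part of the original analysis. The sketch relies only on the standard result that, in a bivariate linear regression, the proportion of variance explained equals the square of Pearson’s r.

```python
import math


def mean_rating_per_target(ratings):
    """Average across raters; `ratings` holds one list of scores per rater."""
    n_raters = len(ratings)
    n_targets = len(ratings[0])
    return [sum(rater[t] for rater in ratings) / n_raters for t in range(n_targets)]


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 7-point ratings (rows = raters, columns = targets).
static_ratings = [[3, 5, 2, 6], [4, 5, 1, 7], [3, 6, 2, 6]]
dynamic_ratings = [[4, 5, 3, 6], [3, 6, 2, 7], [4, 5, 2, 5]]

static_means = mean_rating_per_target(static_ratings)
dynamic_means = mean_rating_per_target(dynamic_ratings)

r = pearson_r(static_means, dynamic_means)
r_squared = r ** 2  # variance in dynamic ratings explained by static ratings
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

In a between-subjects version of this sketch, the two matrices would simply come from different sets of raters.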
Second, we compared several experimental design effects on the strength of static–dynamic correlations using a simple resampling procedure to generate sets of correlation coefficients based on equally sized subsets of raters. We were able to compare the effect of within-subjects vs. between-subjects ratings because each rater saw images in each of the two presentation formats. Between-subjects ratings were obtained by comparing the ratings of static images by half the raters with the ratings of dynamic images by the other half. However, any possible subdivision of raters will likely generate a slightly different correlation coefficient, and, if it was done only once, it is theoretically possible that a strongly positive correlation might have arisen by chance. The resampling procedure avoided this problem. We randomly subdivided raters so that half were used to calculate mean ratings for static images and the other half were used to calculate mean ratings for dynamic images, then calculated the resulting static–dynamic correlation coefficient, and repeated this procedure to a total of 40 iterations. We also used the same approach for within-subjects ratings, so that the power of the correlations was equivalent across all relevant comparisons. We calculated 40 iterations for each of eight comparisons (image: male/female; rating design: within/between subjects; rater sex: opposite/same). The resulting 320 correlation coefficients were then analysed by factorial anova (data fulfilled assumptions of this test). This analysis was performed separately for raters who saw photos first and those who saw videos first.
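The resampling procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the ratings are simulated, the function and variable names are hypothetical, and for the within-subjects case we assume that a random half of the raters supplies both the static and the dynamic means, which is one way of keeping the number of raters per mean equal across designs.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def target_means(ratings, rater_ids):
    """Mean rating per target using only the raters listed in `rater_ids`."""
    n_targets = len(ratings[rater_ids[0]])
    return [sum(ratings[r][t] for r in rater_ids) / len(rater_ids)
            for t in range(n_targets)]


def resampled_correlations(static, dynamic, design, n_iter=40, rng=None):
    """Return `n_iter` static-dynamic correlations from random rater halves."""
    rng = rng or random.Random()
    half = len(static) // 2
    coefs = []
    for _ in range(n_iter):
        ids = list(range(len(static)))
        rng.shuffle(ids)
        first, second = ids[:half], ids[half:2 * half]
        static_means = target_means(static, first)
        # Between-subjects: the other half rates the dynamic images.
        # Within-subjects (assumed): the same half supplies both sets of means.
        dynamic_ids = second if design == "between" else first
        dynamic_means = target_means(dynamic, dynamic_ids)
        coefs.append(pearson_r(static_means, dynamic_means))
    return coefs


# Simulated data: 24 raters, 20 targets; each rater has an idiosyncratic
# preference for each target that carries over between formats.
rng = random.Random(1)
n_raters, n_targets = 24, 20
appeal = [rng.gauss(4, 1) for _ in range(n_targets)]
taste = [[rng.gauss(0, 1) for _ in range(n_targets)] for _ in range(n_raters)]
static = [[appeal[t] + taste[r][t] + rng.gauss(0, 1) for t in range(n_targets)]
          for r in range(n_raters)]
dynamic = [[appeal[t] + taste[r][t] + rng.gauss(0, 1) for t in range(n_targets)]
           for r in range(n_raters)]

between = resampled_correlations(static, dynamic, "between", rng=rng)
within = resampled_correlations(static, dynamic, "within", rng=rng)
print(sum(between) / len(between), sum(within) / len(within))
```

Each call yields 40 coefficients for one cell of the design; repeating it across stimulus sex, rating design and rater sex would give the eight sets of coefficients entered into the factorial analysis.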
Finally, we compared the effect of context (mate choice vs. non-mate choice task during collection of dynamic images) by comparing two corresponding sets of 40 iterated correlation coefficients for male static and dynamic images, based on independent sets of raters and stimuli (i.e. the within-subjects design with equivalent numbers of raters), using an independent-sample t-test.
Correlations of mean ratings of static and dynamic images, for 20 male and 20 female targets, show that attractiveness judgements are strongly and positively correlated (Fig. 1, Table 2), regardless of whether they were based on judgements by raters of the same- or opposite-sex, or by all raters. In each case, correlations were significant, with coefficients of at least 0.728 and r² values of at least 0.531.
Table 2. Relationships between male and female static and dynamic stimuli when judged by same-sex or opposite-sex raters, or both male and female raters together
Values are given as Pearson’s r (r²) and p.
0.838 (0.702), <0.001
0.785 (0.617), <0.001
0.834 (0.696), <0.001
0.728 (0.531), <0.001
0.866 (0.750), <0.001
0.809 (0.654), <0.001
Using anova with static–dynamic correlation coefficients as the dependent variable and stimulus sex, rater sex and rating design (within- or between-subjects) as factors, we found significant main effects of rating design, rater sex and stimulus sex, and some interactions (Table 3). Regardless of whether static or dynamic images were seen first, higher correlations were found using a within-subjects rating design. However, other effects indicated that the strength and direction of differences in correlations varied according to the order of image presentation. When static images were seen first, correlations between ratings of static and dynamic images were higher for judgements of male than female images, and for same-sex than opposite-sex ratings (see Fig. 2). The opposite was true when videos were seen first. Additionally, there were significant interactions between rating design and rater sex (correlations were lowest in opposite-sex ratings in the between-subjects design, indicating that concordance in ratings was lowest there), and between stimulus sex and rater sex: correlations between male static and dynamic images were lower when judged by females than males (at least in the photo-first order), and correlations between female static and dynamic images were higher when judged by males than females (at least in the video-first order).
Table 3. Results of anova on Pearson’s correlation coefficients
Rating design denotes whether static and dynamic images were judged by same (WS, within-subjects) or different (BS, between-subjects) rater sets. Other interaction effects were not statistically significant.
Rating design (WS vs. BS)
Stimulus sex × rater sex
Rating design × rater sex
Rating design (WS vs. BS)
Stimulus sex × rater sex
Rating design × rater sex
Finally, we investigated the effects of context, using an independent set of raters and stimuli. Within-subjects ratings for static and dynamic images in the non-mate choice context (description of recent holiday) were again strongly correlated (Pearson’s r = 0.836, p < 0.001). However, comparison of female ratings for males video-recorded in the mate choice and non-mate choice contexts revealed that the distributions of iterated correlation coefficients differed significantly between contexts, with higher coefficients in the mate choice context (mean ± SE; non-mate choice = 0.704 ± 0.011, mate choice = 0.741 ± 0.013; t₇₈ = 2.13, p = 0.036).
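As a rough arithmetic check on the context comparison, the t statistic can be approximated from the reported summary values alone. The sketch below assumes two independent groups of 40 iterated coefficients each; with equal group sizes, the pooled-variance standard error of the difference reduces to the square root of the sum of the two squared standard errors. The function name is ours, and the inputs are the rounded means and standard errors given in the text.

```python
import math


def t_from_summary(mean1, se1, mean2, se2):
    """Independent-samples t from group means and standard errors.

    With equal group sizes, the pooled standard error of the difference
    equals sqrt(se1**2 + se2**2).
    """
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)


n_per_group = 40                      # iterated coefficients per context
t = t_from_summary(0.741, 0.013, 0.704, 0.011)
df = 2 * n_per_group - 2              # degrees of freedom

print(round(t, 2), df)  # → 2.17 78
```

The value obtained from the rounded summary statistics (about 2.17) is close to the reported statistic of 2.13; a gap of this size is within what rounding of the means and standard errors to three decimal places could produce.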
Our results indicate that ratings of static and dynamic images can be strongly positively correlated. Importantly, the strength of this correlation is dependent on (1) whether the two sets of ratings are produced by the same or different raters, (2) whether raters are of the same- or opposite-sex to targets and (3) the context in which dynamic images are collected.
Previous studies have found mixed evidence for the strength of this relationship, although in all cases the relevant correlations are positive, even if only weakly so (Rubenstein 2005; Lander 2008; Penton-Voak & Chang 2008). To summarise the specific findings from these studies (see also Table 1), significant static–dynamic correlations have been found for female faces by Lander and Penton-Voak & Chang but not by Rubenstein. For male images, no significant correlations were found by either Lander or Penton-Voak & Chang (Rubenstein did not use male images), although Roberts et al. (2009) did report a significant positive relationship. Our subsequent analyses revealed that the lowest correlations were found in female ratings of males (although only when photos were presented first, Fig. 2 and Table 3), which is somewhat consistent with both Lander (2008) and Penton-Voak & Chang (2008). These results indicate that there may be sex-specific differences in the extent to which static images reflect judgements of mate value in more ecologically valid contexts (see also Penton-Voak & Chang 2008). This may be because attributions of personality appear more salient in opposite-sex judgements of males than of females (Berry & Miller 2001), a possibility supported by our finding that the correlation was weakest amongst female raters, and weaker than for opposite-sex judgements made by male raters. However, this same effect was not evident when videos were rated before photos, suggesting that, having seen videos, women were influenced by remembered impressions based on dynamic cues when rating photos (see also Lander & Bruce 2000).
Each of the methodological refinements we included in this study might at least partly account for the discrepancies in results between studies and the more positive relationships that we report here. Adoption of a within-subjects rating design is one important aspect that should be easy to incorporate into future studies and it is a logical one if the researcher’s interest is to understand how individual ratings differ across two conditions. However, we note that one weakness of this design is that there may be carry-over effects from judgements made under one condition to those made under the other. Indeed, although we found robust positive correlations in all comparisons, we did find slightly different correlation strengths depending on order of image presentation (notably with respect to female ratings of male images). If the same raters were to be used, then presenting static images before dynamic ones is arguably more consistent with the natural progression that occurs during a first-time encounter with an unfamiliar individual, where an initial impression is made within as little as 100 ms (Willis & Todorov 2006) and this is presumably more weighted by structural facial features contained within static images; behavioural cues and mannerisms such as those evident in a 20 s video are likely to manifest after this first impression.
Furthermore, our results showing rater sex and stimulus sex main effects and interactions suggest that studies examining rated attractiveness should, wherever possible, choose either male or female raters, as appropriate to the needs of the study, rather than conflate male and female raters in one assessment set. Finally, the context under which the dynamic images are recorded also affected the strength of the correlations reported and so should be carefully chosen when planning experiments. Here, we found higher static/dynamic correlations when participants were asked to introduce themselves than we did when they were asked to speak about a recent holiday. This may be because these different contexts elicit different or more variable sets of expressions and movement cues. It is possible that self-introduction is perceived as a task that is more difficult than recounting a series of previous events, and that this difficulty elicits behavioural cues which more reliably reflect mate quality than those evinced by other contexts. This interpretation is also consistent with the low correlations recorded during an emotionally neutral and relatively easier task such as reading from a cue card (Rubenstein 2005; Lander 2008). We conclude that context is likely to be an important aspect of design, but one which requires more investigation.
We certainly do not dispute that dynamic images contain additional information for facial perception to that contained in static images, including salient emotional cues. We also agree that dynamic cues may be particularly informative for face recognition (e.g. Lander & Chuang 2005) or perception of emotion, personality and attractiveness (e.g. Rubenstein 2005; Morrison et al. 2007; Roberts 2008), notwithstanding the contributions also made by features contained within neutral-posed, static images (e.g. Penton-Voak et al. 2006; Little & Perrett 2007; Roberts & Little 2008). However, in this study, most (53–75%) of the variance in dynamic ratings is explained by static ratings, even when using a between-subjects design. An outstanding problem for future research using dynamic images, therefore, is how to investigate the contribution of dynamic cues independently of information contained in static appearance. We believe that solutions would include using residuals from regression with dynamic ratings as the dependent variable (such residuals would represent the unique contribution of movement to face perception: see for example, Roberts et al. 2009) or using static image ratings as a covariate to control for their effect. We suggest that these approaches could prove useful in attempts to disentangle and evaluate the role of dynamic information in face perception.
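The residual approach mentioned above can be sketched in a few lines. The example fits an ordinary least-squares line predicting mean dynamic ratings from mean static ratings and keeps what is left over for each target. The ratings are invented for illustration and the function names are hypothetical; this is a sketch of the general technique rather than the procedure of any particular study.

```python
def ols_fit(x, y):
    """Slope and intercept of the least-squares line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def dynamic_residuals(static_means, dynamic_means):
    """Part of each target's dynamic rating not predicted by its static rating."""
    slope, intercept = ols_fit(static_means, dynamic_means)
    return [d - (intercept + slope * s)
            for s, d in zip(static_means, dynamic_means)]


# Hypothetical mean ratings for six targets on a 7-point scale.
static_means = [2.8, 3.5, 4.1, 4.6, 5.2, 5.9]
dynamic_means = [3.1, 3.2, 4.6, 4.4, 5.6, 5.5]

residuals = dynamic_residuals(static_means, dynamic_means)
for s, res in zip(static_means, residuals):
    print(f"static = {s:.1f}, residual = {res:+.2f}")
```

By construction the residuals sum to zero and are uncorrelated with the static ratings, so a positive residual marks a target rated more attractive in motion than their photograph alone would predict.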
In conclusion, although a fuller explanation of the role of dynamic information in human attractiveness judgements remains an intriguing area of enquiry, our data suggest that the disparity in judgements between image presentation types may be far smaller than previously suggested: the attractiveness of a static face is a good predictor of that person’s attractiveness when moving. In this way, static and dynamic stimuli cue similar attractiveness judgements.
We thank Alex Kralevich for assistance with data collection and Janne Kotiaho and two anonymous reviewers for their helpful comments. ACL is supported by the Royal Society, HMR by the NERC, AKM by the BBSRC and Unilever, and TKS by the University of Liverpool, the Owen Aldis Scholarship Fund and an ESRC postdoctoral research fellowship.