Systematic Face Pareidolia Generation Method Using Cycle‐Consistent Adversarial Networks

Pareidolia is the psychological tendency to perceive a face in a non-face stimulus. As a majority of people globally experience this tendency, it has been extensively studied and measured in terms of tendencies such as frequency. However, no study has investigated the systematic manipulation of stimuli, owing to the lack of a systematic image-generation method. Therefore, herein, we generated face pareidolia stimuli using a face data set with annotated data. We employed cycle-consistent adversarial networks (CycleGAN), an image-to-image style-translation framework, to generate stimuli by translating face images into natural-image styles. We manipulated the weight of the cycle-consistency loss in the CycleGAN and conducted an experiment to evaluate the images generated by the CycleGAN. We found that the weight value correlated with the pareidolia-inducing power when blurring was applied as preprocessing to the face data set. As a result, we were able to generate pareidolia stimuli systematically. © 2024 The Authors. IEEJ Transactions on Electrical and Electronic Engineering published by Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.


Introduction
We often experience a psychological phenomenon called pareidolia, the tendency to perceive specific patterns in natural scenes, in our daily life. As shown in Fig. 1, pareidolia can be perceived in a specific object, such as an outlet on the front of a car. DeepDream, which can create psychedelic images based on the pareidolia concept using deep learning, has been developed. A virtual-reality video based on DeepDream can induce a subjective experience similar to that of a real psychedelic [1]. Therefore, pareidolia is associated with human perception and the field of neuropsychology. As humans perceive faces even in static objects, in this study, we propose an image-generation technique that provides a systematic generation framework.
Pareidolia can be associated with certain illnesses, for example, Lewy body dementia [2]. Patients with Lewy body dementia experience more pareidolia than healthy individuals. Thus, a diagnostic method called the pareidolia test was developed to determine whether a patient is suffering from Lewy body dementia. Similarly, patients with Parkinson's disease also experience more pareidolia than healthy individuals [3]. Both illnesses feature symptoms of visual hallucinations, suggesting that pareidolia may be closely related to visual hallucinations. By contrast, certain illnesses induce less pareidolia than is experienced by healthy individuals. A typical example is autism spectrum disorder [4,5], in which patients perceive faces less readily than those without autism; essentially, they cannot perceive a face from its facial components.

Correspondence to: Takuya Akashi. E-mail: akashi@iwate-u.ac.jp
*Graduate School of Science and Engineering, Department of Design and Media Technology, Iwate University, 4-3-5 Ueda, Morioka, Iwate 020-8551, Japan
**Graduate School of Arts and Sciences, Division of Science and Engineering, Iwate University, 4-3-5 Ueda, Morioka, Iwate 020-8551, Japan
***Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA
****Faculty of Science and Engineering, Iwate University, 4-3-5 Ueda, Morioka, Iwate 020-8551, Japan
Several types of pareidolia stimuli have been used to investigate face pareidolia perception. One stimulus type is an artifact made artificially from images and photographs; 'Arcimboldo' images and the 'Mooney Face Test' are typical examples. Based on these artifacts, new stimuli have been generated to investigate face pareidolia perception [6,7]. Artificially generated stimuli are also widely used [8]. Another approach involves using image stimuli that can induce pareidolia [9].
The pareidolia-inducing power of these stimuli is uncertain because the stimuli are generated artificially, and a systematic method to manipulate pareidolia-inducing power in comparable images is lacking.The pareidolia-inducing power may differ according to the pareidolia type and form; essentially, only similar stimuli can be compared when estimating the pareidolia-inducing power.
To the best of our knowledge, no study thus far has generated pareidolia stimuli. Therefore, in this study, we systematically generated face pareidolia stimuli using a face data set for which annotation data were available [10]. The annotation data consist of the coordinates of face parts. We believe that the annotation data can be used to generate pareidolia stimuli with the same form as a face. Using the annotation data, the locations of the elements causing pareidolia are clearly determined, thus facilitating the creation of pareidolia stimuli in which the pareidolia position can be manipulated. Further, we employed cycle-consistent adversarial networks (CycleGAN), which translate between unpaired data sets, to systematically generate pareidolia. We attempted to generate face pareidolia stimuli by image-style translation from a face image to a natural image. Evidently, the cycle-consistency loss, a parameter of the CycleGAN, was found to affect the performance of the generated image.

Fig. 1. An example of pareidolia. The actual object in the image is a bag. It can appear as a face because the orange buckles can be regarded as the eyes and the handle as the mouth.

Face pareidolia
As described in Section 1, pareidolia is a psychological tendency experienced by humans globally. The frequency of experiencing pareidolia tends to be higher for the face category [11], and our brain can react to a face-like object in approximately 170 ms [12]. Rhesus monkeys also experience face pareidolia [13], suggesting that they too can perceive face outlines. Faces carry socially important information in daily life. The frequency of face pareidolia differs with gender and personality. However, to the best of our knowledge, research on face pareidolia-inducing power is currently lacking. Notably, face pareidolia could be investigated in detail if an indicator of face pareidolia-inducing power were established; such an indicator would quantify pareidolia-inducing power and could thus be used to compare different stimuli. A threshold for face pareidolia could also be defined based on this indicator.

Image style translation
Image-style translation is commonly performed in computer vision. Approaches using generative adversarial networks (GANs) [14] are well known. CycleGAN, which can translate between two different image styles [15], is a typical framework for image-style translation using GANs. The CycleGAN framework is illustrated in Fig. 2. CycleGAN comprises two types of networks: discriminators and generators. Unlike in the original GAN, the function of the generator is to translate the style of the input image. CycleGAN can mutually translate image styles, such as between zebras and horses, or between painted art and realistic pictures.

Face pareidolia generation
We used CycleGAN to generate face pareidolia stimuli. CycleGAN does not learn specific objects; thus, we can generate face pareidolia from facial features. We used real-face and natural-image data sets to generate the stimuli. If we simply apply pixel synthesis, the generated result contains unnatural colors, such as red lips and eyes. In this study, we attempted to generate pareidolia stimuli that resemble the after-translation style, including the eyes and lips, using training data. Given that our objective was to use face attribute information (such as the eyes and mouth), the CelebAMask-HQ data set was employed; it includes label information for facial attributes such as the eyes, hair, and skin. Additionally, natural images collected from Flickr [16] were used. Figure 3 shows example images used for style training.
Cycle-consistency loss is a loss function of CycleGAN. As shown in Fig. 4, CycleGAN generates an image intended to be similar to the original input image after a round-trip translation and calculates the difference between the two. This difference constitutes the cycle-consistency loss and makes the generators learn that the after-translation image can be translated back to the original input image. The cycle-consistency loss function reduces the number of style-translation mapping solutions. Following [15], the overall objective function is shown in (1):

L = L_GAN(G_A→B, D_B) + L_GAN(G_B→A, D_A) + λ L_cyc(G_A→B, G_B→A)    (1)

Here, L_GAN is the loss function of the original generative adversarial networks (GANs) and L_cyc is the cycle-consistency loss. The weight of the cycle-consistency loss (λ) affects the performance of the generation. This effect might be associated with pareidolia-inducing power; thus, we trained the CycleGAN with several values of λ.
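As a concrete illustration of the objective in (1), the following minimal NumPy sketch (our own illustration, not code from the paper) computes an L1 cycle-consistency loss for a round-trip translation and combines it with the adversarial losses under a weight λ:

```python
import numpy as np

def cycle_consistency_loss(x, x_reconstructed):
    """L1 cycle-consistency loss ||G_B->A(G_A->B(x)) - x||_1, averaged over pixels."""
    return np.mean(np.abs(x_reconstructed - x))

def total_objective(l_gan_ab, l_gan_ba, l_cyc, lam):
    """Overall CycleGAN objective: the two adversarial losses plus the weighted cycle loss."""
    return l_gan_ab + l_gan_ba + lam * l_cyc

# Toy example: translating an image there and back should reproduce it closely.
x = np.random.rand(4, 4, 3)   # a tiny "image" in style A
x_rec = x + 0.01              # imperfect reconstruction after A -> B -> A
l_cyc = cycle_consistency_loss(x, x_rec)
loss = total_objective(0.5, 0.5, l_cyc, lam=10)
```

Raising λ penalizes round-trip reconstruction error more heavily, which is the knob manipulated in the experiments below.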

Preprocessing for the face image
The original face images include the face, hair, and background. We preprocessed the images to reduce the effect on generation caused by the color difference of each part. Facial contours are often located at the borders of the skin, hair, and background. As described in Subsection 3.1, CycleGAN cannot learn specific objects. Additionally, we confirmed that the translated image changed significantly with changes in image color; therefore, after translation, the image strongly retained the facial contours. To overcome this problem, we applied two preprocessing methods, namely noise processing and blurring, to the original face data set. In the noise process, random RGB noise replaces the pixels of the original face images except in the regions of the eyes and mouth. In the blurring process, we use a Gaussian blur with a filter size of 51 × 51 and σ = 8. Example images illustrating the preprocessing steps are shown in Fig. 5. Preprocessing helps reduce the effect of the pareidolia region [17]. Because our primary objective was to generate both stimuli that cause pareidolia and stimuli that do not, we processed the elements that induce pareidolia: the eyes and mouth. The face image data set contains many images with the face at the center of the image; therefore, the participants might develop a bias toward finding a face at the center of the stimulus. To overcome this problem, we moved the components that induce face pareidolia.
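The two preprocessing steps can be sketched as below. Only the 51 × 51 filter size, σ = 8, and the eye/mouth keep-regions come from the text; the function names, the boolean-mask interface, and the separable-kernel helper are our own illustrative assumptions:

```python
import numpy as np

def noise_preprocess(img, keep_mask, seed=None):
    """Replace every pixel outside keep_mask (e.g., eye/mouth regions) with random RGB noise."""
    rng = np.random.default_rng(seed)
    noise = rng.integers(0, 256, size=img.shape, dtype=np.uint8)
    out = img.copy()
    out[~keep_mask] = noise[~keep_mask]   # keep_mask: boolean H x W array
    return out

def gaussian_kernel(size=51, sigma=8.0):
    """1-D Gaussian kernel; a 51 x 51 blur can be applied separably along rows then columns."""
    x = np.arange(size) - size // 2
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()
```

Applying the 1-D kernel along each axis in turn is equivalent to the full 51 × 51 Gaussian filter and is the usual way such a large blur is implemented.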

Evaluation of the generated stimuli

A standard psychophysical method for measuring pareidolia-inducing power has not yet been developed. In addition, the frequency and sensitivity of pareidolia differ between individuals. Thus, we examined previously conducted experiments on pareidolia to design the experimental procedure. In particular, the noise pareidolia test [2] was found to be similar to our research objective. The procedure for the noise pareidolia test is as follows. First, the investigator presents the experimental stimulus to the participant. Next, the participant answers whether a face or a specific object is in the stimulus. If the participant identifies a face in the stimulus, the investigator instructs them to point out the facial region.
In this study, we investigated the pareidolia-inducing power of the generated stimuli. Therefore, the participants determined whether each stimulus included pareidolia, and we instructed them to evaluate the face pareidolia intensity of the stimulus.

Experiment
We performed two types of experiments and data analyses. The first was a pareidolia stimuli-generation experiment, in which we generated face pareidolia image stimuli using CycleGAN. The second experiment involved evaluating the generated stimuli. Sensitivity to pareidolia-inducing power may differ between individuals; therefore, we developed an analytical method to normalize sensitivity.

Face pareidolia generation
We used CycleGAN to generate the images. As described in Subsection 3.3, the training data comprised a natural-image data set and a preprocessed face-image data set. Each training data set comprised 100 randomly selected images of resolution 256 × 256. The number of learning iterations (epochs) was 1000, and the learning rate was initially set to 0.0002 until the 900th epoch and then linearly decayed to 0 from the 901st to the 1000th epoch. Further, the λ values used for training were 2, 10, and 20. The two preprocessing steps described in Subsection 3.2 were applied to the images used as training data. In training, the elements causing pareidolia in the face images were not preprocessed. CycleGAN cannot learn specific objects; therefore, even if the input image did not contain elements that cause pareidolia, the output image was considered to be only slightly affected.
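The learning-rate schedule above can be written as a small function (a sketch under the stated settings; the function name is ours):

```python
def cyclegan_lr(epoch, base_lr=0.0002, total_epochs=1000, decay_start=900):
    """Constant base_lr until epoch 900, then linear decay to 0 over epochs 901-1000."""
    if epoch <= decay_start:
        return base_lr
    return base_lr * (total_epochs - epoch) / (total_epochs - decay_start)
```

For example, the rate is still 0.0002 at epoch 900, halves to 0.0001 at epoch 950, and reaches 0 at epoch 1000.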

Evaluation experiment

Evaluation procedure
The generated stimuli were evaluated to investigate their systematic generation. For the evaluation, the participants scored the generated images for all λ values and preprocessing methods. We evaluated face strength as face-pareidolia-inducing power. A total of 210 images were evaluated in the experiment. First, we randomly selected 15 images that were not used for training from the CelebAMask-HQ data set as input images. Then, for each input image, we generated four different stimulus images, depending on whether the pareidolia elements were included or excluded and whether blur or noise preprocessing was used. In total, we generated 60 images per value of λ; because we investigated three values of λ, 180 stimulus images were evaluated. Additionally, we extracted 15 face images and 15 natural images that were not used for training the style translation. During the experiment, each participant first answered whether a face was contained in the displayed stimulus. If the participant reported 'yes', the participant pointed out the face region by enclosing it in an ellipse and then scored the pareidolia-inducing power on a scale from 1 to 99. After the scoring, or after the participant reported 'no', the displayed image was changed to the next image, and the evaluation was repeated for all image stimuli. The monitor used for the experiment was a 21.5-in BenQ G2222HDL. We analyzed the results based on the evaluation by each participant.
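The image counts above can be checked with a few lines of arithmetic:

```python
# Stimulus bookkeeping for the evaluation experiment described above.
inputs = 15              # held-out CelebAMask-HQ input images
conditions = 4           # {with, without pareidolia elements} x {blur, noise}
lambda_values = 3        # lambda in {2, 10, 20}
generated = inputs * conditions * lambda_values
controls = 15 + 15       # real face images plus natural images
total = generated + controls
print(generated, total)  # 180 210
```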

Participant
We enrolled 11 participants (nine males and two females, aged between 21 and 24 years). All participants consented to the experiment and signed the relevant agreement.

Data analysis
The results were analyzed using signal detection theory (SDT) [18]. SDT is effective for investigating whether presented stimuli include specific information. In this study, the specific information is the face, which is regarded as the 'signal'. The natural images are regarded as 'no signal' because they exclude this specific information. Additionally, stimuli that include pareidolia elements are regarded as 'signal' (Fig. 6(b)-(d) and (j)-(l)), and those that exclude them as 'no signal' (Fig. 6(f)-(h) and (n)-(p)). Responses in SDT are classified into four types: hit (H), miss (M), false alarm (FA), and correct rejection (CR). If the participant reported 'yes' when a signal was present in the displayed stimulus, the response is classified as H, and 'no' as M. Similarly, if the participant reported 'yes' when no signal was present (e.g., Fig. 6(f)-(h) and (n)-(p)), the response is classified as FA, and 'no' as CR. In SDT, the ability to discriminate the stimuli is parameterized as d′ and calculated by (2):

d′ = Z(P(H)) − Z(P(FA))    (2)

Here, Z(·) calculates the z-score of each probability, and P(H) and P(FA) are the probabilities of H and FA, respectively. We recorded the response time and the center, major axis, and minor axis of the ellipse. Following the above SDT, the responses of the participants were classified into the four types M, H, FA, and CR. Based on the recorded ellipse information, we determined whether the reported region corresponded to the intended pareidolia. The mask image corresponding to the face image, to which the same translation process had been applied, was moved by the same amount as in the previous method, and its degree of overlap with the ellipse was determined. The pareidolia was considered valid, and the presented stimulus treated as H, when more than 90% of the mask was enclosed by the ellipse. If the overlap ratio was less than 90% and the presented stimulus included pareidolia elements, the presented stimulus was treated as FA. Examples of the results of the generation experiments are shown in Fig. 6.
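The d′ computation in (2) can be sketched in a few lines of standard-library Python (our own illustration, not the authors' code), using the inverse standard-normal CDF as Z:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = Z(P(H)) - Z(P(FA)) from signal detection theory."""
    z = NormalDist().inv_cdf          # z-score: inverse standard-normal CDF
    p_hit = hits / (hits + misses)
    p_fa = false_alarms / (false_alarms + correct_rejections)
    # Rates of exactly 0 or 1 correspond to an infinite d'
    # (inv_cdf is undefined there), matching the 'inf' case discussed later.
    return z(p_hit) - z(p_fa)
```

For example, a participant with P(H) = 0.9 and P(FA) = 0.1 obtains d′ ≈ 2.56, while chance performance (both rates 0.5) gives d′ = 0.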

Face pareidolia generation result
The generated images embed the face pareidolia structure and can be perceived as faces. We consider that pareidolia stimuli of the same form can be generated using face annotation data. The face form appears to remain in the image when the input image includes pareidolia elements (Fig. 6(a)-(d) and (i)-(l)). Conversely, the face form does not seem to remain when the input image does not include pareidolia elements (Fig. 6(e)-(h) and (m)-(p)).

Evaluation experiment result
Pareidolia is a psychological tendency; however, whether a stimulus can be perceived as a face differs between individuals. Figures 7 and 8 show the hit transitions for each preprocessing method.
The reported number increased monotonically as the value of λ increased for 7 of the 11 participants with blur preprocessing (Fig. 7). By contrast, the reported number increased monotonically with λ for only 1 of the 11 participants with noise preprocessing (Fig. 8). One participant could not report the real face because of an error in the evaluation experiment.

ROC curve
In the experiment, almost all participants sometimes reported 'face' on stimuli that were not intended as pareidolia, i.e., FA responses. There are two possible causes. One is a low ability to discriminate whether a face can be perceived: the task itself is too difficult to discriminate objectively, and d′ is low. The other is an abnormal setting of the internal criterion: the participants tend to set the threshold for reporting 'face' too low. We calculated d′ and drew the receiver operating characteristic (ROC) curve for each participant. A d′ of 'inf' means that the participant reported 'face' on the intended pareidolia stimuli perfectly, without any FA; when d′ is 'inf', the ROC curve cannot be drawn. The ROC curves and d′ values show that all participants have the ability to discriminate. Therefore, the internal criterion for the face signal is suggested to be abnormal.

Fig. 7. Hit number transition with blur preprocessing. The red and blue lines represent participants whose reported numbers increase monotonically and nonmonotonically, respectively. The orange bar chart represents the average reported number across participants. The error bars represent 95% confidence intervals.

Data analysis result
The pareidolia-inducing power scores differed between individuals. Almost all participants gave a score of 99 to the real facial stimuli. On the other hand, in some cases a participant could not perceive the face (score: 0); such cases are not considered here because we focus on the cases in which a face was perceived. The minimum score varied between participants. Therefore, we applied min-max normalization based on each participant's minimum value to normalize the score range. The formula, which maps the minimum value to 1 and the maximum value to 99, is shown in (3).

Intensity = (Score − Intensity_min) / (Intensity_max − Intensity_min) × 98 + 1    (3)

Here, Intensity_min is the minimum value for each participant, and Intensity_max is the maximum value for each participant. The results after min-max normalization are shown in Figs. 10 and 11.
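The normalization in (3) amounts to the following one-liner (a sketch in our own notation):

```python
def normalize_intensity(score, score_min, score_max):
    """Min-max normalization of Eq. (3): map a participant's raw scores onto [1, 99]."""
    return (score - score_min) / (score_max - score_min) * 98 + 1
```

A participant whose raw scores span 20 to 90 thus has 20 mapped to 1, 90 mapped to 99, and the midpoint 55 mapped to 50, making scores comparable across participants.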
If the intensity for a participant monotonically increases with λ, the line for that participant in Figs. 10 and 11 is drawn in red. As shown in Fig. 10, when blurring is applied as the preprocessing for stimulus generation, 8 out of 11 lines show a monotonic increase. When the noise process is applied, no participant shows a monotonic increase, as can be observed from Fig. 11. In addition, we investigated the correlation coefficient between the λ value and the average participant intensity for each preprocessing method based on Figs. 10 and 11, respectively. When the preprocessing is blurring, the correlation coefficient is 0.92; when the preprocessing is noise, it degrades to 0.59. Based on these coefficients, we can confirm a strong correlation between the λ value and the average intensity in the case of blur preprocessing. On the other hand, we cannot intuitively observe whether a significant difference exists between λ = 10 and λ = 20 in Fig. 10. Therefore, we investigated whether there is a significant difference using the Wilcoxon signed-rank test [19]. The test statistic (T) is 12; as the number of participants is 11, there is a significant difference at a significance level of 10%. This result suggests that, when the preprocessing was blurring, the weight λ was associated with the pareidolia-inducing power. In each category, the tendency of the scores differed from that of the real faces, indicating that the stimuli were generated with characteristics different from those of real faces. Figures 7 and 10 show the same intensity trend, as do Figs. 8 and 11. The correlation coefficient between the hit number and the average participant intensity of the generated stimuli is 0.98 for blur preprocessing and 1.00 for noise preprocessing. These observations imply that the number of pareidolia reports might correlate with the pareidolia-inducing power.

Fig. 10. Blur intensity transition. The red and blue lines represent monotonically and nonmonotonically increasing participant intensity, respectively. The orange bar chart represents the average of the participants' intensity. The error bars represent 95% confidence intervals.

Fig. 11. Noise intensity transition. The red and blue lines represent monotonically and nonmonotonically increasing participant intensity, respectively.
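The correlation analysis above can be reproduced with a plain Pearson coefficient. In the usage example, only λ ∈ {2, 10, 20} comes from the experiment; the intensity values are hypothetical placeholders:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

lambdas = [2, 10, 20]
mean_intensity = [30.0, 55.0, 70.0]   # hypothetical per-lambda averages
r = pearson(lambdas, mean_intensity)
```

With only three λ values, a high coefficient should be read as a trend indicator rather than strong evidence, which is why the paper supplements it with the Wilcoxon signed-rank test.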

Conclusion
Herein, we systematically investigated the generation of facial pareidolia through pareidolia-inducing power. We manipulated the weight of the cycle-consistency loss and generated stimuli; the pareidolia-inducing power can be manipulated via the cycle-consistency loss of CycleGAN. We employed a human face data set and generated pareidolia stimuli that had the same form as a face using the annotation data of the face data set. We trained on the natural and face images and systematically generated pareidolia stimuli. The results of the evaluation experiment revealed a correlation between the cycle-consistency loss and pareidolia-inducing power when blurring was applied as preprocessing. This suggests that preserving features such as the eyes and mouth is critical for inducing pareidolia. Also, the ROC curves reveal that the false pareidolia reports are mostly due to the internal criterion. In future work, we intend to apply the proposed method to existing pareidolia stimuli. The method can also enable more versatile applications through the systematic generation of stimuli in the form of existing pareidolia stimuli other than faces. For example, prosopagnosia, a tendency opposite to pareidolia, is known [20]: some people cannot experience pareidolia even when the pareidolia-inducing power is strong, and in such cases prosopagnosia may have occurred. In this study, healthy controls were tested. In future work, we must investigate the correlation between pareidolia-inducing power and the number of pareidolia reports, and extend our experiments to patients suffering from not only Lewy body dementia but also prosopagnosia.

Fig. 2 .
Fig. 2. The CycleGAN framework on data sets A and B. G_A→B aims to translate from the A style to the B style, and G_B→A behaves similarly from the B style to the A style. D_A aims to determine whether an input image is from data set A or generated by G_B→A, and D_B behaves likewise on data set B.

Fig. 3. Examples of natural images used for style training. The picture above is in the public domain; the picture below, 'Fairyland Mesclun Mix, 2008' by Brian Boucheron, is licensed under CC BY 2.0.

Fig. 4.