Volume 101, Issue 6 p. 67-79
RESEARCH ARTICLE

Privacy‐aware face detection using biological signals in camera images

Toshihiro Kitajima

Corresponding Author

Samsung R&D Institute, Japan

Graduate School of Engineering Science, Osaka University, Suita, Japan

Correspondence

Toshihiro Kitajima, Samsung R&D Institute, Japan.

Email: t.kitajima@samsung.com

Shunsuke Yoshimoto

Graduate School of Engineering Science, Osaka University, Suita, Japan

Yoshihiro Kuroda

Graduate School of Engineering Science, Osaka University, Suita, Japan

Osamu Oshiro

Graduate School of Engineering Science, Osaka University, Suita, Japan

First published: 24 April 2018

Abstract

People sensing occurs everywhere in the Internet of Things (IoT). Cameras are increasingly used because they are inexpensive and effective sensing devices. However, because a camera acquires information that can identify an individual, its use raises the problem of invading personal privacy. Furthermore, since home appliances are increasingly connected to the Internet via the IoT, user images can leak out unintentionally. With these concerns in mind, we propose a face detection method that protects user privacy by using intentionally blurred images. In this method, the presence of a human being is determined by dividing an image into several regions and then calculating the heart rate detected in each region. In our performance evaluation, the proposed method outperformed an existing face detection method and was confirmed to be effective for detecting faces in both normal and blurred images. We also examined how changing image sharpness affects the performance of the proposed method. The proposed method additionally detected face position with high accuracy. Furthermore, we confirmed the effectiveness of the proposed method on near‐infrared images and distorted images.

1 INTRODUCTION

The Internet of Things (IoT) age is here: cars, trains, airplanes, and other vehicles; TV sets, refrigerators, and other home appliances; medical devices and drugs; and many other things connect via the Internet to exchange large amounts of diverse data without human intervention. The home electric appliances around us are increasingly equipped with various sensors that exchange data to improve services.1-4 Such services should be tailored to the user's condition, so human sensing in the home environment is gaining importance. With the recent reduction in camera costs, cameras are increasingly used for human detection. Cameras acquire a great deal of information, making them useful detection devices; however, this information includes human faces, clothing, meals, and other private items, which poses the problem of invading personal privacy. With the advent of the IoT age, electric appliances connect to the Internet, so captured user images may unintentionally leak out (Figure 1). This concerns some users, who refrain from purchasing camera‐equipped products. Therefore, it is desirable to develop devices capable of privacy‐aware human detection.

image
Leakage of user privacy via Internet and camera [Color figure can be viewed at wileyonlinelibrary.com]

There exist sensors that can detect people without using camera images, namely, motion sensors such as pyroelectric sensors5-8 and thermal sensors.9-11 Such sensors can determine whether people are within their sensing range, but cannot identify a person's direction or position. There are also human detection methods using thermal images,12-15 but special elements and lenses are needed to acquire thermal images, which results in high costs. This study aims at privacy‐aware detection of human presence, as well as measurement of direction and position, using a camera. As regards privacy protection, there are reports on employing software to blur human images after they are acquired16, 17 and on replacing the facial region with computer graphics (CG).18 There are also methods that protect images with an authentication key.19 However, original camera images may be hacked; for safety, privacy must be protected at the stage of image acquisition.

Many existing methods of camera image processing support human detection. Typical techniques are face detection20 using Haar‐like features and AdaBoost, and human detection using HOG features.22 These methods work well on sharp images, but contrast and edge features cannot be extracted from insufficiently sharp images, making human detection impossible. There are face detection and recognition methods reported to support blurred images.23-26 These methods work on images blurred to some degree; however, there is no experimental proof of their applicability to faint images in which the eyes, nose, mouth, and other parts are unrecognizable and people cannot be identified.

This study is based on a different approach: biological signals obtained from images are used instead of visual features. Specifically, an image is divided into multiple regions, heart rate is detected in each region, and face presence is recognized in regions where the detected values correspond to a human heart rate. We have already reported the underlying principle.27 In this study, we further develop the previous report27 and verify the effectiveness of the proposed method by comparing it with an existing face detection method on images with graded sharpness. In addition, we propose a method to accurately detect and measure face position. With the proposed method, faces can be detected not only by visible‐light cameras but also by near‐infrared cameras. Since near‐infrared cameras are likely to be used in dim environments, we evaluated performance on near‐infrared images. Moreover, the proposed method does not require face shape information and can therefore be applied to images distorted by a fisheye lens or a curved mirror. Embedded devices are not necessarily provided with memory for distortion correction, so the ability to detect faces in distorted images may be helpful. In this context, we evaluate the face detection rate of the proposed method on distorted images.

2 PRINCIPLE

Here we explain how heart rate can be measured from human skin. Heart rate is an indicator of arterial pulsation, and arterial blood contains a large amount of oxygenated hemoglobin. The absorption spectrum of oxygenated hemoglobin is shown in Figure 2.28 Oxygenated hemoglobin strongly absorbs light at wavelengths of 500 to 580 nm. As shown in Figure 3,29 the range of 500 to 580 nm corresponds to green and yellow. In particular, yellow spans 560 to 580 nm and green spans 490 to 560 nm; that is, green covers a wider wavelength range, and more green light is absorbed by oxygenated hemoglobin. Commercially available pulse wave sensors of the finger type,31 wristwatch type,32 and other types measure heart rate using this property of oxygenated hemoglobin in blood to absorb green light. The principle of measuring heart rate using green light is illustrated in Figure 4.30 A device is provided with a green LED and a photodetector. The LED illuminates the skin with green light, and the photodetector receives the reflected light. Blood volume changes with pulsation, so the light received by the photodetector has a waveform such as that shown on the right of Figure 4. Heart rate is derived from this waveform. The absorption spectrum also shows absorption in the near‐infrared region of 800 to 1000 nm, so near‐infrared light can be employed for heart rate measurement as well.

image
Absorption spectrum of oxygenated hemoglobin28 [Color figure can be viewed at wileyonlinelibrary.com]
image
Correspondence between wavelength and color29 [Color figure can be viewed at wileyonlinelibrary.com]
image
Pulse wave sensor principle30 [Color figure can be viewed at wileyonlinelibrary.com]

This mechanism is applied to face images. With conventional face detection, the face region is detected as shown in Figure 5. Because both room lighting and sunlight include green light, an RGB camera is used to extract the G (green) channel. Time series data of G in a face region are shown in Figure 6. These data contain a pulsating component that corresponds to heart rate. Heart rate can be calculated by normalizing these data, performing frequency analysis, and detecting a peak, as shown by the dashed circle in Figure 7.

image
Face detection region [Color figure can be viewed at wileyonlinelibrary.com]
image
Heart rate signal extracted from face region time series data
image
FFT calculation of heart rate signal showing detected peak frequency [Color figure can be viewed at wileyonlinelibrary.com]
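The peak-detection principle above can be sketched in a few lines of Python. This is an illustrative reconstruction (using NumPy and a synthetic 0.95 Hz test signal), not the authors' implementation:

```python
import numpy as np

def estimate_heart_rate(green_series, fps=30.0):
    """Estimate heart rate (bpm) from a mean green-brightness time series:
    remove the DC component, take the FFT, and read the peak frequency."""
    x = np.asarray(green_series, dtype=float)
    x = x - x.mean()                              # normalize: drop the constant level
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    peak = freqs[np.argmax(spectrum[1:]) + 1]     # skip the 0 Hz bin
    return peak * 60.0                            # Hz -> beats per minute

# Synthetic 20 s signal pulsating at 0.95 Hz, like the example of Figure 7
t = np.arange(0, 20, 1.0 / 30.0)
signal = 100.0 + 0.5 * np.sin(2 * np.pi * 0.95 * t)
print(estimate_heart_rate(signal))                # ~57 bpm
```

In practice the input series would be the mean green brightness of a skin region over a sliding time window, as described below.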

3 METHOD

In this study, we assume face detection of a user watching TV. With such sensing, the TV can be turned off automatically when the user is no longer sitting in front of it, thus saving energy. A user moving in and out can be tracked using frame‐to‐frame motion information; however, a user watching TV does not move (especially when absorbed in what happens on screen), and such a user's presence is hard to detect. Glasses‐free (naked‐eye) 3D technologies are being developed for future TVs. With existing naked‐eye 3D, the viewing angle is narrow and the viewing position is restricted; in this context, methods are being developed to change the displayed picture according to the user's position with respect to the TV set. This study aims at presence detection and position measurement of a user who is quietly watching TV. In the future, we plan to propose a method that supports moving users as well as multiple users, but in this study we focus on detecting and locating a static user as the most difficult case.

3.1 Face detection with blurred images

In conventional human detection based on camera images, visual information (appearance) is used, which requires sharp images. However, acquiring sharp images that permit personal identification is undesirable from the standpoint of user privacy; the method proposed in this study therefore does not rely on appearance information.

There are a number of reports on measuring heart rate from camera images.34-36 In such methods, the face region is extracted and heart rate is measured in the extracted region. The RGB brightness values in the face region are arranged into three time series that are treated as signals, and independent component analysis or principal component analysis is applied to these three signals to estimate heart rate. We previously reported a method that accurately estimates heart rate from a single signal rather than three.37, 38 The use of a single signal is distinctive in that heart rate can be estimated in the same way with both a visible‐light camera and a near‐infrared camera. In this study, we use this previously proposed method of heart rate estimation for face detection. Camera images are divided into regions, heart rate is estimated in each region, and a face is recognized in regions where the estimated values match a human heart rate.

First, to acquire privacy‐protected images, image sharpness is degraded so that the face cannot be identified, for example, by placing frosted glass in front of the camera or by defocusing the lens. A normal camera produces an image such as that shown in Figure 8; a blurred image such as that shown in Figure 9 can be obtained by defocusing. Such an image can be divided into multiple regions as shown in Figure 10 (30 regions in this case), and heart rate can then be calculated in each region.

image
Normal camera image [Color figure can be viewed at wileyonlinelibrary.com]
image
Out‐of‐focus image [Color figure can be viewed at wileyonlinelibrary.com]
image
Image divided into several small regions [Color figure can be viewed at wileyonlinelibrary.com]
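Dividing a frame into regions and averaging the green channel per region, as in Figure 10, might look like the following sketch (the 80 × 80 region size follows Section 4.1; the helper name is ours):

```python
import numpy as np

def region_means(green_frame, region=80):
    """Mean green brightness for each region of one frame; appending these
    per-frame means to per-region buffers builds the time series that the
    heart-rate detection operates on."""
    h, w = green_frame.shape
    rows, cols = h // region, w // region
    blocks = green_frame[:rows * region, :cols * region]
    return blocks.reshape(rows, region, cols, region).mean(axis=(1, 3))

frame = np.zeros((480, 640))            # stand-in for the G channel of a VGA frame
print(region_means(frame).shape)        # (6, 8): 48 regions of 80 x 80 pixels
```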

In Figure 11, A and B denote a background region and a human region, respectively. The graphs in Figures 12 and 13 present 30 s of average G brightness values of the two regions extracted by an RGB camera. In the background region (Figure 12), the green brightness value is almost unchanged; in contrast, in the human region (Figure 13), the green brightness value shows a pulsating waveform corresponding to heart rate. Fast Fourier transform (FFT) is performed on these green brightness values using a rectangular time window extending backward from the current data. The FFT results for the background region and the human region are shown in Figures 14 and 15, respectively. A decision is made as to whether a human heart rate can be detected from these power spectra. A heart rate range is set so that values that cannot correspond to a human heart rate fall outside the search scope; specifically, the range of human heart rate is set to 50 to 120 bpm. This range does not cover intense exercise, illness, and so on, but assumes everyday indoor activities. The gray areas in Figures 14 and 15 are regions where heart rate is below 50 bpm and thus outside the search scope. In addition, when power spectra are weak, signals are likely to be mistaken for human heartbeat; thus, a threshold is imposed, and power below the threshold is discarded. If the threshold is exceeded at several points, the frequency offering the highest power is adopted as the heart rate. In Figure 14, the power does not exceed the threshold at any frequency, and the region is therefore recognized as background. In Figure 15, the threshold is exceeded at a frequency around 0.95 Hz; this gives a heart rate of 60 × 0.95 = 57 bpm, and the region is recognized as human.

image
Background region (A) and human region (B) [Color figure can be viewed at wileyonlinelibrary.com]
image
Green light intensity average of pixels inside background region (A) [Color figure can be viewed at wileyonlinelibrary.com]
image
Green light intensity average of pixels inside the human region (B)
image
FFT calculation of signal in background region (A); dotted line represents threshold value [Color figure can be viewed at wileyonlinelibrary.com]
image
FFT calculation of signal in human region (B); light gray area is outside scope of search [Color figure can be viewed at wileyonlinelibrary.com]

When this processing is applied to all regions of an image, results such as those shown in Figure 16 are obtained. In background regions where the power did not exceed the threshold, the heart rate is marked as 0. In Figure 16, a heart rate of 57 is detected only in the human face region, so human presence can be assumed in that region. Thus, a face can be detected by dividing an image into regions, calculating heart rate in each region, and recognizing human presence in regions where a heart rate is detected within the 50 to 120 bpm range set for human heart rate.

image
Heart rate detection results for each region [Color figure can be viewed at wileyonlinelibrary.com]
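The per-region decision just described (FFT, a power threshold, and the 50 to 120 bpm search band) can be sketched as follows; the threshold value and signal amplitudes are arbitrary assumptions for illustration:

```python
import numpy as np

def detect_heart_rate(series, fps=30.0, lo_bpm=50.0, hi_bpm=120.0, power_thresh=10.0):
    """Return the detected heart rate (bpm) for one region, or 0 for background:
    FFT the series, search only the 50-120 bpm band, and require the peak
    power to exceed a threshold (the threshold here is an arbitrary value)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    if power[band].max() < power_thresh:
        return 0.0                                   # treated as background
    return freqs[band][np.argmax(power[band])] * 60.0

t = np.arange(0, 20, 1.0 / 30.0)
human = 100.0 + 0.5 * np.sin(2 * np.pi * 0.95 * t)   # pulsating region (B)
background = np.full_like(t, 100.0)                  # static region (A)
print(detect_heart_rate(human), detect_heart_rate(background))  # ~57.0 and 0.0
```

Running this function over every region of the grid yields a heart-rate map like Figure 16, with 0 in background regions.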

3.2 Face position detection

Position cannot be determined using only the face detection method of Section 3.1; here we therefore explain face position estimation based on the detection results of Section 3.1. These results are regions recognized as human based on heart rate calculation. Such results are presented in Figure 17; the rectangles show regions in which heart rate was detected, and the numbers are the heart rate values. At this stage, the center of the detected rectangles might be considered the face position; however, depending on how finely the original image is divided, the detected rectangles are not necessarily symmetric with respect to the face, or only one rectangle may be obtained. Therefore, using the center of the detected rectangles as‐is would be inaccurate. We improve position accuracy by searching overlapping regions around the center of the multiple detected regions.

image
Face detection results by proposed method; rectangles represent areas of human being, and numbers in rectangles represent heart rates [Color figure can be viewed at wileyonlinelibrary.com]

Multiple rectangles may appear in an image; thus, neighboring rectangles are labeled as one cluster. If multiple clusters exist, the largest labeled area is selected, and the centroid position of this labeled area is adopted as a reference position. Heart rate is then detected in regions around the reference position. In so doing, the rectangle centroid position (x, y) and the rectangle side size w are treated as variable parameters: the centroid position (x, y) is set around the reference position, and the side size w is varied with respect to the size of the image regions of Section 3.1. Such heart rate detection in neighboring rectangles is illustrated in Figure 18. The procedure described in Section 3.1 is applied to these rectangles, and voting in a two‐dimensional space is performed on regions in which the FFT power spectrum exceeded the threshold and a heart rate was detected. Here, the voting space has the same resolution as the input image. Votes are cast for all pixels in the regions where a heart rate was detected. The voting space for the multiple rectangles of Figure 18 is shown in Figure 19, where brightness represents the number of votes; the white areas are pixels with many votes. Only regions whose number of votes exceeds a certain value are extracted, and the centroid of these areas is defined as the human position.

image
Search area for face position detection based on face detection results [Color figure can be viewed at wileyonlinelibrary.com]
image
Voting space of same size as input image; voting is performed in area where heart rate is detected; brightness represents number of votes
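The voting step can be sketched as follows, assuming the candidate rectangles that yielded a heart rate are already known; the `min_votes` threshold and the (x, y, w) rectangle format are our own simplifications of the procedure:

```python
import numpy as np

def vote_position(detected_rects, shape, min_votes=2):
    """Refine face position by voting. Every pixel inside each rectangle in
    which a heart rate was detected receives one vote; the centroid of
    pixels with at least `min_votes` votes is returned as the position."""
    votes = np.zeros(shape, dtype=int)          # voting space = image resolution
    for x, y, w in detected_rects:              # (centroid x, centroid y, side)
        half = w // 2
        votes[max(0, y - half):y + half, max(0, x - half):x + half] += 1
    ys, xs = np.nonzero(votes >= min_votes)
    if len(xs) == 0:
        return None                             # no densely voted area
    return float(xs.mean()), float(ys.mean())   # (x, y) centroid of dense votes

# Three overlapping detections around pixel (320, 240) in a VGA image
rects = [(320, 240, 80), (324, 238, 80), (316, 244, 80)]
print(vote_position(rects, (480, 640)))
```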

4 PERFORMANCE EVALUATION

4.1 Accuracy of face detection at varied sharpness

4.1.1 Experimental environment

We evaluated the performance of the face detection described in Section 3.1 at varied image sharpness. To do so, we used a camera with a built‐in motor and a software‐controlled lens.39 The appearance of the camera is shown in Figure 20. In this camera, the lens position can be controlled in 1000 steps. Images captured by the camera at VGA resolution and 30 fps were fed to a PC via USB 2.0 for processing. The specification of the camera used in the experiments is given in Table 1. The only lighting came from fluorescent lamps on the ceiling.

image
Focus control camera39 [Color figure can be viewed at wileyonlinelibrary.com]
Table 1. Focus control camera specification
Type RGB
Resolution (pixel) 640 × 480
Frame Rate (fps) 30
Gradation (bit) 8 (1ch)
Focus control resolution (level) 1000

The camera was used to take pictures of a subject quietly sitting at a distance of 1 m; in this experiment, images were acquired only under such static conditions. Using the focus control function of the camera, the lens position was moved toward the subject in increments of 0.3 mm. Examples of the acquired images are given in Figures 21 to 24. Figure 21 presents a focused image; in Figures 22 to 24, the lens is shifted by 0.3, 0.6, and 0.9 mm, respectively, forward from its focus position. For the proposed method, VGA images of 640 × 480 pixels were divided into 48 regions of 80 × 80 pixels. Images were acquired for 1 min 30 s. FFT results were not output during the first 30 s, and the detection rate in this period was not considered; the evaluation of detection rate was applied to the 1800 images acquired in the remaining 1 min. Five subjects participated in the evaluation. This experiment, involving students of the Graduate School of Engineering Science, was approved by the Osaka University Ethics Committee; after the contents of the experiments were explained, the subjects gave written informed consent.

image
Focused image [Color figure can be viewed at wileyonlinelibrary.com]
image
Image with 0.3 mm shift in focus [Color figure can be viewed at wileyonlinelibrary.com]
image
Image with 0.6 mm shift in focus [Color figure can be viewed at wileyonlinelibrary.com]
image
Image with 0.9 mm shift in focus [Color figure can be viewed at wileyonlinelibrary.com]

The proposed method was compared with a conventional method, namely, a typical face detection method20 using Haar‐like features and AdaBoost, implemented in OpenCV (Version 2.4.3). In this method (hereafter referred to as face detection in OpenCV), the size of the search subwindow was set to 40 × 40 pixels, and the scaling factor was set to 1.1.

For face detection in OpenCV, the MIT face dataset21 was used for learning: 2600 face images were used as positive images, and 5200 images were used as negative (background) images. Because images with the lens shifted by 0.3, 0.6, and 0.9 mm are blurred, pseudo‐blurred images were used for learning as well. The MIT face dataset images were smoothed by a Gaussian filter to generate pseudo‐blurred images, and the degree of blurriness was varied using Gaussian filters of different sizes to obtain learning images corresponding to each lens position. In total, four face detectors were used, one for each lens position.

The detection criteria are as follows. First, the true face regions in the acquired images were labeled visually. If a rectangular face region obtained by face detection overlaps to any degree with the true region, the detection is considered correct; if no regions overlap, the detection is counted as false. In our experiment, only one person exists in each frame; therefore, each frame is considered one trial. If multiple rectangles detected as a face region exist in one frame and all of them are correct detections, the results are considered duplicates, and only one detection is counted as correct. Likewise, only one detection is counted as false when multiple false detections appear in one frame.
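The any-overlap criterion can be expressed as a simple rectangle intersection test; this is a hypothetical helper, with rectangles given as (x, y, w, h):

```python
def rects_overlap(a, b):
    """True if two (x, y, w, h) rectangles overlap to any degree.
    Under the detection criteria, a detection rectangle that overlaps
    the hand-labeled true face region counts as a correct detection."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

true_face = (300, 200, 80, 80)
print(rects_overlap((350, 240, 40, 40), true_face))  # True: partial overlap
print(rects_overlap((10, 10, 40, 40), true_face))    # False: disjoint
```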

4.1.2 Experimental results

Face detection results are represented by receiver operating characteristic (ROC) curves.40, 41 An ROC curve plots the true positive rate on the vertical axis and the false positive rate on the horizontal axis; such curves are generated by varying the detection threshold. Specifically, we varied the FFT peak threshold in the proposed method and the overlap‐count threshold in face detection in OpenCV. With an ROC curve, the closer the plot lies to the upper left corner, the higher the correct detection rate and the lower the false detection rate, that is, the better the detection performance. Usually, when a detection method is tuned to improve the true positive rate, the false positive rate grows as well; conversely, when parameters are tuned to suppress false positives, true positives decline too. Therefore, comparing detection methods at a single parameter setting does not reflect overall performance, whereas ROC curves make it possible to compare overall performance. ROC curves are widely employed for performance evaluation of human detection methods,42, 43 object detection methods,42, 45 and so on.
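Generating ROC points by sweeping a threshold can be illustrated as follows; the per-frame scores are invented numbers, standing in for the FFT peak power of the proposed method or the overlap count of face detection in OpenCV:

```python
def roc_points(scores_pos, scores_neg, thresholds):
    """One (FPR, TPR) point per threshold: the fraction of face frames
    whose score passes the threshold is the true positive rate, and the
    fraction of background frames that pass is the false positive rate."""
    points = []
    for th in thresholds:
        tpr = sum(s >= th for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= th for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points

# Hypothetical detection scores for face frames and background frames
face = [150, 140, 160, 30]
background = [5, 8, 20, 60]
print(roc_points(face, background, thresholds=[10, 50, 100]))
# [(0.5, 1.0), (0.25, 0.75), (0.0, 0.75)]
```

Lowering the threshold moves a point up and to the right; sweeping the full threshold range traces out the curve.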

The face detection results of the proposed method are presented in Figures 25 to 28. The results shown in Figure 25 pertain to in‐focus images. The ROC curve of the proposed method passes near the upper left corner, offering high performance with a true positive rate of 93.0% and a false positive rate of 0.1%; face detection in OpenCV likewise shows a true positive rate of 93.0% and a false positive rate of 0.1%. The results of Figure 25 indicate that the proposed method is capable of face detection in normal (in‐focus) images. The results shown in Figure 26 pertain to images with a lens shift of 0.3 mm. The proposed method offers high performance near the upper left corner, with a true positive rate of 99.8% and a false positive rate of 0.1%. The performance of face detection in OpenCV declines dramatically, to a true positive rate of 7.5% and a false positive rate of 7.1%. Face detection in OpenCV uses image brightness differences, and one can assume that face features cannot be learned well from images with low sharpness, so faces cannot be extracted. The results shown in Figure 27 pertain to images with a lens shift of 0.6 mm. For the proposed method, the upper left portion of the ROC curve disappears; that is, a high true positive rate is accompanied by a high false positive rate. At the upper left point, the true positive rate is 74.5% and the false positive rate is 0.1%. Face detection in OpenCV shows low performance, with a true positive rate of 4.8% and a false positive rate of 26.3%. The results shown in Figure 28 pertain to images with a lens shift of 0.9 mm. In this case, the overall detection rate of the proposed method does not improve; that is, the performance of the proposed method declines when the lens shift is increased so far that image sharpness drops excessively. The performance of face detection in OpenCV also declines, with the best true positive rate reaching just 6.6% at a false positive rate of 36.8%.

image
ROC curve results of images with 0.0 mm shift in focus
image
ROC curve results of images with 0.3 mm shift in focus
image
ROC curve results of images with 0.6 mm shift in focus
image
ROC curve results of images with 0.9 mm shift in focus

These results demonstrate that the proposed method supports face detection in images whose sharpness is intentionally degraded by shifting the lens position. In addition, the experiments confirmed that the detection performance of the conventional method (face detection in OpenCV) greatly declines on images with degraded sharpness. Moreover, the proposed method proved operational on normal (in‐focus) images as well. The detection performance of the proposed method declined as the lens shift was increased. This means that although the proposed method works better than the conventional method on blurred images, there is an appropriate range of blurriness; knowing this range is extremely useful when developing face detection systems. The images with lens shifts of 0.3, 0.6, and 0.9 mm used in the experiments are not sufficiently sharp for personal identification, thus ensuring protection of user privacy.

4.2 Accuracy of face position detection

4.2.1 Experimental environment

We measured the position accuracy of the method described in Section 3.2. The experimental environment was basically the same as in Section 4.1. The camera shown in Figure 20 was used to take pictures of a subject quietly sitting at a distance of 1 m. The measurement, performed on five subjects, was approved by the Ethics Committee. To verify position accuracy, each subject's face was photographed at 0°, 10°, and 20° with respect to the camera (the frontal position with respect to the camera is defined as 0°). Images were acquired for 1 min 30 s at each angular position; the first 30 s was not considered, and the evaluation of position accuracy was applied to the 1800 images acquired in the remaining 1 min. The shift of the camera lens was fixed at 0.3 mm for all images. The face center position, defined as the center of the nose, was input manually as the true position.

4.2.2 Experimental results

We measured errors as the difference between the face center position obtained by the method described in Section 3.2 and its true value. The resulting position accuracy is presented in Figure 29. The face center position is obtained as a pixel position in an image, and the difference from the true value is first derived in pixels; however, for the sake of generality, errors are converted to angular values using the relationship between the angle of view and the resolution. The vertical axis in Figure 29 plots the deviation from the true position; the white squares show results using the detected rectangles of Section 3.1 as they are, and the black circles show results using the correction explained in Section 3.2. The black circles and white squares show mean values, with the respective standard deviations expressed by the error bars. As can be seen from Figure 29, the position in both the horizontal and vertical directions is corrected by the proposed method, and the standard deviation (representing spread) also decreases. These results confirm the effectiveness of the method of Section 3.2. With angle correction using this method, the mean error/standard deviation in the horizontal and vertical directions are 0.26°/0.48° and 0.51°/0.62°, respectively, which indicates high accuracy. Thus, we confirmed that the proposed method can detect human position at high accuracy from camera images while protecting user privacy.

image
Average errors and standard deviations of position accuracy
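The pixel-to-angle conversion mentioned above amounts to a linear scaling between resolution and angle of view. A minimal sketch, where the 60° horizontal field of view is an assumed example value, not a figure from the paper:

```python
def pixel_error_to_degrees(err_px, fov_deg, resolution_px):
    """Convert a position error in pixels to degrees by linearly mapping
    the image width (in pixels) onto the camera's angle of view."""
    return err_px * fov_deg / resolution_px

# A 3-pixel error on a 640-pixel-wide image with an assumed 60-degree FOV
print(pixel_error_to_degrees(err_px=3, fov_deg=60.0, resolution_px=640))  # 0.28125
```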

4.3 Accuracy of face detection with near‐infrared light

4.3.1 Experimental environment

With the proposed method, faces can be detected not only by visible‐light cameras but also by near‐infrared cameras: the face detection described in Section 3.1 can be implemented by replacing the G brightness signal of a color camera (the input data) with the brightness signal of a near‐infrared camera. In the experiments, we used Kinect for Windows v2 as a near‐infrared camera. Images were taken at a near‐infrared wavelength of 830 nm with a resolution of 512 × 424 at 30 fps. Pictures of five subjects quietly sitting at a distance of 1 m were taken. A cyclone filter, shown in Figure 30,46 was placed in front of the camera lens so that the subjects could not be identified; light passing through the cyclone filter is diffused, and image sharpness declines. A near‐infrared image taken through the cyclone filter is shown in Figure 31. Images were acquired for 1 min 30 s for each subject; the first 30 s was not considered, and the evaluation of detection rate was applied to the 1800 images acquired in the remaining 1 min.

image
Cyclone filter [Color figure can be viewed at wileyonlinelibrary.com]
image
Near‐infrared image with cyclone filter

4.3.2 Experimental results

The results of the face detection experiments using the near‐infrared camera are presented as an ROC curve in Figure 32. The proposed method offered a true positive rate of 81.8% and a false positive rate of 0.0% near the upper left corner. Thus, we confirmed that with the proposed method, face detection is possible even in blurred images acquired by a near‐infrared camera fitted with an optical filter. Images obtained with a shifted lens position are blurred uniformly; however, when an optical filter such as the cyclone filter is used, images are blurred differently depending on the subject's position. With existing methods of face detection,20, 23-26 learning must be arranged according to the subject's pixel position, that is, multiple face detectors are needed, which complicates processing. With the proposed method, on the other hand, the subject's position need not be considered, which provides a useful tool for images in which user privacy is protected by an inhomogeneous optical filter. Detection performance does decline compared to the visible‐light camera at a lens shift of 0.0 or 0.3 mm, but this can be attributed to light diffusion by the cyclone filter and to the difference in the amount of light absorbed by oxygenated hemoglobin. As shown in Figure 2, the absorption of green and yellow light at 500 to 580 nm is stronger than that of near‐infrared light around 830 nm, which may explain the difference in detection rate.

image
ROC curve results of near‐infrared images with cyclone filter

4.4 Accuracy of face detection in distorted images

4.4.1 Experimental environment

The proposed method does not require face shape information and can therefore be applied to distorted images. To verify this ability, we evaluated the face detection rate in distorted images. Specifically, we acquired images using the camera of Figure 20 with the standard lens replaced by a 220° fisheye lens. Figure 33 shows an image obtained with the fisheye lens and intentionally blurred by shifting the lens position; the camera center is directed at the laboratory ceiling. A subject's face appears on the right of the image, but the face shape is distorted by the lens. As in the previous experiments, pictures of five subjects quietly sitting at a distance of 1 m were taken. Images were acquired for 1 min 30 s for each subject; the first 30 s was not considered, and the evaluation of detection rate was applied to the 1800 images acquired in the remaining 1 min.

image
Blurred image with fisheye lens

4.4.2 Experimental results

Results of face detection experiments with distorted images are presented as a ROC curve in Figure 34. The proposed method achieved a true positive rate of 96.8% and a false positive rate of 9.5% near the upper left corner. Thus we confirmed that the proposed method can detect faces even when their shape is distorted. When existing face detection methods20, 23-26 are applied to distorted images, the distortion must be corrected before face detection; however, TVs and other such devices do not have sufficient memory to hold the corrected images. The proposed method would be an effective means for such devices with limited memory.

image
ROC curve results of blurred images with fisheye lens
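The memory argument above can be made concrete with rough numbers. The sketch below is purely illustrative and not from the paper: the frame size (640 × 480 RGB), the 8 × 8 region grid, and the 30 s window at 30 fps are all assumed values. Distortion correction needs an extra full‐resolution buffer for the dewarped frame, whereas region‐wise processing only keeps one mean value per region per frame.

```python
# Rough memory comparison (illustrative; frame size, grid, and window
# length are assumed values, not taken from the paper).
width, height, channels = 640, 480, 3
remap_buffer = width * height * channels   # bytes for one dewarped RGB frame

rows, cols = 8, 8                          # region grid
window = 30 * 30                           # 30 s at 30 fps = 900 samples
region_series = rows * cols * window * 4   # float32 mean per region per frame

print(remap_buffer)    # 921600 bytes just to hold the corrected image
print(region_series)   # 230400 bytes for all per-region time series
```

Under these assumptions the per‐region signals fit in a fraction of the memory that a single dewarping buffer would require, which is why skipping distortion correction matters on embedded devices.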

5 CONCLUSION

We proposed a method for privacy‐aware face detection. We used blurred images in which a person cannot be identified, divided the images into regions, and derived the heart rate in each region to detect human presence. In the performance evaluation, we used images with graded sharpness to compare the proposed method with a conventional method; the proposed method showed better performance. Thus we confirmed that the proposed method is an effective means of face detection that respects privacy. We also confirmed that the face detection performance of the proposed method declines as image sharpness deteriorates, and that both the proposed and conventional methods can detect faces in normal (in‐focus) images. In addition to detecting humans in privacy‐protected images, we proposed a method for accurate position detection; measured results confirmed that position accuracy improves markedly after correction. Furthermore, we evaluated the performance of the proposed method on near‐infrared images and distorted images, confirming that the method is useful for dim environments and embedded devices. The proposed method makes it possible to detect humans and to accurately estimate their positions while protecting privacy. However, at the current stage, detection applies only to a single nonmoving human; in future work, the proposed method must be extended to support multiple moving humans.
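The region‐wise heart‐rate test at the core of the method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid size, frame rate, heart‐rate band (0.75 to 2.0 Hz), and the spectral‐peak threshold are all assumed values, and the paper's actual signal processing may differ.

```python
import numpy as np

def detect_face_regions(frames, fps=30.0, grid=(8, 8),
                        hr_band=(0.75, 2.0), snr_thresh=5.0):
    """Flag grid cells whose green-channel signal shows a heart-rate peak.

    frames: array of shape (T, H, W, 3), RGB video of a (blurred) scene.
    Returns a boolean (rows, cols) mask of cells judged to contain skin.
    """
    t, h, w, _ = frames.shape
    rows, cols = grid
    mask = np.zeros(grid, dtype=bool)
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    band = (freqs >= hr_band[0]) & (freqs <= hr_band[1])
    for r in range(rows):
        for c in range(cols):
            # Spatially average the green channel over the cell; blood-volume
            # pulse is strongest in green, and averaging survives blurring.
            cell = frames[:, r * h // rows:(r + 1) * h // rows,
                             c * w // cols:(c + 1) * w // cols, 1]
            sig = cell.mean(axis=(1, 2))
            sig = sig - sig.mean()               # remove the DC component
            spec = np.abs(np.fft.rfft(sig)) ** 2
            # Presence test: the strongest in-band peak must dominate the
            # mean out-of-band power (a simple SNR-style criterion).
            out = spec[~band & (freqs > 0)]
            if out.size and spec[band].size:
                mask[r, c] = spec[band].max() > snr_thresh * out.mean()
    return mask
```

Because each cell is tested independently, the same loop applies unchanged to uniformly blurred, filter‐distorted, or fisheye images, which mirrors why the method needs neither face shape information nor position‐specific detectors.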

Biographies

  • image

Toshihiro Kitajima (nonmember) completed the first term of his doctorate at Nara Institute of Science and Technology in 2004 (Grad. School of Information Sci.), and was employed by Honda R&D Co., Ltd.; since 2013 at Samsung R&D Institute Japan. In 2015 he began doctoral studies at Osaka University (Grad. School of Eng. Sci.). Engaged in R&D of TV systems. Membership: RSJ.

  • image

    Edwardo Arata Y. Murakami (nonmember) completed doctorate at Tokyo Institute of Technology in 2012 (Interdisciplinary Grad. School of Sci. and Eng.). Doctor of Sci. and Eng. 2005 employed by AIST Digital Human Research Center, 2008 part‐time lecturer at Kitasato University (College of Liberal Arts and Sciences), since 2011 senior researcher at Samsung R&D Institute Japan. Research in biological information, motion control, and human‐machine interface. Membership: ACM, HIS, and others.

  • image

    Shunsuke Yoshimoto (nonmember) completed doctorate at Osaka University in 2012 (Grad. School of Eng. Sci.), and was employed by the University as assistant (Grad. School of Eng. Sci.). Research in haptics, biological measurement. Doctor of Eng. Membership: IEEE, JSMBE, VRSJ, IEICE, and others.

  • image

    Yoshihiro Kuroda (nonmember) graduated from Kyoto University in 2005 (Grad. School of Informatics), and was employed by the University as assistant (Grad. School of Medicine), 2006 assistant professor at Osaka University (Grad. School of Eng. Sci.), 2013 adjunct professor (Cyber Media Center), since 2016 professor (Grad. School of Eng. Sci.). Research in medical VR, tactile information processing, educational training systems. Doctor of Informatics. Membership: ACM, IEEE, VRSJ, JSMBE, JSMVR, and others.

  • image

    Osamu Oshiro (nonmember) completed doctorate at Osaka University in 1990 (Grad. School of Eng. Sci.), and was employed by Sumitomo Metal Industries, Ltd., 1993 assistant at Nara Institute of Science and Technology (Grad. School of Information Sci.), 1994 assistant professor, since 2003 professor at Osaka University (Grad. School of Eng. Sci.). Research in medical image processing, biological signal processing. Doctor of Eng. Membership: JSMBE, SICE, IEICE, and others.
