Recent Progress of Optical Imaging Approaches for Noncontact Physiological Signal Measurement: A Review

In recent years, optical imaging techniques have gained wide recognition for the measurement of vital signals such as heart rate, respiratory rate, oxygen saturation, and blood pressure, which are crucial indicators for evaluating human health conditions in clinical examinations. A wide range of optical imaging methods is available for remote physiological signal monitoring, including RGB imaging, thermal imaging, hyperspectral imaging, depth imaging, and multimodal imaging; these provide spatial information that other noncontact measurement approaches lack, enabling extensive applications in this area. In this survey, fundamental knowledge about optical imaging methods for vital signal measurement is reviewed, including the principles of various optical imaging techniques, processing methods for data analysis, a discussion of advantages and disadvantages, a summary of applications, and future prospects. Together, this constitutes a comprehensive overview of optical imaging approaches for noncontact physiological signal measurement.

physiological signal measurement. Furthermore, the addition of a spatial dimension allows the simultaneous measurement of physiological signals and other useful information, such as expression and pose estimation. [4] Several recent reviews [5][6][7][8][9][127][128][129][130][131][132][133][134][135] discuss the latest advances in noncontact physiological signal measurement. The aim of this study is to provide a thorough overview of recent optical imaging developments in contactless physiological signal measurement based on machine learning, including the fundamentals and applications of optical imaging techniques. Figure 1 shows the wide range of contact and noncontact techniques used for physiological signal measurement.
Figure 1. Different measurement approaches to physiological signals: a) thoracic and abdominal belts of SOMNOlab 2; b) ECG monitor; c) physiological signal from speech; d) Doppler radar; e) laser Doppler vibrometer; f) ultrasonic proximity sensor; g) RGB imaging; h) hyperspectral imaging; i) Microsoft Kinect sensor (depth camera using structured light); j) multimodal imaging; k) thermal imaging. The group in pink comprises contact measurements; the group in purple, noncontact measurements; the group in orange, an acoustic measurement; the group in green, nonvision measurements; and the group in blue, vision-based measurements.

Fundamentals of Optical Imaging Techniques
The optical sensor systems for physiological signal measurement are summarized in Table 1.

RGB Imaging
RGB imaging is commonly used to detect respiration, heart rate, and blood oxygen. The principle of RGB imaging for breathing monitoring is relatively simple: it mainly monitors changes in the range of movement of body parts such as the chest, abdomen, or shoulders. The main difficulty of RGB imaging for breathing measurement is suppressing the noise caused by motion artifacts.
With respect to HR estimation, the RGB camera captures three components of the reflected and diffusely transmitted light. The recorded light is mainly composed of three parts: the first consists of the steady tissue reflections, transmissions, and interactions of the epidermal layer; the second comprises the transmissions and interactions of arterial and venous blood vessels; the third is the ambient illumination. The possible paths of incident light in a simplified hierarchical model of the upper human tissue layers are described in detail in Figure 2. In arterial and venous blood vessels, owing to the absorption properties of hemoglobin, the majority of the transmitted light is absorbed. A small amount of transmitted and interacted light returns to the external environment and is recorded by the RGB camera. The behavior of transmitted and interacted light (also termed internal reflection, subsurface reflection, or backscattering) is often modeled by the Lambert-Beer law. [10][13][14][15][16][17][18][19][20][21][22][23][24][25] The purpose of a physiological extraction algorithm for the RGB imaging technique is to separate physiological signals from the surviving light signals using color information. Most researchers commonly use independent component analysis (ICA) or principal component analysis (PCA) to recover the physiological signals. The physiological signals extracted from color information are very low in amplitude, so a great many video processing methods have been proposed to address this problem, such as the Eulerian video magnification (EVM) method. [26] Given the measurement principle, lighting conditions, image background, participant skin tone, and participant motion may influence the performance of the RGB imaging technique for physiological signal measurement. The extraction of the targeted signals is challenging because the surviving light signals contain information from the external environment and from every part of the upper human tissue.
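As a concrete illustration of the first step of this pipeline, the sketch below (plain Python with hypothetical helper names; frames as nested H × W × 3 lists) spatially averages a skin ROI per color channel and removes slow illumination drift, producing the low-amplitude traces that ICA or PCA would then demix:

```python
# Sketch of the spatial-averaging step that precedes ICA/PCA-based pulse
# extraction (hypothetical helper names; frames are H x W x 3 nested lists).

def roi_mean(frame, roi, channel):
    """Average one color channel over a rectangular ROI (top, left, h, w)."""
    top, left, h, w = roi
    total = 0.0
    for y in range(top, top + h):
        for x in range(left, left + w):
            total += frame[y][x][channel]
    return total / (h * w)

def detrend(signal, window):
    """Remove slow illumination drift with a moving-average baseline."""
    out = []
    half = window // 2
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        baseline = sum(signal[lo:hi]) / (hi - lo)
        out.append(signal[i] - baseline)
    return out

def channel_traces(frames, roi):
    """One detrended time series per color channel, ready for ICA/PCA."""
    return [detrend([roi_mean(f, roi, c) for f in frames], window=15)
            for c in range(3)]
```

The detrending window (15 frames here) is an assumed choice; in practice it is tuned to the frame rate so that the pulse band survives while illumination drift is removed.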

Thermal Imaging
A thermal imager detects radiation in the long-infrared region, roughly ranging from 9000 to 14 000 nm; radiation in this part of the electromagnetic spectrum is emitted by every object whose temperature is above absolute zero. The thermal camera operates in a passive way: it collects the emitted energy from objects without any external stimulation such as harmful radiation or illumination. Nowadays, the thermal imager is considered an upcoming and promising tool in the field of psychophysiology. Physiological changes in human beings and other warm-blooded animals can be remotely monitored by thermal imaging in different application scenarios.
The thermal imager is also able to estimate heart rate (HR) at a distance via the analysis of temperature changes in the shallow surface of the skin. When the heart contracts during ventricular systole, a pressure wave is generated and propagates through the arterial system. This pulsating blood flow causes a strong temperature variation in the superficial vessels. [29][30][31] The change in body surface temperature is the result of heat exchange by convection and conduction between the blood vessels and the surrounding tissues beneath the body surface. To resolve heart-related information such as heart rate, a bioheat transfer model is established. Usually, in the bioheat transfer model, the surface tissue is divided into four layers, namely, the skin layer, the fat layer, the muscle layer, and the core layer. The blood vessels in the muscle layer are generally considered the heat sources among the four layers. This bioheat transfer model can be further simplified, as shown in Figure 3. Based on the surface heat balance model obtained from the bioheat transfer model, noncontact heart rate measurement can be achieved. However, temperature changes in the ambient environment have a negative effect on thermal imaging physiological systems.

Hyperspectral Imaging
Hyperspectral imaging, also known as imaging spectroscopy, is an effective tool to acquire images with an abundance of contiguous spectra from visible to infrared wavelengths at different spatial and temporal resolutions. A hyperspectral image is composed of two spatial dimensions and one spectral dimension, namely, a 3D hyperspectral data cube, in which the location of pixels is presented in the spatial dimensions and the wavelength within image pixels is presented in the spectral dimension. Each pixel in the 3D hyperspectral data cube represents the spatial distribution of the spectral intensity over a given band of wavelengths. [34] The blood chromophores within skin tissue, such as hemoglobin and melanin, can be distinguished by hyperspectral imaging. Since hyperspectral imaging systems obtain images in different spectra, they require longer exposure times; this is especially true for line-scan hyperspectral cameras. Therefore, the quality of hyperspectral imaging is highly influenced by motion artifacts. Based on the subchannel images, the variation of the blood absorption information in the skin can be extracted to monitor the heart rate cycle. Consequently, hyperspectral imaging offers attractive potential for physiological measurement. Although hyperspectral imaging enjoys great popularity in physiological signal measurement, both the low temporal resolution of this technique and the huge amount of data bring challenges to practical applications.

Table 1 (continued). Representative studies (study | signals | methods | notes):
- (2014) [13] | BR, HR, OS, BR and HR maps | auto-regressive modeling; detrend and filter algorithm; downsampling; pole selection | 46 patients; R^2 = 0.64 for OS; MAE = 3 bpm for HR.
- Ozana et al. (2015) [14] | PR, HR, OS, PP, BC | laser speckle technique; analysis of the correlation and amplitude of the 2D speckle pattern over time; cubic Hermite interpolating polynomial; Bernoulli equation | the established system is able to monitor BR, HR, OS, PP, and BC at a long spatial distance.
- Addison et al. (2018) [19] | HRV | five-band camera (RGBCO: red, green, blue, cyan, orange); Welch FFT method; facial-feature-point-based face detection algorithm; improved peak detection within the BVP signal | videos recorded with no large head motions and no changes in illumination; the proposed method provides cleaner high-frequency observations (0.15-0.40 Hz), with results similar to HRV spectrograms calculated using a contact sensor (the gold-standard comparison).
- Chaichulee et al. (2019) [20] | HR, BR | two CNN models composed of a skin segmentation network and an intervention detection network.
- Thermal imaging: [27] | BR | double-moving-average filters | 60 thermal videos at different resolutions to validate the feasibility of this system for BR.
- Procházka et al. (2018) [28] | BR | extraction of the mean temperature of the mouth region | a total of 56 experimental data sets, each 40 min long.
- Hochhausen et al. (…)
- Hyperspectral imaging: Chen et al. (2014) [137] | OS | based on the Beer-Lambert law | 21 subjects; the accuracy is 76.19% and 88.1% for automatic and manual selection of the classifier threshold, respectively.
- Depth imaging: Gambi et al. (2017) [138] | HR | EVM; ROI-selection-based face detection algorithm; a bandpass filter | 20 subjects; the HR estimation differs from the ground truth by 2% considering the subject's lifestyle, and by 3.4% otherwise.

Depth Imaging
Unlike conventional RGB imaging, depth imaging acquires depth images encoding the distance to physical objects in the scene. The Microsoft Kinect is the most common depth sensor because of its relatively low cost and high accuracy; it integrates three optical sensors: an RGB sensor, an infrared sensor, and a depth sensor. To date, there are two versions of this sensor, Kinect v1 and Kinect v2, based respectively on structured-light coding technology and time-of-flight (ToF) technology to generate a depth image. [37][38] In recent years, studies have used the Microsoft Kinect sensor to remotely monitor cardiopulmonary activity because it is less expensive than most medical devices. [39] The abdominal-thoracic region, where the physiological activity is most obvious, is usually taken as the ROI in a Kinect-based physiological monitoring system. Cardiopulmonary activity causes volumetric changes in the respiratory and heart muscles, providing plenty of useful information about physiological signals. By tracking the chest and abdomen movements, the physiological signals can thus be extracted from the depth video. Currently, there are other depth cameras on the market, and a summary of various depth cameras is presented in Table 2. The depth camera has good prospects in physiological measurement applications.
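As a minimal sketch of this principle (hypothetical function names; depth frames as nested lists of, e.g., millimeter values), the mean depth over a chest-abdomen ROI rises and falls with respiration, and a breathing rate follows from counting upward zero crossings:

```python
# Minimal sketch of depth-based breathing extraction (hypothetical names):
# the mean depth over a chest/abdomen ROI rises and falls with respiration.

def mean_depth(depth_frame, roi):
    """Average depth (e.g., millimeters) over a rectangular ROI."""
    top, left, h, w = roi
    vals = [depth_frame[y][x] for y in range(top, top + h)
            for x in range(left, left + w)]
    return sum(vals) / len(vals)

def breathing_rate(frames, roi, fps):
    """Estimate breaths per minute from upward zero crossings of the
    mean-centered depth signal (one upward crossing per breath cycle)."""
    sig = [mean_depth(f, roi) for f in frames]
    mean = sum(sig) / len(sig)
    centered = [v - mean for v in sig]
    crossings = sum(1 for a, b in zip(centered, centered[1:])
                    if a < 0 <= b)
    return crossings * 60.0 * fps / len(frames)
```

Real systems add smoothing and ROI tracking; zero-crossing counting is used here only because it keeps the sketch short.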

Multimodal Imaging
Although the intrinsic pros and cons of each single imaging technique are significant and varied, they share a common defect: the single imaging mode limits the diversity of information types. Therefore, a combination of optical imaging techniques, referred to as multimodal imaging, can provide more complementary information on physiological processes than single-mode imaging. For instance, most studies currently take the RGB camera as the imaging device to measure physiological signals, but measurement becomes difficult under poor lighting conditions such as overly dark, overly bright, or uneven light. To our knowledge, combining a near-infrared camera with near-infrared illumination enables a physiological signal detection system to work in a dark environment. Hence, one can consider replacing the infrared cutoff filter in front of the camera lens with a commercially available one and combining it with a light intensity detection sensor and a near-infrared lamp, so that switching between RGB detection mode and near-infrared detection mode can be realized. [40,41] Multimodal imaging can also provide extra information and build multimodal databases composed of multiple modalities to further assist in detecting emotion in facial expression recognition. Multiple modalities, such as facial expressions and physiological responses, show very promising results in understanding complicated human behavior. The first multiple-emotion-modality database, including 2D and 3D facial visual dynamics, skin temperature dynamics, and physiological responses, was introduced by Zhang et al., [42][43][44] which brought a new perspective to the fields of facial expression and action unit detection.

Data Analysis Methods
There exist three key challenges for the data analysis of the optical imaging-based physiological signal measurement.

Challenge I
In contrast to contact PPG, not all of the sensor readings obtained with optical imaging techniques are directly useful for the determination of physiological signals. For example, the acquired video inevitably contains a complex background and other redundant spatial information, so the first challenge is to segment the regions that are strongly related to the targeted physiological trait.

Challenge II
In an optical imaging-based physiological signal detection system, motion artifacts corrupt the obtained waveforms and subsequently make it more complex to determine the targeted physiological trait. In fact, subjects are always in motion, whether through spontaneous reactions such as smiling and laughing or deliberate actions such as moving to another location. Although investigators have attempted to apply tracking algorithms to mitigate the effects of motion artifacts, these cannot eliminate them. In addition, the use of a tracking algorithm may introduce more errors when the algorithm fails to follow the region of interest.

Challenge III
The third challenge is that global illumination levels and illumination variations greatly influence the quality of the obtained waveforms: the collected data will be dominated by inherent camera noise when the illumination level is too low for the photosensitive elements, and false peaks and valleys will be introduced when the illumination is uneven, resulting in invalid measurements.
The quality of the waveform also suffers from noise and other variations in the image sequences, which can significantly distort the signal. Therefore, it is essential to design excellent algorithms to reconstruct the waveform. Plenty of studies have presented advanced optical imaging-based physiological signal techniques, and the universal data processing flow is summarized in Figure 4.

ROI Detection and Tracking
In the human cardiovascular system, blood is first transported by the pulmonary circulation from the heart to the lungs to absorb oxygen and release carbon dioxide, and the oxygenated blood is then transported by the systemic circulation from the heart to the rest of the body. This process causes each part of the body to have different colors, movements, and temperatures. Therefore, different investigators extract various ROIs to calculate the physiological signals. Commonly used ROIs include the shoulders, forehead, eyes, nose, cheeks, mouth, neck, hands, arms, chest, and abdomen. It is believed that the combined use of these ROIs can improve the robustness of the obtained model. An overview of the proposed framework for robust multi-ROI model establishment is shown in Figure 5.
[47][48][49] In early studies, the most frequent method, proposed by Verkruysse et al., [50] was to manually select the forehead as a fixed ROI. Later, Poh et al. [51] presented a simple and convenient method that uses a bounding box of the face to automatically track the ROI. A drawback of the aforementioned methods is that they introduce nonskin areas (such as the background), which cause noise when the color of an object is analogous to the skin. Viola et al. [52] proposed a face detection algorithm whose landmark points can be used to track the face, which addresses this problem; however, it is not precise enough for extracting physiological signals owing to its simple features. Therefore, Li et al. [53] used more facial landmark points for skin selection to acquire an accurate ROI based on the algorithm of Viola and Jones. Feng et al. [54] adopted the K-means clustering method to dynamically search for an ROI with good signal quality.
In most physiological signal algorithms, ROI tracking is an important step for ensuring that the pixels of the ROI remain within the skin-selected area when motion artifacts exist. A simple way to perform ROI tracking is to detect the moving subject in every frame of the video, but this is not accurate enough and can introduce noise. A method of tracking facial landmark points was proposed by Shi et al. [55] that uses a transformation matrix to estimate the ROI position across adjacent frames, which mitigates the noise caused by motion artifacts.

Breathing Signal Measurement
The measurement of breathing rate is usually based on the motion information of the subject. A simple approach is to subtract consecutive video frames and then use the sum of pixels in the difference images as the breathing signal. Nonetheless, this method only measures the change between consecutive images. To measure the actual motion when the subject inhales and exhales in the video, the optical flow technique can be applied, which is commonly used for movement estimation in video sequences. There are two hypotheses behind the optical flow technique. The first is brightness constancy, which implies that the performance of optical flow algorithms for breathing signal extraction is limited by illumination variations. The second is that the position of the selected image block does not change dramatically over time.

Optical Flow Technique
For a selected ROI, the optical flow technique estimates the total optical flow corresponding to motion in the video stream using the brightness-constancy equation

∇E · V + E_t = 0

where V is the optical flow of every pixel p_(x,y) in the ROI, and ∇E and E_t are the spatial and temporal derivatives of the pixel intensity at p_(x,y), respectively.
For breathing signal extraction, the curve calculated by optical flow carries the motion information caused by the respiration process.The two most common optical flow algorithms for breathing signal extraction are Lucas-Kanade and Horn-Schunck algorithms. [56]
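A toy, pure-Python version of the Lucas-Kanade step can make the least-squares structure concrete. This is a sketch, not the pyramidal, windowed implementation used in practice; it estimates a single flow vector for a whole patch by solving the 2 × 2 normal equations built from the brightness-constancy constraint:

```python
# Pure-Python sketch of the Lucas-Kanade step: solve the least-squares
# system [sum(Ix^2), sum(Ix*Iy); sum(Ix*Iy), sum(Iy^2)] v = -[sum(Ix*It),
# sum(Iy*It)] accumulated over an image patch.

def lucas_kanade(img1, img2):
    """Estimate one (vx, vy) flow vector for the whole patch.

    img1, img2: two consecutive grayscale frames as nested lists [y][x]."""
    h, w = len(img1), len(img1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0  # spatial gradient x
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0  # spatial gradient y
            it = img2[y][x] - img1[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:          # untextured patch: flow is undefined
        return 0.0, 0.0
    vx = (-sxt * syy + syt * sxy) / det   # Cramer's rule for the 2x2 solve
    vy = (-syt * sxx + sxt * sxy) / det
    return vx, vy
```

For breathing extraction, the vertical component accumulated over a chest ROI, frame by frame, yields the respiratory waveform.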

Principal Component Analysis
Fusing the several spectral channels that are informative about physiological signals may achieve a more accurate measurement. In this respect, PCA can obtain the signal that contains the most physiological information. The covariance matrix C is computed and diagonalized as

C = E[B ⊗ B] = (1/n) B B^T, C = V D V^T

where B is an m × n matrix containing the original image-derived signals; E and ⊗ denote the expectation and outer product operators, respectively; D is the diagonal matrix of eigenvalues; and V is the matrix of eigenvectors. To use PCA to calculate physiological signals, complete observations are needed.
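A minimal sketch of this fusion (pure Python; plain power iteration stands in for the full eigendecomposition C = V D V^T) could look as follows:

```python
# PCA sketch for fusing spectral channels: compute the covariance of the
# mean-centered channel matrix B and project the data onto its top
# eigenvector, found here by power iteration.

def pca_first_component(channels):
    """channels: list of m equal-length signals; returns the projection of
    the data onto the principal eigenvector of the covariance matrix."""
    m, n = len(channels), len(channels[0])
    centered = [[v - sum(ch) / n for v in ch] for ch in channels]
    # covariance C[i][j] = (1/n) * sum_t B[i][t] * B[j][t]
    cov = [[sum(centered[i][t] * centered[j][t] for t in range(n)) / n
            for j in range(m)] for i in range(m)]
    vec = [1.0] * m
    for _ in range(200):          # power iteration converges to the
        nxt = [sum(cov[i][j] * vec[j] for j in range(m))   # top eigenvector
               for i in range(m)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    return [sum(vec[i] * centered[i][t] for i in range(m)) for t in range(n)]
```

The first principal component captures the direction of largest joint variance, which, for pulse-dominated channels, is the physiological signal.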

Peak Detection
During relatively low motion, the BR and HR can be determined from the frequency spectra after Fourier transformation by locating local peaks. In general cases, the "findpeaks" function of the MATLAB toolbox can be applied to compute the number of peaks in the processed spectrum. The fundamentals of this function are described in Algorithm 1.
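Outside MATLAB, the same idea can be sketched in a few lines of Python. This is a simplified analogue of "findpeaks" (strict local maxima with an optional minimum height), not a reimplementation of the MATLAB internals:

```python
# Simplified analogue of MATLAB's findpeaks: a sample counts as a peak if
# it is strictly greater than both neighbors (optionally above min_height).

def find_peaks(x, min_height=None):
    """Return the indices of local maxima in the sequence x."""
    peaks = []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            if min_height is None or x[i] >= min_height:
                peaks.append(i)
    return peaks
```

Applied to a power spectrum, the index of the largest returned peak maps directly to the BR or HR frequency bin.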

Deep Learning-Based Methods
The fast-growing fields of machine learning (ML) and deep learning (DL) facilitate studies on noncontact respiratory measurement, eliminating the feature handcrafting of traditional methods. Hwang et al. [57] automatically realized respiration measurement through ROI detection and pixel selection based on 1D convolutional neural networks (CNN). The performance of deep learning is related to the amount of training data, which takes time to obtain. To overcome this problem, Cho et al. [58] proposed a novel respiratory measurement method called DeepBreath, a deep learning framework combined with a data augmentation technique to learn from a small-scale dataset. Wang et al. [59] invented a gated recurrent unit neural network with bidirectional and attentional mechanisms.

Algorithm 1. Description of the "findpeaks" function integrated into the MATLAB toolbox.
Step 1: Create two new sequences, t_1 and t_2, whose length is equal to that of the initial sequence x.
Step 2: Calculate the length n of x and let the counter j = 1.
To extract temporal and spatial information from the respiratory signal waveform, Kumar et al. [60] used seven deep learning models, mainly including long short-term memory (LSTM) and CNN, to better predict the respiratory rate. A deep bidirectional LSTM was applied by Wang et al. [61] to improve the prediction of respiration-induced thoracic-abdominal tumor motion. Optical imaging methods such as the optical flow technique are usually affected by changes in lighting conditions. To reduce the influence of challenging clinical settings, Chaichulee et al. [20] presented two CNN models composed of a patient detection and skin segmentation network and an intervention detection network. The above methods are all based on visible-light cameras to predict respiratory rate.
Compared to visible-light cameras, thermal imaging faces the challenge of small spatial resolution, which means smaller differences in the temporal changes of pixel values. Kwasniewska et al. [62] proposed two deep neural network architectures composed of a recursive convolutional model and transformers; the method focuses on texture restoration to improve the accuracy of video-based respiratory rate estimation.

Breathing Pattern
Understanding breathing patterns is clinically important for dealing with a variety of diseases. There are many types of normal and abnormal breathing patterns: apnea, eupnea, orthopnea, dyspnea, hyperpnea, hyperventilation, hypoventilation, tachypnea, Kussmaul respiration, Cheyne-Stokes respiration, sighing respiration, Biot respiration, apneustic breathing, central neurogenic hyperventilation, and central neurogenic hypoventilation. [63] Here, we introduce some abnormal breathing patterns: 1) Cheyne-Stokes breathing: a classic breathing pattern accompanying severe neurological or cardiac disease. The tidal volume of Cheyne-Stokes breathing shows periods of hyperventilation alternating with periods of apnea (Figure 6a). 2) Biot breathing: also known as ataxic breathing, it often occurs with acute neurological disease. Biot breathing is characterized by groups of quick, shallow inspirations followed by regular or irregular periods of apnea (Figure 6b). 3) Kussmaul breathing: very deep and labored breathing caused by diabetic ketoacidosis or renal failure. Kussmaul breathing has long tidal waves characterized by rapid and deeper breathing over a prolonged period of time (Figure 6c). 4) Apnea: a serious breathing condition in which no airflow comes into or out of the lungs. Apnea is mainly associated with conditions such as airway obstruction, cardiopulmonary arrest, alterations of the respiratory center, and narcotic overdose (Figure 6d). 5) Hyperpnea: a state in which the person takes deeper breaths than in normal breathing. It can occur during exercise or because of pulmonary infections (Figure 6e). 6) Hyperventilation: breathing that exceeds what the body needs, so that too much carbon dioxide is exhaled. Hyperventilation, also known as overbreathing, may be caused by common conditions such as anxiety, nervousness, and metabolic acidosis (Figure 6f). 7) Hypoventilation: in contrast to hyperventilation, less gas is breathed than the body needs, resulting in an increase in carbon dioxide. Causes of hypoventilation include overmedication, metabolic alkalosis, neurologic depression of the respiratory centers, and sedation or somnolence (Figure 6g). 8) Tachypnea: a condition of rapid and shallow breathing, which can occur in circumstances such as fever, pain, emotion, anemia, and respiratory insufficiency (Figure 6h).
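Purely as an illustration of how a few of these patterns might be screened from a recovered breathing waveform, the toy classifier below uses amplitude and rate thresholds; the threshold values are hypothetical illustrations, not clinical criteria:

```python
# Toy illustration (hypothetical thresholds, NOT clinical criteria): crude
# screening of a few breathing patterns from a breathing waveform using
# peak-to-peak amplitude and zero-crossing rate.

def classify_window(signal, fs, normal_amp=1.0):
    """Label one analysis window as apnea / tachypnea / hyperpnea / eupnea."""
    mean = sum(signal) / len(signal)
    centered = [v - mean for v in signal]
    amp = max(centered) - min(centered)        # peak-to-peak excursion
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a < 0 <= b)
    rate_bpm = crossings * 60.0 * fs / len(signal)
    if amp < 0.1 * normal_amp:
        return "apnea"        # essentially no breathing-related motion
    if rate_bpm > 25:
        return "tachypnea"    # rapid, shallow breathing
    if amp > 2.0 * normal_amp:
        return "hyperpnea"    # deeper breaths than normal
    return "eupnea"
```

Patterns defined by their temporal evolution (Cheyne-Stokes, Biot) would require comparing such window-level labels across consecutive windows.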

Heart Rate Signal Measurement
The measurement of heart rate is commonly based on the principle that the variation of reflected light due to the skin color change caused by the cardiac cycle can be directly captured by the camera. For HR, Adibuzzaman et al. [64] applied the following equation

HR = frame rate × 60 / number of frames (4)

where the accuracy of this equation is highly dependent on the selection of the minimum number of frames between two peaks; it may be suitable for application scenarios where subjects keep still and illumination is controlled. However, noise is usually present in the reflected light owing to changes in ambient illumination and motion artifacts (such as head motion), and thus the main purpose of heart rate measurement is to extract heart rate signals from the original reflected signal by frequency-domain approaches. To date, the methods of extracting heart rate signals can be divided into three categories: BSS-based methods, model-based methods, and deep learning-based methods.
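Equation (4) is straightforward to apply once pulse peaks have been located; a small sketch (hypothetical helper names) that also averages over several peak-to-peak intervals:

```python
# Sketch of Equation (4): HR = frame rate * 60 / number of frames between
# two successive pulse peaks (hypothetical helper names).

def heart_rate_bpm(frame_rate, frames_between_peaks):
    """Instantaneous HR in beats per minute from one peak-to-peak gap."""
    return frame_rate * 60.0 / frames_between_peaks

def heart_rate_from_peaks(peak_indices, frame_rate):
    """Average HR over all successive peak-to-peak intervals."""
    gaps = [b - a for a, b in zip(peak_indices, peak_indices[1:])]
    return heart_rate_bpm(frame_rate, sum(gaps) / len(gaps))
```

For example, at 30 fps with 25 frames between peaks, the estimate is 30 × 60 / 25 = 72 bpm; averaging over several intervals reduces the sensitivity to a single misdetected peak.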

BSS-Based Methods:
The traditional signal decomposition methods use the technique of blind source separation (BSS) to recover heart rate signals from the R, G, and B channel signals captured by the camera. BSS, presented by Belouchrani et al., [65] aims to recover the source signals from the sensor output without any prior knowledge of the mixing process. ICA, the most classical BSS method, is a technique for uncovering independent signals hidden in a mixed recorded signal. [51,66] It assumes that the observed sensor signals are a linear composition of pulse signals and other source signals. The premise of using ICA is that the underlying source signals are mixed linearly with the interference signals:

y(t) = A x(t) (5)

where y(t) = [y_1(t), y_2(t), …, y_(n−1)(t), y_n(t)]^T and x(t) = [x_1(t), x_2(t), …, x_(n−1)(t), x_n(t)]^T denote the observed signals and the underlying source signals, respectively; A is the square n × n mixture matrix containing the mixture coefficients a_ij, which represent the transfer function between the observed signals and the underlying source signals; and n is the number of spectral channels. The purpose of ICA is to search for a separating matrix W that approximates the inverse of the original mixture matrix A. The process of BSS is illustrated in Figure 7. The objective of ICA is to determine a separating matrix W that maximizes the non-Gaussianity of each source so as to recover the underlying source signals; W can be solved by the algorithm of joint approximate diagonalization of eigenmatrices. [67] Similar to PCA, ICA requires complete observations to compute physiological signals.
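To make the BSS idea tangible, the toy two-channel ICA below whitens the observed signals and then grid-searches the rotation that maximizes absolute excess kurtosis as a simple non-Gaussianity measure. This is an illustrative stand-in for JADE or FastICA, not the algorithm of [67]:

```python
import math

# Toy ICA for two mixed channels: center, whiten via the analytic 2x2
# eigendecomposition, then grid-search the rotation angle that maximizes
# absolute excess kurtosis (a simple non-Gaussianity contrast).

def center(sig):
    m = sum(sig) / len(sig)
    return [v - m for v in sig]

def kurtosis(sig):
    n = len(sig)
    var = sum(v * v for v in sig) / n
    return sum(v ** 4 for v in sig) / n / (var * var) - 3.0

def ica_two_channels(y1, y2, steps=180):
    y1, y2 = center(y1), center(y2)
    n = len(y1)
    a = sum(v * v for v in y1) / n                # 2x2 covariance entries
    b = sum(u * v for u, v in zip(y1, y2)) / n
    c = sum(v * v for v in y2) / n
    phi = 0.5 * math.atan2(2 * b, a - c)          # eigenvector angle
    cp, sp = math.cos(phi), math.sin(phi)
    l1 = a * cp * cp + 2 * b * cp * sp + c * sp * sp   # eigenvalues
    l2 = a * sp * sp - 2 * b * cp * sp + c * cp * cp
    # whiten: rotate to the eigenbasis, scale each axis to unit variance
    z1 = [(cp * u + sp * v) / math.sqrt(l1) for u, v in zip(y1, y2)]
    z2 = [(-sp * u + cp * v) / math.sqrt(l2) for u, v in zip(y1, y2)]
    best, best_score = (z1, z2), -1.0
    for k in range(steps):                        # grid-search the rotation
        t = math.pi * k / (2 * steps)
        ct, st = math.cos(t), math.sin(t)
        s1 = [ct * u + st * v for u, v in zip(z1, z2)]
        s2 = [-st * u + ct * v for u, v in zip(z1, z2)]
        score = abs(kurtosis(s1)) + abs(kurtosis(s2))
        if score > best_score:
            best, best_score = (s1, s2), score
    return best
```

After whitening, any remaining mixing is a pure rotation, so a one-dimensional search over the angle suffices in the two-channel case; real rPPG pipelines use three channels and iterative contrasts instead.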
After using the FFT to obtain the power spectra of the source signals, the signal with the highest spectral peak is usually selected as the heart rate signal.
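This spectral selection step can be sketched directly (pure-Python DFT; the 0.7-4 Hz heart rate band is an assumed choice):

```python
import math

# Sketch of the spectral selection step: among separated source signals,
# pick the one whose power spectrum has the strongest peak inside the
# heart rate band (assumed here as 0.7-4 Hz, i.e., 42-240 bpm).

def power_spectrum(sig, fs):
    """(frequency_hz, power) pairs from a direct DFT, DC bin skipped."""
    n = len(sig)
    spec = []
    for k in range(1, n // 2):
        re = sum(sig[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(sig[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append((k * fs / n, re * re + im * im))
    return spec

def select_heart_source(sources, fs, band=(0.7, 4.0)):
    """Return (index, peak_frequency_hz) of the source with the highest
    in-band spectral peak."""
    best = (None, 0.0, -1.0)
    for i, sig in enumerate(sources):
        for f, p in power_spectrum(sig, fs):
            if band[0] <= f <= band[1] and p > best[2]:
                best = (i, f, p)
    return best[0], best[1]
```

In practice an FFT replaces the direct DFT for speed; the direct form is used here only to keep the sketch dependency-free.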
Recently, a novel method called joint blind source separation, proposed by Qi et al., [68] tackles the problem that conventional BSS can only extract source signals from a single data set. Its innovation is that each source signal can be extracted simultaneously from a multidimensional data set composed of facial subregions, which allows accurate estimation of the heart rate signal using clustering.

Model-Based Methods:
Model-based methods differ from BSS in that they make no assumption about the relationship between source signals and color signals; instead, they utilize the physiological properties of the skin reflection model to solve the signal demixing problem. de Haan and Jeanne [69] proposed the CHROM (chrominance-based) method, which eliminates the specular component by normalizing the original video frames in the time dimension. The CHROM algorithm projects the zero-mean color signals onto a plane orthogonal to the specular direction, making them independent of specular reflections. Moreover, it assumes a standardized skin-tone vector for removing the effects of the color of illumination based on prior information, which can automatically correct the white balance of images. The orthogonal chrominance signals consist of motion and pulse components, where the variations due to the pulse may differ between the two signals but the changes caused by motion are identical. To enhance the pulse components of the chrominance signals, it estimates the heart rate signal by means of "alpha-tuning":

S(t) = S_1(t) − α S_2(t), with α = σ(S_1(t)) / σ(S_2(t))

where S_1(t) and S_2(t) represent the two chrominance signals, and σ(⋅) denotes the standard deviation of the signal.
With the dominance of the pulse components in the chrominance signal S(t), the heart rate signal is strengthened by this alpha-tuned combination of S_1(t) and S_2(t).
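A compact sketch of the CHROM-style computation (following de Haan and Jeanne's fixed chrominance projections; band-pass filtering is omitted for brevity) might look as follows:

```python
# Sketch of CHROM-style alpha-tuning (after de Haan and Jeanne): build two
# chrominance signals from mean-normalized RGB traces and combine them so
# that the pulse component dominates. Band-pass filtering is omitted.

def std(sig):
    m = sum(sig) / len(sig)
    return (sum((v - m) ** 2 for v in sig) / len(sig)) ** 0.5

def chrom_pulse(r, g, b):
    """r, g, b: mean RGB traces of the skin ROI over time."""
    mr, mg, mb = (sum(c) / len(c) for c in (r, g, b))
    rn = [v / mr for v in r]          # normalize away illumination level
    gn = [v / mg for v in g]
    bn = [v / mb for v in b]
    s1 = [3 * x - 2 * y for x, y in zip(rn, gn)]                 # X chrominance
    s2 = [1.5 * x + y - 1.5 * z for x, y, z in zip(rn, gn, bn)]  # Y chrominance
    alpha = std(s1) / std(s2)
    return [x - alpha * y for x, y in zip(s1, s2)]   # S = S1 - alpha * S2
```

The fixed coefficients encode the standardized skin-tone assumption mentioned above; in a full implementation both chrominance signals are band-pass filtered to the pulse band before alpha-tuning.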
Deep Learning-Based Methods: Recently, with the growth of data and computing resources, a considerable number of studies have sprung up around the theme of deep learning-based noncontact HR methods, which comprise supervised and unsupervised learning methods. In many circumstances, the performance of deep learning systems is superior to that of traditional methods owing to their flexibility and expressiveness; they can automatically extract more spatiotemporal features of the input frames and greatly simplify the algorithm. [70] The relevant works are presented in Table 3.
Chen and McDuff [71] proposed an end-to-end method using a convolutional attention network (CAN) for physiological measurement from videos, which simultaneously learns color and motion information in its appearance and motion branches. To guide the motion branch, an attention mechanism was added to the appearance branch, which uses a spatial mask to detect the appropriate ROI and improve the accuracy of the network output.
Later, Liu et al. [72] presented a novel multitask temporal shift convolutional attention network (MTTS-CAN) leveraging temporal shift modules based on CAN to perform efficient temporal modeling, thus realizing real-time cardiopulmonary and respiratory measurements on mobile platforms with little reduction in accuracy.
To address the influence of video compression on heart rate measurement, a two-stage, end-to-end method proposed by Yu et al. [73] was the first attempt to extract physiological signals from compressed videos. It consists of two parts, STVEN and rPPGNet: STVEN aims to enhance the video, while rPPGNet is designed to recover the heart rate signals with a skin-based module. A CNN-RNN model was introduced by Liu et al. [74] to supervise CNN learning and improve generalization, leveraging the depth map through pixel-wise supervision and rPPG signals through sequence-wise supervision.
To balance efficiency and accuracy, Liu et al. [75] developed two novel one-step neural architectures, collectively named EfficientPhys, which take a transformer or a convolutional network as the backbone for physiological measurement without any preprocessing steps. To explore the influence of long-range spatiotemporal relationships on rPPG measurement, an end-to-end video transformer architecture called PhyFormer was proposed by Yu et al. [76] to enhance rPPG features via global attention and alleviate interference with the local spatiotemporal representation.
A limitation in the domain of noncontact physiological measurement is that biological datasets are commonly small because labeling is expensive, which limits the optimal performance of networks. [22] Thereby, more recent attention has been paid to training on unlabeled videos, and some research has achieved remarkable success despite data-hungry algorithms. In 2021, Song et al. [21] introduced a novel framework called PulseGAN on the basis of a generative adversarial network. Their method combines conventional signal processing with DL techniques: it takes the rough signal derived by traditional signal methods as input and outputs rPPG pulse waveforms through the DL framework. The advantage of this method is that the network loss contains adversarial, time, and spectrum losses, which effectively improve the quality of the pulse waveforms. Gideon and Stent [77] used a self-supervised method based on contrastive learning to improve the generalizability of heart rate estimation without any annotated training data. Compared to other DL techniques, the method mainly focuses on frequency augmentation instead of spatial and chromatic distortion to find the signal of interest, and uses a saliency-based sampling module to provide interpretable output.

Frequency Domain Analysis
The frequency domain analysis of HR is traditionally performed with the Fourier transform, usually in conjunction with other filters such as a band-pass filter to remove noise. However, the Fourier transform has a serious defect in processing nonstationary signals, which even improved methods such as the short-time Fourier transform cannot fully resolve. The wavelet transform can solve this problem.
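As a minimal, hypothetical sketch of the classical Fourier pipeline (not taken from any cited work), heart rate can be estimated by restricting the spectrum to a plausible HR band and picking the dominant peak; the camera frame rate and band limits below are illustrative assumptions:

```python
import numpy as np

np.random.seed(0)  # reproducible noise

def estimate_hr_fft(trace, fs, f_min=0.7, f_max=4.0):
    """Estimate heart rate (bpm) as the dominant spectral peak inside a
    plausible HR band (f_min..f_max Hz); the band mask acts as a crude
    band-pass filter in the frequency domain."""
    trace = trace - np.mean(trace)                  # remove the DC offset
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(trace))
    band = (freqs >= f_min) & (freqs <= f_max)      # keep only the HR band
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

# Synthetic rPPG-like trace: a 1.2 Hz pulse (72 bpm) buried in noise
fs = 30.0                                           # assume a 30 fps camera
t = np.arange(0, 20, 1.0 / fs)
trace = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)
hr = estimate_hr_fft(trace, fs)
```

The frequency resolution of this estimate is fs/N, so longer observation windows give finer HR granularity, which is exactly where the window-length defect discussed above bites for nonstationary pulses.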
For HR measurement, the continuous wavelet transform has been applied to improve the signal quality (denoising or peak and significant point refinement). [78] Similar to the Fourier transform, the continuous wavelet transform conducts inner products to estimate the similarity between an analyzing function and a signal. Unlike the Fourier transform, owing to its variable window width, the continuous wavelet transform is capable of detecting rapid changes in frequency over time. These advantages allow the continuous wavelet transform to be used for the analysis of biological signals (such as HR and BR). The continuous wavelet transform of the raw signal can be achieved by convolving the raw signal with a child wavelet ψ_{τ,s} using Equation (8)

W(τ, s) = ∫ x(t) ψ*_{τ,s}(t) dt, with ψ_{τ,s}(t) = (1/√s) ψ((t − τ)/s) (8)

where ψ is a mother wavelet used for deriving the child wavelet ψ_{τ,s}; x(t) represents the original signal in the time domain; and s and τ are the scaling and translation factors, respectively. Varying these two factors lets us analyze the signal over a larger time span. Currently, there are plenty of standard mother wavelets to choose from to meet specific application requirements; in the field of physiological signal analysis, the Morlet and Bump wavelets have already been utilized. [78,79] After transformation, the raw signal can be reconstructed from the inverse equation

x(t) = (1/C_ψ) ∫∫ W(τ, s) ψ_{τ,s}(t) (dτ ds / s²)

where C_ψ = ∫ |ψ̂(ω)|² / |ω| dω is the admissibility constant and ψ̂ denotes the Fourier transform of ψ. The inverse transform is usually applied to eliminate noise and trends in the raw signal. One possible procedure using the continuous wavelet transform for the determination of HR is summarized in Algorithm 2.

Eulerian Video Magnification
EVM is reported to be capable of not only amplifying color variation but also revealing low-amplitude motion. [26] Hence, it has been utilized to recover the physiological signal from a recorded observed signal. Figure 8 shows the main procedure of EVM for estimating physiological signals, particularly HR. Usually, after recovering the source signal using blind source separation methods such as PCA and ICA, an autoregressive model is applied to obtain the final signal

Y_t = c + Σ_{i=1}^{p} φ_i S_{t−i} + ε_t

where Y_t and S_t are the resulting signal and the original recovered source signal, respectively; φ_i denotes the model parameters; c is a constant term; and ε_t is a white noise term. The obtained autoregressive model is only usable when the coefficient of autocorrelation is above 0.5.

Algorithm 2. One possible procedure using the continuous wavelet transform for HR measurement.
Step 1: obtain raw observations x_N(t) from the ROI (N = number of points of interest).
Step 2: use Bayesian minimization to obtain the clean HR-related signal (incorporating prior knowledge of the expected PPG signal).
Step 3: select or create the mother wavelet most similar to the expected PPG signal.
Step 4: determine the scale set [s, τ] to encapsulate realistic HR values [f_min, f_max].
Step 5: apply the continuous wavelet transform to yield the time-frequency representation.
Step 6: locate the maxima P(f, t) in the time-frequency representation.
Step 7: compute HR as HR = 60/T, where T is the beat period corresponding to the located maxima.
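Steps 3-7 of Algorithm 2 can be sketched in a few lines; the hand-rolled complex Morlet wavelet, its centre-frequency parameter, and the fixed search grid below are our own illustrative assumptions, not a prescription from the cited works:

```python
import numpy as np

def cwt_dominant_freq(x, fs, freqs, w0=6.0):
    """Correlate the signal with scaled complex Morlet child wavelets over an
    HR-plausible frequency grid and return the frequency whose wavelet
    coefficients carry the most energy (a miniature of Steps 3-7)."""
    x = x - np.mean(x)
    t = (np.arange(x.size) - x.size // 2) / fs
    energy = np.empty(freqs.size)
    for i, f in enumerate(freqs):
        s = w0 / (2.0 * np.pi * f)          # scale giving centre frequency f
        psi = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        # correlation implemented as convolution with the reversed conjugate
        coef = np.convolve(x, np.conj(psi)[::-1], mode="same")
        energy[i] = np.sum(np.abs(coef) ** 2)
    return freqs[np.argmax(energy)]

fs = 30.0
t = np.arange(0, 15, 1.0 / fs)
pulse = np.sin(2 * np.pi * 1.5 * t)         # synthetic 90 bpm pulse
grid = np.linspace(0.7, 4.0, 67)            # 42-240 bpm search range
hr = 60.0 * cwt_dominant_freq(pulse, fs, grid)
```

Unlike the FFT estimate, the per-scale coefficients here retain time localisation, so a time-varying HR would show up as a drifting ridge rather than a single peak.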

Oxygen Saturation
In conventional pulse oximetry, light transmits through a body segment such as a finger or earlobe, and the portion of light that survives is directly related to the inflow of arterial blood into the segment. The oxygen saturation can then be calculated from the relative amplitudes of the cardiac-synchronous pulsatile component at two wavelengths. This computation rests on the hypothesis that the increase in light attenuation is caused only by the inflow of arterial blood into the body segment. The following equation based on the Beer-Lambert law (also termed the "ratio of ratios" (ROR) method) [80] is applied to compute the oxygen saturation

SpO₂ = A − B × ROR, with ROR = (I_AC/I_DC)_λ1 / (I_AC/I_DC)_λ2

where A and B are empirically determined coefficients, representing the additive and multiplicative coefficients, respectively; I_AC is the cardiac pulsatile amplitude of the transmitted or reflected light at wavelengths λ1 and λ2, and I_DC is its respective DC component; λ1 and λ2 are the two different wavelengths used in pulse oximetry or, in optical imaging techniques, the two different spectral channels in the image-derived OS; and the independent variable behind B is the image-derived ROR. In conventional pulse oximetry, the two wavelengths λ1 and λ2 are 660 nm (red) and 940 nm (near infrared). For the RGB imaging technique, the red and blue color channels are always selected. [81] Some machine learning-based approaches have been proposed to tackle the problems of traditional signal processing methods. Akamatsu et al. [82] proposed MultiPhys, a CNN-based model, to simultaneously estimate heart rate and oxygen saturation from facial videos. In summary, there are two difficulties in the research and exploration of oxygen saturation measurement. First, it is necessary to explore a deep learning method that can accurately separate the timing waveforms of the 440 nm and 940 nm bands. Second, because of the absorption of light by oxygenated and deoxygenated hemoglobin, the changes reflected on the camera sensor probably account for only about 0.5% of the overall light intensity variation, so future research requires efficient denoising algorithms.
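A toy illustration of the ratio-of-ratios computation follows. The calibration constants A and B below are arbitrary placeholders (in practice they are fitted against a reference oximeter), and AC/DC are taken as the standard deviation and mean of each channel:

```python
import numpy as np

# Hypothetical calibration constants, NOT clinically fitted values.
A, B = 110.0, 25.0

def spo2_ratio_of_ratios(red, blue):
    """Image-derived 'ratio of ratios': AC is approximated by the standard
    deviation of the pulsatile channel trace and DC by its mean, then
    SpO2 = A - B * ROR with ROR = (AC/DC)_red / (AC/DC)_blue."""
    ror = (np.std(red) / np.mean(red)) / (np.std(blue) / np.mean(blue))
    return A - B * ror

# Synthetic channel traces: a DC level plus a small cardiac ripple
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
red = 120.0 + 0.8 * np.sin(2 * np.pi * 1.2 * t)    # AC/DC ~ 0.67%
blue = 90.0 + 0.9 * np.sin(2 * np.pi * 1.2 * t)    # AC/DC = 1.0%
spo2 = spo2_ratio_of_ratios(red, blue)
```

The tiny AC/DC fractions in this toy signal mirror the ~0.5% intensity variations noted above, which is why denoising dominates practical camera-based SpO2 work.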

Parameter-Based Models
To the best of our knowledge, early works on the measurement of blood pressure (BP) utilize the image-based pulse transit time (PTT), the time a pulse wave takes to travel from the heart to another part of the body. The pulse wave velocity (PWV), calculated by dividing the distance by the PTT, i.e., PWV = D/PTT, where D is the distance between the heart and the other body part, is directly related to the blood pressure. The relationship is described by the approximation of the Moens-Korteweg equation [83]

PWV² = βP / (2ρ) (13)

where β, P, and ρ refer to the stiffness parameter, the blood pressure, and the density of the blood, respectively, with β and ρ treated as constants.
There is a positive correlation between PWV and P, which will benefit the estimation of blood pressure.
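The chain PTT → PWV → P can be illustrated in a few lines; the stiffness parameter, path length, and transit time below are illustrative placeholders, not clinically validated values:

```python
# Illustrative numbers only: BETA and the 0.5 m path are placeholders,
# not calibrated physiological constants.
RHO_BLOOD = 1060.0          # density of blood, kg/m^3
BETA = 4.0                  # assumed stiffness parameter

def pwv_from_ptt(distance_m, ptt_s):
    """Pulse wave velocity is path length divided by pulse transit time."""
    return distance_m / ptt_s

def pressure_from_pwv(pwv):
    """Invert PWV^2 = beta * P / (2 * rho)  ->  P = 2 * rho * PWV^2 / beta."""
    return 2.0 * RHO_BLOOD * pwv ** 2 / BETA

pwv = pwv_from_ptt(0.5, 0.1)      # 0.5 m heart-to-face path, 100 ms PTT -> 5 m/s
p_pa = pressure_from_pwv(pwv)     # pressure in pascals
p_mmhg = p_pa / 133.322           # convert to mmHg
```

Because P grows with PWV², a shorter transit time (stiffer or more pressurized vessels) maps to a higher estimated pressure, which is the positive correlation exploited by these parameter-based models.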

Feature-Based Methods
In recent years, research has mainly focused on BP prediction from the spectral or temporal properties of the PPG signal using ML and DL methods. Some researchers utilized dense neural networks to estimate BP from features of the PPG waveform (pulse amplitude, pulse width, heart rate, and so on). [84,85] Other authors used deep recurrent neural networks and LSTMs to learn long-term and short-term features of PPG data, which mitigates the vanishing gradient problem in BP estimation. [86][87][88] Schlesinger et al. [89] proposed a Siamese CNN to extract spectrotemporal features from PPG spectrograms for BP estimation. Eom et al. [90] utilized an end-to-end deep learning architecture to predict BP, composed of a convolutional neural network, a bidirectional gated recurrent unit, and an attention mechanism; it uses all combinations of physiological signals (ECG, PPG, and BCG) as input to improve the accuracy of BP estimation. Jeong et al. [91] combined a CNN with an LSTM to simultaneously predict systolic and diastolic blood pressure based on morphological features of the ECG and PPG signals. Schrumpf et al. [92] used neural network architectures based on AlexNet, ResNet, and a spectrotemporal network to derive BP from PPG data.
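A rough sketch of the kind of hand-crafted PPG features mentioned above (peak amplitude, inter-beat interval, heart rate) is given below, using a naive threshold-based peak detector; the feature set and threshold are our own illustrative choices, not those of any cited work:

```python
import numpy as np

def ppg_features(ppg, fs):
    """Hand-crafted PPG features of the kind fed to dense networks for BP
    regression: mean peak amplitude, mean inter-beat interval, and HR."""
    ppg = ppg - np.mean(ppg)
    thr = 0.5 * np.max(ppg)                 # naive peak-acceptance threshold
    peaks = np.array([i for i in range(1, ppg.size - 1)
                      if ppg[i - 1] < ppg[i] >= ppg[i + 1] and ppg[i] > thr])
    ibi = np.diff(peaks) / fs               # inter-beat intervals in seconds
    return {"amplitude": float(np.mean(ppg[peaks])),
            "ibi_s": float(np.mean(ibi)),
            "hr_bpm": float(60.0 / np.mean(ibi))}

fs = 100.0
t = np.arange(0, 10, 1.0 / fs)
feats = ppg_features(np.sin(2 * np.pi * 1.0 * t), fs)  # 60 bpm synthetic pulse
```

Vectors like `feats` are what the dense-network approaches regress BP from, whereas the recurrent and spectrogram-based models cited above consume the raw waveform or its time-frequency image directly.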

Advantages and Disadvantages of Current Optical Approaches
Due to its convenience and unobtrusiveness, optical imaging is highly suitable for noncontact physiological measurements. As with other techniques, however, optical imaging has some drawbacks that need to be addressed in the future. The choice of optical imaging approach depends on the purpose of the measurement and the required ambient conditions. Table 4 summarizes the characteristics of optical imaging approaches, including their advantages and disadvantages. The major advantages of optical imaging methods can be summarized as follows.
An increasing number of studies choose RGB imaging for noncontact physiological signal measurement because of its low cost and convenience. [93] People can easily obtain their physiological information without any complicated operations using a smartphone. Moreover, compared to traditional contact PPG techniques, RGB imaging provides the unique advantage of avoiding the discomfort caused by electrodes. Similar to RGB imaging, thermal imaging is widely used for health monitoring owing to the development of modern infrared imaging technology. Its measurement performance is robust to the subject's skin color or the illumination, which provides the potential to monitor sleep during the night. [94] In comparison with other optical imaging approaches, hyperspectral imaging is a more advanced technique with strong discrimination ability and rich information content, offering both high spatial and high spectral resolution. The Microsoft Kinect sensor, designed for gaming purposes, introduced a new way of detecting physiological activity by capturing depth maps using time-of-flight technology; [39] hence, it can determine the exact dimensions of the human body in the dark. As the aforementioned optical imaging approaches each have their own characteristics, recent research has concentrated on a new approach, namely multimodal imaging, which combines several optical imaging methods to acquire more detailed and complementary biological information and can achieve high sensitivity and high resolution simultaneously.
Although optical imaging approaches have numerous advantages, some disadvantages still need to be addressed before clinical application; they are summarized as follows.
RGB imaging faces the challenge of being especially sensitive to noise sources such as slight movements and illumination conditions, which reduce the accuracy of physiological signal measurement. Since thermal imaging estimates physiological signals by collecting the energy emitted by the human body, it is easily disturbed by the ambient temperature of other surfaces and lacks facial texture information, making face localization relatively difficult; the absence of geometric and textural details thus complicates the establishment of a physiological signal measurement system. The availability of hyperspectral imaging is limited in practice because of its long data acquisition time and complicated equipment, [95] so the computation speed needs to be improved to satisfy rapid acquisition and analysis; hyperspectral imaging is also highly influenced by motion artifacts. For depth imaging, obstructions such as clothes and blankets strongly influence the measurement; furthermore, it is highly sensitive to sunlight and therefore unsuitable for outdoor applications. For multimodal imaging, the data preprocessing algorithms are more complicated; in addition, expensive and complicated sensors pose a great barrier to its widespread application.

Applications of Current Optical Imaging Techniques
Noncontact physiological signal measurement holds great application potential owing to its comfort and convenience. Both clinical and nonclinical application scenarios are presented in this review.

Neonatal and Elderly Monitoring
The skin of preterm infants is very fragile and sensitive, and traditional contact physiological measurements, which require adhesive electrodes attached to the skin surface, may damage the skin and increase the risk of infection in infants. Therefore, some researchers have tried to use noncontact optical imaging approaches for monitoring neonates. Villarroel et al. [96] developed a multitask convolutional neural network to automatically detect infants in the neonatal intensive care unit (NICU), segment the skin areas, and compute vital sign (heart rate and respiratory rate) estimates; the method is robust to lighting variation during the daytime. Sahoo et al. [97] utilized an end-to-end 3DCNN network called PhysNet that can extract temporal contextual information to estimate cardiopulmonary signals in the NICU setup, supervised by surrogate ground-truth labels generated using CHROM. The technique is also suitable for health monitoring of older patients, especially those with dementia who do not receive routine contact-based monitoring. Recently, Yu et al. [98] leveraged photoplethysmography imaging to successfully measure the heart rate and heart rate variability of geriatric patients.

Telemedicine
With the COVID-19 pandemic, the desire for remote health monitoring has grown, especially as it can reduce the risk of healthcare workers being infected during treatment and provide convenience for remote regions. [99] Telemedicine can capture patients' physiological signals using the camera of a mobile phone or laptop as the imaging device; compared with traditional contact measurements, it can be more easily operated by patients remotely at home and can provide reliable physiological data to doctors during diagnosis. [100] Thus, it will play a critical role in future health care services. However, some challenges still remain for the use of telemedicine in clinical environments. To improve the generalizability of the physiological model, Liu et al. [101] proposed a novel smartphone-based personalized physiological sensing system that can measure physiological signals leveraging both the front and rear cameras of smartphones.

Sleep Monitoring
Nowadays, more and more people suffer from sleep disorders such as excessive daytime sleepiness and difficulty in falling asleep. Physiological signals such as heart rate are important indicators of sleep quality and are considered related to mental and physical health, which stresses the importance of continuously monitoring such parameters for the diagnosis of sleep disorders. Compared with traditional sleep monitoring, optical imaging does not require many sensors attached to the person during nocturnal observational studies, causing less discomfort and making it easier for subjects to fall asleep naturally. Hu et al. [42] used a dual-mode imaging system composed of an RGB-infrared camera and a far-infrared camera to successfully measure heart rate at night. Vogels et al. [102] monitored the pulse rate and oxygen saturation during sleep by fully automatically detecting living tissue in the near-infrared (NIR) spectrum.

Stress Monitoring
Physiological activity reflected in vital signals, such as the magnitude of respiratory sinus arrhythmia, has been shown to be closely associated with physiological stress, which may cause health problems. It is therefore important to monitor physiological stress at an early stage. Stress monitoring using remote physiological measurement was proposed by McDuff et al. [103] Later, Wei et al. [104] introduced a novel method named transdermal optical imaging, using a conventional digital camera and ML algorithms to assess basal stress remotely.

Driver Monitoring
It has been shown that many traffic accidents are caused by driver fatigue and distraction. It is therefore desirable to monitor drivers' health and warn them in a timely manner to avoid accidents. Driver fatigue and distraction can be detected by measuring the driver's heart rate with a camera. [105] The major challenges in driver monitoring are illumination variation and large motions. Magdalena et al. [106] presented SparsePPG to obtain physiological signals in high-noise environments. Moreover, most methods adopt a temporal slice to extract fatigue features, ignoring temporal variations. To fuse the temporal features, Du et al. [107] proposed a novel multimodal fusion recurrent neural network integrating three features, namely heart rate, eye openness level, and mouth openness level, to monitor driver fatigue.

Exercise Tracking
Exercise tracking is an important way for time-motion analysis of athletes, which can improve individual performance in sports.
Traditional exercise tracking uses wearable devices that are cumbersome and make it difficult for athletes to move freely, whereas camera-based devices do not. Most noncontact algorithms require subjects to remain still or move only slightly, which is not suitable for fitness exercise circumstances. Recently, Zhu et al. [108] removed ambiguous frequency components similar to the heart rate by building a motion compensation scheme based on optical flow, providing a convenient way to monitor heart rate during exercise. Wang et al. [109] proposed a new pulse extraction method, named sub-band rPPG, that uses sub-band decomposition to suppress the different motion frequencies.

Future Prospects
To date, noncontact physiological measurement using optical imaging can accurately extract physiological parameters such as respiratory and heart rate signals, but there is still room for improvement before the technology can be widely used and bring convenience to people.

Public Datasets
There are plenty of public datasets for contactless physiological measurement, which provide opportunities to explore interesting research questions. Table 5 summarizes the public noncontact physiological measurement datasets. Most works are validated only on privately owned datasets, making it difficult to fairly evaluate and compare the performance of proposed algorithms. What is more, the distribution of race and skin color in existing datasets is usually imbalanced, and this lack of diversity is one of the challenges that degrades the performance of deep learning-based methods. Furthermore, some public datasets only contain a single parameter, such as heart rate or respiratory signal, while conditions like spontaneous behavior, facial expressions, and illumination changes are commonly unknown, which may bias the measurements. Therefore, there is an urgent need to build new public datasets with clearly documented natural measurement conditions and uniform standards. A respiratory simulation model, generating simulated respiratory pattern data based on the sine function, was proposed by Wang et al. [59] Synthetic data offer the possibility of generating data without expensive costs and the advantage of controlling the properties of the dataset, which can help create less biased models. [110] Later, Wang et al. [111] introduced a scalable physics-based learning model to generate synthetic rPPG videos with diverse attributes, such as skin color and lighting conditions, to improve performance.
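A minimal sine-based respiration simulator in the spirit of the model of Wang et al. [59] might look as follows; the rate, depth, and noise parameters are our own illustrative choices, and the spectral sanity check shows how a controlled property can be verified:

```python
import numpy as np

def synth_respiration(duration_s, fs, rate_bpm=15.0, depth=1.0,
                      noise_std=0.05, seed=0):
    """Sine-based simulated respiratory pattern; rate, depth, and noise
    level are directly controllable properties of the synthetic data."""
    rng = np.random.default_rng(seed)
    t = np.arange(0, duration_s, 1.0 / fs)
    clean = depth * np.sin(2.0 * np.pi * (rate_bpm / 60.0) * t)
    return t, clean + noise_std * rng.standard_normal(t.size)

t, resp = synth_respiration(60.0, 10.0, rate_bpm=15.0)

# Sanity check: recover the simulated rate from the spectrum
freqs = np.fft.rfftfreq(resp.size, d=0.1)
rate_est = 60.0 * freqs[np.argmax(np.abs(np.fft.rfft(resp - resp.mean())))]
```

Because every property (rate, depth, noise) is an explicit parameter, a synthetic corpus can be swept across demographics-like variations deliberately, which is exactly the bias-control advantage noted above.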

Motion Artifact
Early works [112] mostly required subjects to be stationary and extracted physiological signals from a fixed ROI, because large motions can overwhelm small physiological changes. Motion artifacts are nevertheless present in most studies because most scenarios are dynamic. The methods for handling motion artifacts range from traditional signal processing to neural networks: signal processing mainly depends on the optical properties of the imager, while neural networks commonly adopt supervised learning. Solving motion artifacts is pressing and will enable a wider range of applications for noncontact physiological measurements. In the future, image processing algorithms and spatial redundancy are two keys to reducing the influence of motion artifacts. As for image processing algorithms, combining classical signal processing methods with deep learning methods can work better. For spatial redundancy, a single camera has an advantage over multiple imagers due to less spatial redundancy, which can combat the influence of motion artifacts.

Table 5 (excerpt): V4V [140] (2021; 179 subjects, 1,358 videos, 1040 × 1392; BP, HR, RR; https://vision4vitals.github.io/); VIPL-HR [140] (2018; RR, BVP, ECG); AFRL [149] (2014; 25 subjects, 300 videos, 658 × 492; PPG, ECG, RR); MMSE-HR [44] (2016; 40 subjects, 102 videos, 1040 × 1392; HR, BP); a dual-mode sleep video database [40] (2018).
[113]

Ambient Illumination
Ambient illumination may be considered constant over the ROI in most applications. Nevertheless, ambient illumination varies in most real situations, and its changes affect the amplitude of the physiological signals captured by the camera. To our knowledge, hemoglobin absorbs green light most strongly, [12] and thus the green channel contains the strongest physiological signals. Generally speaking, the amplitude of physiological signals is affected by the intensity of the ambient light source and the distance between the light source and the skin. Consequently, it is important to handle the influence of ambient illumination to better estimate vital signals. Computational algorithms are the way to reduce noise from ambient illumination; so far, adaptive filtering approaches [53] and the PPG attention mask [114] are commonly used to reduce the errors it causes. Reducing the influence of varying illumination requires further development of generalized and robust algorithms, which opens up a number of promising directions.
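As one hedged illustration of such adaptive filtering (a generic LMS noise canceller, not necessarily the filter used in [53]), the intensity of a non-skin background patch can serve as an illumination reference against which the skin signal is cleaned:

```python
import numpy as np

def lms_cancel(primary, reference, mu=0.01, order=4):
    """Least-mean-squares adaptive noise cancellation: predict the
    illumination component of the primary (skin) signal from recent
    reference (background) samples; the prediction error is the
    illumination-cleaned signal."""
    w = np.zeros(order)
    out = np.zeros(primary.size)
    for n in range(order, primary.size):
        x = reference[n - order:n][::-1]   # most recent reference samples
        e = primary[n] - w @ x             # error = cleaned sample
        w += 2.0 * mu * e * x              # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(1)
fs = 30.0
t = np.arange(0, 30, 1.0 / fs)
pulse = 0.2 * np.sin(2 * np.pi * 1.2 * t)          # small cardiac component
light = np.sin(2 * np.pi * 0.3 * t)                # slow illumination drift
skin = pulse + light                               # skin ROI sees both
background = light + 0.01 * rng.standard_normal(t.size)
cleaned = lms_cancel(skin, background)             # drift largely removed
```

The cardiac component survives because it is uncorrelated with the background reference, while the shared illumination drift is predicted and subtracted once the filter weights converge.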

Multiparameter, Multiperson, and Multimodal
Most noncontact physiological measurement systems are limited to a single person whose physiological signals can be measured. Therefore, multiparameter measurement of multiple persons will be future work for optical imaging physiological measurement systems. [115] A system measuring multiple parameters of multiple persons, covering vital signals such as heart rate, [116] respiratory rate, [4] blood pressure, [117] blood oxygen saturation, [118] and so on, would provide more accurate physiological parameters in telemedicine and enable multiperson health monitoring in public scenes. One challenge of multiperson measurement is that most RGB cameras have limited viewing angles and can therefore monitor only a limited number of people; furthermore, when subjects step out of the camera's limited view, the physiological signal measurement is interrupted. With the development of imaging technology, panorama cameras are being used by more and more people because of their large viewing angle, which offers the potential to measure multiperson physiological parameters in public scenes and to increase the accuracy of physiological signal estimation. Combined with advances in multimodal imaging systems, multivariate physiological signals of multiple persons could be accurately measured in different environments.

Imaging Parameter
Optical imaging devices vary in parameters such as sensor type, resolution, and frame rate, which affect the quality of the acquired images. Moreover, a long shooting distance between the ROI and the camera may lead to lower effective video resolution, and the physiological signals will then be easily disturbed by camera quantization noise. To the best of our knowledge, lower resolutions may reduce the accuracy and reliability of the measurement, so it is necessary to use high-resolution data for physiological signal extraction. A novel method combining blind source separation and model-based methods, namely semi-BSS, was proposed by Song et al., [119] which achieved remarkable performance with super-high resolution at a long shooting distance. Higher resolutions, however, entail expensive computation and large storage space, which should be considered in the future.

Video Compression
Most studies take uncompressed videos as input to achieve a better signal-to-noise ratio of the vital signals. Nevertheless, uncompressed videos commonly require plenty of storage for storing, streaming, and transfer. [120] Thus, there is a trend toward extracting physiological signals from compressed videos, which reduces spatial redundancy within the image. However, compression also presents challenges, for instance, a low rPPG signal-to-noise ratio and low spatial resolution. To address the negative effects of compression, Yu et al. [73] proposed a two-stage method in which one stage enhances the video and the other, cascaded stage recovers the rPPG signals. To further obtain reliable physiological signal measurements, Nowara et al. [121] used an attention-based deep learning approach trained on compressed videos, which achieved better performance with a higher SNR.

Privacy Concern
Noncontact physiological measurement datasets based on optical imaging are commonly composed of face videos that can leak the physiological information of participants; hence, privacy has become an important topic in recent years. It is necessary to process datasets to protect the personal information of the participants. Some studies tried to remove the color changes caused by rPPG signals in videos for privacy protection. [122] Others proposed to generate a target rPPG to conceal the real rPPG of participants, [123] but this was only tested on simple datasets and may not be suitable for realistic situations such as sudden facial expressions. To address the aforementioned problems, Sun et al. [124] presented a novel method called Privacy-Phys, based on a pretrained 3D convolutional neural network, to modify the rPPG signal in facial videos such that the modified video is visually similar to the original. Future work should pay more attention to privacy-preserving techniques for personal information protection.

Conclusion
The improvements in optical imaging technology and the popularity of artificial intelligence and deep learning have made reliable, convenient, and rapid measurement of physiological signals possible, addressing various health and medical problems. In this review, five optical imaging-based techniques were given overall and integrated consideration, and a general architecture for measuring physiological signals under various conditions was illustrated. The applications discussed here show the capability of optical imaging methods for neonatal and elderly monitoring, telemedicine, sleep monitoring, and driver and stress monitoring, indicating great potential for noncontact measurement in clinical and nonclinical environments. However, many challenges still face the full exploitation of optical imaging approaches in terms of motion artifacts, ambient illumination, public datasets, multiple parameters and subjects, imaging parameters, video compression, and privacy protection. Therefore, future studies should mainly pay attention to the following factors: 1) protecting the privacy of participants while reducing on-device latency; 2) deploying more efficient DL approaches to eliminate the dependency on traditional methods and manual feature extraction; 3) generating more synthetic physiological measurement datasets through DL approaches; 4) overcoming the difficulty of physiological signal extraction caused by motion artifacts during walking; 5) using panorama cameras to monitor multiperson physiological signals in public scenes; and 6) reducing the impact of imaging parameters on physiological signal extraction. It is anticipated that, with the advance of optical imaging techniques and DL, noncontact physiological signal measurement systems can realize real-time and long-term health monitoring.

Figure 2 .
Figure 2. Simplified hierarchical model of upper human tissue layers.

Figure 5 .
Figure 5. Overview of the proposed framework for establishing robust multi-ROI models of physiological signals.

Figure 4 .
Figure 4. Block diagram of the universal data processing flow for optical imaging-based physiological signal measurement.

Figure 7 .
Figure 7. Brief overview of the BSS method. The color values of the ROI are averaged in the RGB channels, and the three estimated sources are calculated with a separating matrix W.

Figure 8 .
Figure 8. Block diagram of EVM for physiological signal measurement.

Table 1 .
Summary of optical sensor systems for physiological signal measurement.

Table 2 .
Parameters comparison of different depth cameras.

Table 4 .
Summary of principle, advantages, and disadvantages of commonly used optical sensors for the unwired and unobtrusive physiological signal measurement.

Table 5 .
Summary of public noncontact physiological signal measurement datasets.