Pain expression assessment based on a locality and identity aware network

In clinical medicine, the feeling of pain is a significant indicator of a patient's medical condition. Of late, automatic pain assessment methods have received more and more interest, and many researchers have proposed corresponding methods and achieved impressive results. However, they usually ignore the locality and individual differences of painful expression. Therefore, a locality and identity aware network (LIAN) for pain assessment is presented here. Concretely, for the locality characteristic, a locality aware module consisting of a two-branch structure, feature and attention branches, is presented. The former learns pain features with a deep network, while the latter guides the pain features to focus on the discriminative regions of pain. As for individual differences, an identity aware module with a multi-task method is proposed to represent identity-related information and achieve identity-invariant pain assessment. Extensive experiments on public databases show the superiority of LIAN in pain assessment.


INTRODUCTION
Pain is the most direct manifestation of the patient's medical condition in health care. If the medical staff are unable to take corresponding measures when the patient feels severe pain, a medical accident is likely. Currently, pain intensity assessment mainly employs physical means, that is, self-report or observer assessment. Self-report requires patients to dictate their pain feelings at a specific time. This method is limited by individuals who cannot report their pain experience. There are some vulnerable people who are often under- or over-treated, including babies, infants, and children; adults with cognitive impairment (e.g., cerebral palsy); people with intellectual disability; and persons in the ICU [1]. The observer assessment method is usually inefficient and has low reliability: continuous, uninterrupted escort is tormenting for the observer, and the observer's judgement is influenced by personal factors, such as the relationship to the patient or the attractiveness of the sufferer [2]. Therefore, automatic pain assessment systems are necessary. They could supplement the current pain assessment methods to achieve better pain management. Unlike physical methods, they can continuously monitor the pain state of patients. This may improve the clinical treatment effect, for example through the timely intervention of medical staff for a person who cannot call for help. In addition, automatic systems are more objective, as they are not affected by human factors.
The implementation of automatic systems could also help researchers to recognize pain more clearly, like the changes of facial expressions, since they are more sensitive than common expressions [3].
However, pain assessment is still challenging, as previous methods mostly adopted general models instead of exploring the pain mechanism in detail. In fact, pain shows locality in facial expression [20], which means that using the whole face as input, as in these previous studies, introduces much interference. This characteristic requires that the proposed method focus on the most pain-related regions of the face. Fortunately, attention mechanisms have recently been explored in a wide variety of contexts [21], providing a local focus capability for neural network models; this inspires us to employ an attention module for the locality of pain. Besides, pain is a subjective experience [22], because individuals produce different facial expressions depending on personal factors. Automatic pain assessment methods should take into account not only the changes in facial pain expressions but also subject variation. Thus, examining the diversity of identities is another key to the pain issue.
For these reasons, we present a locality and identity aware network (LIAN) for pain assessment. Specifically, LIAN addresses two aspects of pain: locality awareness and identity awareness. For the former, LIAN employs a dual-branch framework consisting of attention and representation branches: the attention branch obtains an attention map of pain-related regions, and under the guidance of this attention map, the representation branch extracts deep pain features. For identity awareness, we employ a multi-task learning method that accomplishes pain assessment and identity recognition at the same time, thereby implementing identity-invariant pain assessment.
The main technical contributions of this work include:
• We propose a network called LIAN, consisting of a locality aware module and an identity aware module, for pain assessment.
• The locality aware module is made up of a dual-branch framework, in which an attention branch obtains an attention map of pain on the face; the learnt attention map then guides the acquisition of the pain representation. The obtained pain representation fully considers the most pain-relevant regions and excludes irrelevant interference.
• In the identity aware module, a multi-task learning method is leveraged to assess pain and recognize identity simultaneously. This decoupled task approach allows the network to pay more attention to information that is not related to identity in the pain assessment task.
• Extensive experiments on public pain assessment databases are performed. The results show that the proposed LIAN achieves compelling performance.
The remainder of this paper is organized as follows. Section 2 briefly reviews related research. Section 3 details the presented architecture and theory, and Section 4 introduces the comparative experimental results. Finally, Section 5 concludes this work.

RELATED WORK
Our research problem is pain assessment, and the key technique we use is the attention mechanism.
In this section, we review previous related work on these two aspects, that is, similar issues (pain assessment) and related techniques (attention mechanisms).

Approaches towards pain assessment
At the beginning, due to the stability and reliability of traditional hand-crafted features, much related research leveraged such features. For instance, as stated in [7], active appearance models (AAM) were used to extract shape and appearance features that were classified by an SVM classifier. In [8], Lucey et al. [23] extracted three types of hand-crafted features: shape features, LBP, and DCT. They then used RVR to regress pain intensity from these features. In [24], log-normal filters were employed to extract features for pain estimation. Sikka et al. [25] used weakly supervised learning to locate the pain frames in a video. Florea et al. [4] applied the HoT method to describe pain texture. Besides, Werner et al. [10] began to explore the physiological signals in the database, such as galvanic skin response, electromyography, and electrocardiogram. In [26], appearance features were used to detect pain. Hammal et al. [27] employed a combination of facial features, sensor signals, and environmental background as dynamic pain features. Similarly, Kächele et al. [28] also considered multi-modal features consisting of LBP and bio-physiological features. The authors of [29] were interested in identifying genuine and faked pain expressions in social communication; in their method, they released a computer expression recognition toolbox to extract features. In [30], Irani et al. used a steerable and separable filter to describe the energy released by pain expressions. As stated in [5], Zhao et al. presented an efficient optimization algorithm based on the alternating direction method of multipliers for training classifiers. Werner et al. [31] explored the relative positional movements of the face during the generation of pain expressions and, based on this observation, presented facial activity descriptors for pain. In [13], Liu et al. leveraged hand-crafted features and multi-task learning to personalize the pain of each subject.
On the other hand, with the increase of pain data, some researchers began to introduce deep learning techniques into pain assessment. For instance, Liu et al. [32] used a 3D CNN with a deformable action component constraint, which could detect specific facial action parts. In [16], Zhou et al. employed an RCNN architecture for the pain problem. Rodriguez et al. [17] approached the pain task by combining CNNs and long short-term memory (LSTM): in their experiments, a pre-trained face model [33] is fine-tuned on the pain database, and an LSTM then classifies the output of the fine-tuned network. Subsequently, Bellantonio et al. [34] applied a super-resolution method before the CNNs, which improved performance in strong pain recognition. Similar to [13], Martinez et al. [35] employed only an LSTM to estimate pain; notably, in their work, a hidden conditional random field was used to output the visual analogue scale result. In [14], Tavakolian et al. extracted features from segments with a pre-trained model and then aggregated these features to exploit statistical information. Wang et al. [19] employed an SVR to estimate pain from three kinds of features: 3D CNN features, histogram of oriented gradients features, and geometric features. In [36], the authors used two RNNs to exploit the dynamic information in VGGface features. More recently, Sowmya et al. [37] employed a sliding-window strategy to obtain fixed-length input samples for an LSTM network and concatenated the network output of each segment to generate a video-level output for pain.
From the above research on pain assessment, we can conclude that in recent years the field has gradually moved from traditional hand-crafted methods to deep learning approaches, and the latter also achieve better results. However, most of the mentioned methods focus on how to extract features that represent pain from the face; only a few works address the individual differences in pain. Pain is a subjective emotional feeling and is greatly affected by subject variation. Therefore, identity aware pain assessment is also a significant issue in pain research.

Approaches based on attention mechanism
So far, publications on pain assessment rarely involve attention models. To review the attention mechanism, we therefore cover several other applications in this part, including image captioning, machine translation, and feature map attention; facial analysis tasks, which are similar to ours, are also reviewed. In image captioning and machine translation, the proposed models are based on an encoder-decoder structure; since there is no decoder in feature map attention and facial analysis, the attention weights there are learned from the feature map.
In image captioning, most research takes CNNs as an encoder to extract image features and an RNN as a decoder to output the corresponding words or sentences. For instance, in [38], a CNN-LSTM structure was adopted for image description, and the attention mechanism was employed when generating the context vector from the CNN encoder. In [39], the authors argued that for image description, attention should be applied not only spatially over the feature map but also over the feature map channels; therefore, they also added a corresponding attention weight on the channel dimension.
Both the encoder and the decoder use RNN structures in machine translation. The work of [40] introduced the attention mechanism into this field: a bidirectional RNN is used in the encoder, and when constructing the context vector, the decoder's hidden states perform an attention query over the hidden states of the bidirectional RNN to obtain a context vector with attention weights. Like the 'soft' and 'hard' models of [38], Luong et al. [41] proposed global and local attention.
Besides, in the field of feature map attention, many researchers modified the network structure to make it focus more on the target. The well-known spatial transformer network (STN) [42] allows the network to automatically learn transformation parameters and convert feature maps so that they contain only the targets; it uses bilinear interpolation during this conversion, which makes it a differentiable module that can be added to any end-to-end network. Wang et al. [43] introduced the attention mechanism into the residual network: they first learned attention weights over the feature map with a so-called bottom-up top-down structure, and then added the original feature map to the output of the attention network, exploiting the idea of residuals to improve the performance of deeper networks.
As for facial analysis, many works also focus on the attention mechanism, mostly applying attention to CNN and RNN features. In [21], a CNN with an attention mechanism was proposed for facial expression recognition, which can focus on the most discriminative areas of the face. To solve age estimation from faces, Pei et al. [44] proposed an end-to-end attention model: they learned a weighted map over the CNN feature map, and in the LSTM stage, an attention weight was also learned for the output of each time step; the output feature vectors were then combined by a weighted average to obtain the final features used for regression. In [45], a simple network was used for expression recognition experiments; in particular, the authors employed the STN [42] so that the network could focus on expression-related areas and improve the results.
As can be seen, more and more computer vision works explore the attention mechanism, especially in facial analysis. They employ attention modules that enable networks to focus on the most significant facial regions, which is exactly the attribute needed for the locality of our pain assessment task. Unfortunately, as far as we know, no one has yet explored how to apply the attention mechanism to pain. This motivates us to propose a deep network with locality awareness.
In summary, the two characteristics of pain estimation, locality and identity, are the keys to this issue, but there are few corresponding solutions in the current literature. Here, we propose to employ an attention mechanism for locality awareness and multi-task learning to decouple the pain-identity problem. With this method, the proposed network can focus on the important regions of pain and extract individual-invariant pain features.

FIGURE 1 The proposed framework consists of three parts: inputs, a locality aware module, and an identity aware module. Face images are input into the framework and fed to the dual-branch locality aware module to emphasize pain-related region information, and the identity aware module (IAM) is used to decouple the pain assessment tasks of different individuals

LOCALITY AND IDENTITY AWARE NETWORK FOR PAIN ASSESSMENT
Here, we present our LIAN, which assesses pain intensity from face images. To explain the details of the proposed method, four parts are introduced in turn: the framework, the locality aware module, the IAM, and parameter learning.

The framework
Considering the locality and identity of pain, we propose LIAN for pain assessment, as shown in Figure 1. The proposed LIAN consists of three parts: inputs, a locality aware module, and an identity aware module. Cropped face images are input into the framework and fed to the two branches. The attention branch segments an attention map that shows the important pain-related regions, while the feature branch extracts deep semantic pain features; the attention map guides the deep features to focus on the important pain information. Then, the IAM employs a multi-task learning method to decouple identity recognition and pain assessment, which achieves identity-invariant pain assessment. In the following, we detail each part of the proposed LIAN.

Locality aware module
FIGURE 2 The facial pain expression shows a locality characteristic: if some pain-related regions are occluded, no judgement about pain can be made; when these areas are displayed, the pain is easy to perceive

When the patient feels pain, the facial pain expression causes various deformations of the facial organs. However, due to the particularity of the human face, it always contains much information irrelevant to pain perception. According to [20], pain shows locality in facial expression. As shown in Figure 2, if we cover some regions, we cannot make any judgement about pain, but when these areas are displayed, the painful expression is obvious. This strongly illustrates the locality of facial pain expression. Therefore, a deep network that focuses on the locality of pain should greatly improve pain assessment performance.
Here, we expect the deep neural network to pay more attention to pain-related regions via a well-designed network structure. The proposed structure consists of two branches: an attention branch and a feature branch. The attention branch segments the regions important for pain assessment, while deep neural networks extract deep pain features in the feature branch; the attention branch then guides the feature branch to focus on the pain-related information. This dual-branch structure is called the locality aware module.

Attention branch
FIGURE 3 The ground-truth of the attention map

We employ an encoder-decoder style network to accomplish the attention branch. The reason is that our purpose is to segment pain-related areas from the face, and networks of this style are the most effective in the field of image segmentation [46]. Therefore, we adopt such a structure to directly segment pain-related regions from the face for the locality characteristic of pain assessment. It is worth noting that we do not learn the attention map from the feature map as in [44,45]: the edges of high-level feature maps in a deep network tend to be blurred, while low-level feature maps cannot capture pain, since their receptive fields are too small. Specifically, we employ a variant of the fully convolutional model proposed in [46] for attention map segmentation, using four layers in the encoder with skip connections and a dilation of 2. We also leverage a pre-trained ResNet34 to initialize the decoder layers, which significantly improves the convergence speed. The detailed structure of this branch is shown in Table 1.
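As a rough illustration only, the encoder-decoder idea behind the attention branch can be sketched in PyTorch. This toy network is not the architecture of Table 1: the depths, channel widths, and the placement of the dilated convolution and the skip connection are invented here for compactness.

```python
import torch
from torch import nn

class TinyAttentionSeg(nn.Module):
    """Toy encoder-decoder producing a single-channel attention map.

    Illustrative only: NOT the paper's Table 1 architecture.
    """
    def __init__(self):
        super().__init__()
        # Encoder: a dilated conv (dilation 2) followed by a downsampling conv.
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=2, dilation=2), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # Decoder: upsample back to the input resolution.
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU())
        self.head = nn.Conv2d(16, 1, 1)  # 1x1 conv -> single-channel map

    def forward(self, x):
        e1 = self.enc1(x)            # (B, 16, H, W)
        e2 = self.enc2(e1)           # (B, 32, H/2, W/2)
        d1 = self.dec1(e2)           # (B, 16, H, W)
        return torch.sigmoid(self.head(d1 + e1))  # skip connection; values in [0, 1]

net = TinyAttentionSeg()
att = net(torch.randn(1, 3, 112, 112))  # 112x112 matches the paper's crop size
```

The sigmoid head keeps the map in [0, 1], so it can later weight the feature map pixel by pixel.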
When the network is trained, this branch has its own ground-truth, which is obtained from the provided AAM landmarks, as shown in Figure 3. These landmarks are sequential, and they outline the pain-related regions; therefore, these points can be used to make the label for attention segmentation.

Feature branch
Here, the deep feature map is obtained by the feature branch. We employ four ResBlocks to extract high-dimensional features from face images. These features represent the spatial information, and they match the attention map for the locality of pain. Notably, no pooling or strided convolutional layers are used, which ensures that the output size of the feature branch is the same as that of the attention branch. The detailed structure of this branch is shown in Figure 4. Once the attention map and feature map are obtained, the attention map can guide the feature map to focus on the significant information. We denote the attention map as M_att and the feature map as M_f. The result of the attention map guidance is:

I_att = M_att ⊗ M_f, (1)

where ⊗ represents the multiplication of corresponding positions in the feature map and attention map, and I_att is the result of applying the attention map. Thus, the feature map attending to the important information is obtained, and we can employ it to perform the downstream tasks.
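Equation (1) is a simple element-wise product that broadcasts a single-channel attention map across all feature channels. A minimal PyTorch sketch (tensor shapes and names are illustrative, not taken from the paper's code):

```python
import torch

def attention_guidance(m_att: torch.Tensor, m_f: torch.Tensor) -> torch.Tensor:
    """Element-wise attention guidance: I_att = M_att (x) M_f.

    m_att: attention map, shape (B, 1, H, W), values in [0, 1]
    m_f:   feature map,   shape (B, C, H, W)
    """
    # The single-channel attention map broadcasts across all C feature channels.
    return m_att * m_f

# Toy usage: a 4-channel feature map of ones weighted by a binary attention map.
feat = torch.ones(1, 4, 8, 8)
att = torch.zeros(1, 1, 8, 8)
att[..., 2:6, 2:6] = 1.0  # a 4x4 "pain-related" region
out = attention_guidance(att, feat)  # zero outside the attended region
```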

Identity aware module
After obtaining the feature map with locality, we set up four convolutional layers, pooling layers, and some non-linear layers to generate high-level semantic features from the feature map. These high-level features are used to perform the subsequent tasks. The network structure of the feature extraction part is shown in Table 2.

FIGURE 5 Different individuals have various manifestations of pain
The focus of this section is the decoupling of tasks. It is well known that different individuals have various manifestations of pain: some people frown when they are in pain, while others contract their levator muscles, as shown in Figure 5. These signs all show that pain has high inter-subject variation. To alleviate the variation introduced by different individuals and achieve better pain assessment performance, we decouple the personal identification task from pain assessment. This identity aware attribute enables our network to accomplish identity-invariant pain assessment.
Therefore, we explore the different pain expressions of various individuals by implementing multi-task learning for pain. Multi-task learning is a machine learning method based on shared representations, in which multiple related tasks are learned together. During training, multi-task learning utilises a low-level shared representation to capture the task-related information of each task. This method can make each task achieve better results and enhance generalization ability. In our case, multi-task learning is expected to decouple the pain and identity tasks, so that the network can focus on pain-related information in the pain task rather than on individual changes.
As described in the overview of this part, two related tasks are considered here: pain assessment (Task I) and identity recognition (Task II). Correspondingly, two loss functions are employed. Since both tasks are classification problems, we employ two softmax classification losses, whose forms are shown in Section 3.4. Thus, the IAM with decoupled tasks makes the network pay attention to pain-related information in the pain assessment task.
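A hedged PyTorch sketch of the two-head multi-task setup: both heads sit on a shared pain representation and use a softmax classification loss, as the text describes. The feature dimension, head names, and the unweighted sum of the two losses are illustrative assumptions (the paper balances the losses with a coefficient, see Section 3.4).

```python
import torch
import torch.nn as nn

class IdentityAwareHead(nn.Module):
    """Two task heads on a shared representation (names are illustrative)."""
    def __init__(self, feat_dim: int, n_pain_levels: int = 4, n_subjects: int = 25):
        super().__init__()
        self.pain_head = nn.Linear(feat_dim, n_pain_levels)  # Task I: pain assessment
        self.id_head = nn.Linear(feat_dim, n_subjects)       # Task II: identity recognition

    def forward(self, shared_feat):
        return self.pain_head(shared_feat), self.id_head(shared_feat)

# CrossEntropyLoss = softmax + negative log-likelihood, one per task.
criterion = nn.CrossEntropyLoss()

head = IdentityAwareHead(feat_dim=128)
feat = torch.randn(8, 128)               # stands in for the shared backbone features
pain_logits, id_logits = head(feat)
pain_y = torch.randint(0, 4, (8,))       # four integrated PSPI levels
id_y = torch.randint(0, 25, (8,))        # 25 UNBC subjects
loss = criterion(pain_logits, pain_y) + criterion(id_logits, id_y)  # unweighted sum
```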

Parameter learning
This section shows how the network parameters are updated. Suppose a training database D = {x_i, mask_i, s_i, y_i}_{i=1,…,N} contains N image frames, where x_i and mask_i represent the facial image and the attention map ground-truth, and s_i and y_i denote the subject ID and the pain intensity ground-truth, respectively. First, we train the attention branch with x_i and mask_i for pain-related region attention. Since this task belongs to image segmentation, we employ a pixel-wise mean squared error (MSE) loss for the attention branch, as shown in Equation (2):

L_MSE = (1 / (w · l)) Σ_{j=1}^{w} Σ_{k=1}^{l} (â_jk − a_jk)², (2)

where w and l denote the width and length of the attention map, a_jk ∈ mask denotes the pixel intensity at position (j, k) of the attention map ground-truth, and â_jk is the corresponding prediction (for ease of understanding, we only list the loss calculation of one sample). For pain assessment, we employ a softmax loss function for this classification issue, as follows:

L_pa = − Σ_{p=1}^{C_pa} y_p log( e^{z_p} / Σ_{q=1}^{C_pa} e^{z_q} ), (3)

where C_pa is the number of pain intensity classes, z_p represents the output of the pain assessment task for the p-th class, and y_p is the ground-truth of the p-th class. Similarly, for identity recognition, the loss function is shown as follows:

L_id = − Σ_{p=1}^{C_id} y′_p log( e^{z′_p} / Σ_{q=1}^{C_id} e^{z′_q} ), (4)

where C_id is the number of identity classes, and z′_p and y′_p are the corresponding task output and ground-truth. Thus, the final loss is expressed in Equation (5):

L_final = L_pa + λ L_id, (5)

where λ is used to balance these loss functions.
In the training phase, L_MSE is first employed to train the attention branch. Then, the trained branch is inserted into the whole network, which is trained with L_final. As for the training strategy, we utilised stochastic gradient descent (SGD) with momentum 0.9 and weight decay 0.0005. The initial learning rate is set to 0.001; from the 15th epoch, the learning rate is multiplied by 0.9 every 5 epochs. Both training procedures use a batch size of 32 and dropout with p = 0.5 in the fully connected layers. The data are augmented by resizing the images to 125 × 125 and randomly cropping them to 112 × 112.
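The learning-rate schedule described above can be written as a small helper. This is a plain-Python sketch, not the paper's code, assuming the first reduction happens at the 15th epoch and then every 5 epochs after that:

```python
def lr_at_epoch(epoch: int, base_lr: float = 0.001,
                start: int = 15, step: int = 5, gamma: float = 0.9) -> float:
    """Learning rate at a given epoch: constant until `start`, then
    multiplied by `gamma` every `step` epochs (values from the text)."""
    if epoch < start:
        return base_lr
    # Number of reductions applied by this epoch (first one at `start`).
    n_drops = (epoch - start) // step + 1
    return base_lr * gamma ** n_drops

# Learning rate just before, at, and after the first two reduction points.
schedule = [round(lr_at_epoch(e), 6) for e in (0, 14, 15, 19, 20)]
```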

EXPERIMENTS AND RESULTS
Here, we first introduce the used database and experimental settings, then extensive experiments are performed to evaluate the effectiveness of each module. We also compare our proposed method with the existing methods in the last part.

Database and experimental settings
Here, we first evaluate the proposed method on the UNBC-McMaster Shoulder Pain Expression Archive Database [47]. This database consists of 200 video sequences with 48,398 image frames from 25 subjects, who underwent a series of range-of-motion tests with their affected and unaffected limbs.
Corresponding AAM landmarks and Prkachin and Solomon Pain Intensity (PSPI) annotations are provided for every frame. This database also includes self-report and observer measures, and it is widely used in pain assessment research. For the annotations, following [5], we integrate the pain levels into four classes corresponding to PSPI 0, 1 to 2, 3 to 5, and higher intensities. The BioVid Heat Pain Database [48] is also employed to show the generalization ability of the method. This database was collected from 90 volunteers in three age groups, with four discrete pain intensities stimulated in the right arm of each subject. In addition to video, it also contains a series of physiological signals. The middle frame of each video is selected as the pain image, and the video annotation is used as the ground-truth. To show the superiority of the proposed method, we compare it with existing methods on this database.
Besides, following the experimental settings of others [5,16,17,23], that is, the leave-one-subject-out cross-validation method, we separate the database into two parts: the first part, consisting of all but one subject, is used as the training set to train our model, and the second part, consisting of the remaining subject, is used to test the model. This procedure is repeated so that each subject is tested once.
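A minimal sketch of the leave-one-subject-out splitting described above (pure Python; the data layout and function name are illustrative):

```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) splits, holding out one subject per split.

    samples: list of (subject_id, sample) pairs -- an illustrative layout.
    """
    subjects = sorted({sid for sid, _ in samples})
    for held_out in subjects:
        train = [s for sid, s in samples if sid != held_out]
        test = [s for sid, s in samples if sid == held_out]
        yield held_out, train, test

# Toy usage: 3 subjects -> 3 splits, each testing on exactly one subject.
data = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]
splits = list(leave_one_subject_out(data))
```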
For the comparison of experimental results, we test our results under the metrics of accuracy, Pearson correlation coefficient (PCC), intraclass correlation coefficient (ICC), mean square error (MSE), and mean absolute error (MAE). The accuracy is the proportion of correct predictions. The definitions of the others are shown as follows:

PCC = Σ_{i=1}^{N} (y_i − ȳ)(ŷ_i − ŷ̄) / ( √(Σ_{i=1}^{N} (y_i − ȳ)²) · √(Σ_{i=1}^{N} (ŷ_i − ŷ̄)²) ),

ICC = 2 Σ_{i=1}^{N} (y_i − avg)(ŷ_i − avg) / ( Σ_{i=1}^{N} (y_i − avg)² + Σ_{i=1}^{N} (ŷ_i − avg)² ),

MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²,

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|,

where y and ŷ stand for labels and predictions (with ȳ and ŷ̄ their respective means), and avg denotes the average over both ground-truths and predictions. The range of PCC and ICC is [−1, 1]: 1 means the predictions are completely correlated with the ground-truth, while −1 means completely negatively correlated; larger values of these two indicators represent better predictions. The range of MSE and MAE is [0, +∞); they measure the distance between predictions and ground-truths, and smaller values indicate better results.
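For concreteness, these metrics can be computed as follows. This is a plain-Python sketch; in particular, the pooled-mean ICC form used here is one common single-rater estimator and is our assumption rather than a formula confirmed by the paper:

```python
import math

def metrics(y, y_hat):
    """MSE, MAE, PCC, and a pooled-mean ICC for paired lists of labels/predictions."""
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / n
    # PCC: covariance normalized by the two standard deviations.
    my, mh = sum(y) / n, sum(y_hat) / n
    pcc = sum((a - my) * (b - mh) for a, b in zip(y, y_hat)) / math.sqrt(
        sum((a - my) ** 2 for a in y) * sum((b - mh) ** 2 for b in y_hat))
    # ICC: uses the pooled mean of ground-truths and predictions (an assumption).
    avg = (sum(y) + sum(y_hat)) / (2 * n)
    icc = 2 * sum((a - avg) * (b - avg) for a, b in zip(y, y_hat)) / (
        sum((a - avg) ** 2 for a in y) + sum((b - avg) ** 2 for b in y_hat))
    return mse, mae, pcc, icc

# Perfect predictions give zero error and perfect correlation/agreement.
mse, mae, pcc, icc = metrics([0, 1, 2, 3], [0, 1, 2, 3])
```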

The evaluation of locality aware module
Here, the proposed locality aware module is tested. Two sets of experiments are shown, namely the evaluation of the attention branch and of the feature branch.

Attention branch
We propose an attention branch with an encoder-decoder structure to make our network pay more attention to the regions important for facial pain assessment. Here, we experiment with two scenarios: the network with the attention branch removed (single branch) and the proposed network structure (dual branch). The results are shown in Figure 6. As can be seen, the dual-branch structure shows better performance. The reason is that the attention branch can segment the desired regions: pain shows locality on the face, and this characteristic requires the network to focus on the pain-related regions. When the deep network can attend to the important information without being disturbed by irrelevant information, it shows good results in the pain assessment task. To verify this idea, we also visualize the results obtained by the attention branch; these attention maps are shown in Figure 7.
As observed from these results, the attention branch based on supervised learning indeed emphasizes the areas related to facial pain. Moreover, by being superimposed on the feature map, the attention maps affect the subsequent extraction of semantic pain features, making the pain estimation task easier. Therefore, we can conclude that the attention branch allows our network to segment pain-related regions, which enables the deep network to pay more attention to the information most relevant to pain assessment.

Feature branch
Apart from the attention branch, we also devise a feature branch to extract deep pain features under the guidance of the learned attention map. Since we employ ResBlocks to construct these layers, we test the influence of these blocks in this part. First, we experiment with various depths of the feature branch; the results are shown in Figure 8. From this figure, we see that four blocks are most suitable for our task. The reason may be that this number of layers matches the depth of the attention branch, and such a matching structure often achieves better results. Besides, we test the situation without ResBlocks by substituting them with ordinary convolutional layers. The results are illustrated in Figure 9.
As can be seen, the effect of ResBlocks is obvious. These blocks could effectively reduce the possibility of network overfitting. And the appropriate depth of network could achieve better performance.

The evaluation of identity aware module
We also propose an IAM with a multi-task learning method, employing this module to decouple the identity recognition task from pain assessment. With this method, the framework can concentrate more on the pain assessment issue. Here, we first compare the performance of the framework with and without the IAM. The results are shown in Figure 10.
As observed from these results, the framework with the IAM achieves better performance. This suggests that the proposed IAM allows the framework to assess identity-invariant pain by separating these entangled tasks; in this way, the framework can focus on the pain assessment task.

Comparison with existing methods
Here, we compare our method against existing methods under the same metrics. Since our method is based on deep networks, we begin the comparison by contrasting it with some traditional hand-crafted features. The comparison results on the UNBC database are shown in Table 3.
In this table, '-' means no result is reported in that publication, and '⇑' denotes that a larger value of this metric is better, and vice versa. We also compare with several methods on the BioVid database in Table 4; since these methods use the accuracy metric, we test our method under the same condition. As can be seen, our deep learning method is far superior to these traditional features. The reason is that deep learning can extract features targeted to the task, rather than using general hand-crafted features to represent pain. Besides, the comparison with deep learning methods on the two databases is illustrated in Tables 5 and 6. As can be seen from Tables 3-6, for pain intensity assessment, the conventional methods of hand-crafted features plus classifiers [4, 5, 23, 31, 49-51, 57] from the early years have gradually given way to deep learning methods [14-17, 19, 55, 56] in recent times. It is obvious that with the development of deep learning, pain assessment methods show better performance. We obtain the best results on UNBC and sufficiently good results on BioVid. This suggests that current deep networks show clear advantages in various tasks, but common deep networks cannot focus on the information important for pain assessment; that is, these general networks are not like humans, who have an attention ability when observing visual signals. Besides, they are usually unable to perceive the variability of pain between individuals. Our locality and identity aware method not only focuses on the important information but also enables the network to pay attention to the pain assessment task. There are also several works using dual-branch frameworks; the results of the comparison with our method are shown in Tables 7 and 8. Some research leveraged sets of hand-crafted features, for example, [10, 28, 49, 50, 61, 62].
Some researchers combine traditional and deep features for pain assessment [19, 58, 59, 63]. We obtain the best results on UNBC and sufficiently good results on BioVid. The reason is that these works provide solutions from various views but lack an in-depth understanding of pain: they overlook the locality and identity of pain, which are significant. Therefore, compared with the method proposed here, these methods show some disadvantages in pain assessment performance.

CONCLUSION
Pain assessment is a significant part of the health care process in modern medical institutions, and automatic assessment methods are becoming more and more important. Here, we presented a locality and identity aware network called LIAN for this issue. The proposed network addresses both the locality and the identity of pain. Specifically, a dual-branch framework called the locality aware module is first used to obtain an attention map of the painful face; the learnt attention map then guides the deep feature extraction of pain. The obtained pain representation fully considers the most pain-relevant regions and excludes irrelevant interference. Besides, a multi-task learning method called the IAM is leveraged to assess pain and recognize identity simultaneously. This decoupled task approach allows the network to pay more attention to the information that is not related to identity, so as to achieve identity-invariant pain assessment. Finally, to evaluate the proposed method, extensive experiments on public pain assessment databases are implemented. The results show that the presented LIAN achieves compelling performance.