Two-way constraint network for RGB-Infrared person re-identiﬁcation

RGB-Infrared person re-identification (RGB-IR Re-ID) aims to retrieve and match person images across RGB and infrared (IR) images. Since most surveillance cameras capture RGB images during the day and IR images at night, RGB-IR Re-ID is helpful when checking day and night surveillance footage in criminal investigations. Previous work often extracts only sharable, identity-related features for identification. Few studies specifically extract and exploit features that cannot distinguish identity, e.g. identity-unrelated features derived from background and modality. In this Letter, we propose a novel and concise RGB-IR Re-ID network named the two-way constraint network (TWCN). Compared with traditional Re-ID networks, TWCN not only extracts and utilises identity-related features but also makes full use of identity-unrelated features to improve matching accuracy. TWCN uses a reverse-triplet loss to extract identity-unrelated features and proposes an orthogonal constraint to remove identity-unrelated information from identity-related features, which improves the purity of the identity-related features. In addition, a correlation coefficient synergy and central clustering (CCSCC) loss is introduced into TWCN to extract identity-related features effectively. Extensive experiments demonstrate that our method is effective.

Introduction: RGB-IR Re-ID is a challenging multi-modality problem first proposed by Wu et al. [1]. In past work [1][2][3][4][5][6][7][8], the main concern has been how to extract identity-related features from images and use them to perform similarity-ranking matching for identification. However, in the RGB-IR Re-ID task, we argue that we should not only extract identity-related features but also eliminate identity-unrelated information from the identity-related features. The reason is as follows. Compared with traditional Re-ID tasks, the large modality differences in the RGB-IR Re-ID task make colour and texture difficult to use as clues for distinguishing identities, which means that the identity-related information in images is greatly reduced. It is foreseeable that this reduction makes it easier for identity-unrelated information to be misjudged and appear in identity-related features. From this perspective, we try to remove such identity-unrelated information from the identity-related features to improve the accuracy of RGB-IR Re-ID.
We make a strong assumption that, in the feature extraction process, identity-related features are inevitably coupled with part of the identity-unrelated information, while the identity-unrelated features contain most of the identity-unrelated information. Through an orthogonal constraint between the identity-related and identity-unrelated features, the identity-unrelated information in the identity-related features can be removed, which improves the purity of the identity-related features. Besides, to address the challenge of extracting sharable and identity-related features, a correlation coefficient synergy and central clustering (CCSCC) loss is introduced into TWCN.
It is worth mentioning that in 2020, Kansal et al. [9] proposed a spectrum-disentangled representation learning method for the RGB-IR Re-ID task, which circuitously disentangles the spectrum information from identity-related features, making them insensitive to modality differences. However, this method only decouples the spectrum information and still retains other noise (e.g. background) in the identity-related features. Besides, its multi-stage training strategy increases the amount of computation and the training time. Compared with this method, our network is more concise and can directly decouple global identity-unrelated information with an end-to-end training strategy.
In this Letter, our contributions can be listed as follows: (1) We propose a novel and concise RGB-IR Re-ID network named the two-way constraint network (TWCN), which not only extracts and utilises identity-related features but also makes full use of identity-unrelated features to improve matching accuracy. (2) TWCN uses a reverse-triplet loss to extract identity-unrelated features and proposes an orthogonal constraint to remove identity-unrelated information from identity-related features, which improves the purity of the identity-related features. (3) A CCSCC loss is introduced into TWCN to extract identity-related features effectively.
Baseline and Two-Way Constraint Network: Our baseline is a traditional Re-ID network proposed by Luo et al. [10]. Because Ye's work [11] indicates that generalised-mean (GeM) pooling [12] outperforms average pooling in the RGB-IR Re-ID task, we replace average pooling with GeM pooling in the baseline. As shown in Figure 1(a), the baseline is composed of a ResNet50 backbone, GeM pooling, a batch normalisation layer, and a fully connected layer. The expression of GeM pooling [12] is as follows:

f_c = ( (1/|X_c|) ∑_{x ∈ X_c} x^{p_c} )^{1/p_c}

where X_c is the set of activations in the cth channel of the feature map and p_c is a hyperparameter learned during training. With a learnable hyperparameter p_c, GeM pooling dynamically adjusts the pooling state (p_c = 1 recovers average pooling and p_c → ∞ approaches max pooling), which retains more effective information in the deep layers of the network. More details about GeM pooling can be found in [12]. In the baseline, as shown in Figure 1(a), the initial convolution block of ResNet50 is separated into two branches that perform different convolution operations on the two modality images to obtain modality-specific features. The modality-specific features are then concatenated, and the rest of the network extracts features from the concatenated modality-specific features. The losses for the baseline network are the triplet loss and the ID loss. It is a powerful and traditional baseline, but it only extracts identity-related features.
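As an illustrative sketch of GeM pooling (NumPy; we assume a fixed scalar p here rather than the learnable per-channel p_c used in the network):

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalised-mean pooling over the spatial dims of a (C, H, W) map.

    p = 1 reduces to average pooling; p -> infinity approaches max pooling.
    In the network p_c is learned per channel; a fixed scalar is used here
    for illustration only.
    """
    x = np.clip(feature_map, eps, None)              # guard against 0^p
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)  # shape (C,)
```

Because the exponent interpolates between mean and max, the same layer can behave more like average or max pooling depending on what training selects for p_c.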
The overall architecture of TWCN is illustrated in Figure 1(b). On the basis of the baseline, we add two fully connected layers to separately collect identity-related and identity-unrelated features. When extracting identity-related features, the CCSCC loss is introduced alongside the original triplet loss and ID loss. A reverse-triplet loss is used when extracting identity-unrelated features. An orthogonal constraint is imposed between the identity-related features and the identity-unrelated features, which is used to remove identity-unrelated information from the identity-related features.

Identity-Unrelated Information Removal:
In this Letter, we make a strong assumption that, in the feature extraction process, identity-related features are inevitably coupled with identity-unrelated information, and that this identity-unrelated information can be removed from the identity-related features by exploiting the identity-unrelated features.
Usually, we directly extract identity-related features for identification. However, as shown in Figure 2(a), in the RGB-IR Re-ID task the identity-related features often focus on invalid areas and ignore body areas (coloured clothes). This is because the large modality differences make colour and texture invalid clues, which reduces the effective identity-related information in the image and makes it easier for identity-unrelated information (e.g. background information) to mix into the identity-related features. If we instead extract identity-unrelated features, they focus on most of the identity-unrelated area, as shown in Figure 2(b). There are two main reasons for this result: (1) in the RGB-IR Re-ID task, identity-unrelated information far exceeds identity-related information in an image; (2) identity-unrelated information is widely distributed. For these reasons, it is easier to extract pure identity-unrelated features. When both kinds of features are extracted, part of the identity-unrelated information appears in the identity-related features, while most of it is included in the identity-unrelated features. An orthogonal constraint is then proposed between the identity-unrelated features and the identity-related features to remove the common identity-unrelated information. Finally, purer identity-related features are obtained for identification, as shown in Figure 2(c). We use a reverse-triplet loss to extract identity-unrelated features. The original triplet loss [13] is:

Loss_Tri = max(d_{a,p} − d_{a,n} + α, 0)

where d_{a,p} and d_{a,n} are the feature distances of a positive pair and a negative pair, respectively, and α is the margin. The reverse-triplet loss used to extract identity-unrelated features is as follows:

Loss_{Tri-reverse} = max(d^{same}_{a,n} − d^{same}_{a,p} + α_1, 0) + max(d^{cross}_{a,n} − d^{cross}_{a,p} + α_2, 0)

where d^{same}_{a,n} and d^{same}_{a,p} are the feature distances of negative and positive pairs within the same modality, and d^{cross}_{a,n} and d^{cross}_{a,p} are the feature distances of negative and positive pairs across modalities.
α_1 and α_2 are the margin constants of the reverse-triplet loss. TWCN calculates the reverse-triplet loss both within the same modality and across modalities. The reverse-triplet loss extracts features that cannot distinguish identities, which requires the features to satisfy d_{a,n} + α < d_{a,p}. We call features that satisfy this inequality identity-unrelated features.
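The contrast between the two losses can be sketched as follows (plain Python; the pairwise distances are assumed to be precomputed scalars, and combining the same-modality and cross-modality terms by summation is our reading of the loss):

```python
def triplet_loss(d_ap, d_an, margin):
    """Standard triplet loss: pushes positives closer than negatives."""
    return max(d_ap - d_an + margin, 0.0)

def reverse_triplet_loss(d_same_ap, d_same_an, d_cross_ap, d_cross_an,
                         margin1, margin2):
    """Reverse-triplet loss: the inequality is flipped, so the loss is zero
    only when d_an + margin < d_ap, i.e. the features CANNOT separate
    identities. Applied both within one modality and across modalities."""
    same = max(d_same_an - d_same_ap + margin1, 0.0)
    cross = max(d_cross_an - d_cross_ap + margin2, 0.0)
    return same + cross
```

A feature space that minimises the reverse-triplet loss is exactly one where identity gives no retrieval signal, which is the defining property of the identity-unrelated branch.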
Generative adversarial network (GAN) learning and singular value decomposition (SVD) [15] are commonly used for feature decoupling in Re-ID tasks. To decouple modality information, typical GAN-based methods [5] learn features from unified-modality images that a generator produces from images of different modalities. However, GAN methods suffer from slow convergence and a large amount of computation. Orthogonal constraints based on SVD [15] decompose a single matrix to obtain the corresponding orthogonal matrices. In our network, the identity-related and identity-unrelated features are output as two one-dimensional vectors, which is not suitable for SVD-based methods. We therefore design the loss function from the basic definition of orthogonality. The angle between two vectors is given by:

cos φ = (a · b) / (|a||b|)

where a and b are vectors and φ is the angle between them. Let loss = |cos φ|. As the loss converges to 0, cos φ → 0; if cos φ = 0, then φ = π/2, which means the two vectors are orthogonal. We thus exploit the convergence behaviour of loss functions in deep learning to design a loss that imposes an orthogonal constraint:

Loss_orth = (1/b) ∑_{i=1}^{b} | (f^{unrelated}_i · f^{related}_i) / (|f^{unrelated}_i| |f^{related}_i|) |

where b is the size of a mini-batch, f^{unrelated}_i is the ith identity-unrelated feature embedding in the mini-batch, and f^{related}_i is the corresponding identity-related feature embedding. We use this orthogonal constraint to decouple identity-unrelated information from the identity-related features.
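A minimal NumPy sketch of this constraint (one plausible form: the mean absolute cosine over the mini-batch; whether the penalty is |cos φ| or its square is an implementation detail we do not fix here):

```python
import numpy as np

def orthogonal_loss(f_related, f_unrelated, eps=1e-8):
    """Mean |cos angle| between paired related/unrelated embeddings.

    f_related, f_unrelated: (b, d) arrays, one row per batch sample.
    Driving this towards zero pushes each pair towards orthogonality.
    """
    num = np.sum(f_related * f_unrelated, axis=1)          # dot products
    den = (np.linalg.norm(f_related, axis=1)
           * np.linalg.norm(f_unrelated, axis=1))
    return np.mean(np.abs(num / (den + eps)))
```

Because the cosine is scale-invariant, the constraint only controls the angle between the two embeddings and leaves their magnitudes free for the identification losses.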

Correlation Coefficient Synergy and Central Clustering Loss:
In addition to the removal of identity-unrelated information, we propose the CCSCC loss to extract identity-related features efficiently. The CCSCC loss consists of two terms: the first uses central features to promote cross-modality feature central clustering, and the second uses correlation matrices [16] to synchronise feature extraction. The expression of the CCSCC loss is as follows:

Loss_CCSCC = λ_1 (1/c) ∑_{i=1}^{c} ||m^{RGB}_i − m^{IR}_i||_2 + λ_2 (1/c) ∑_{i=1}^{c} ||R^{RGB}_i − R^{IR}_i||_F

where λ_1 and λ_2 are hyperparameters, c is the number of classes in a mini-batch, m^{RGB}_i and m^{IR}_i are the corresponding RGB/IR centre vectors of the ith class in the mini-batch, and R^{RGB}_i and R^{IR}_i are the corresponding correlation matrices of the RGB/IR feature embeddings of the ith class in the mini-batch. Given an identity-related feature embedding set X_i = (X_{i1}, X_{i2}, ..., X_{ik}, ...), extracted from the ith class RGB images in a mini-batch, where X_{ik} is the kth identity-related feature embedding of the ith class, the (j, k)th entry of R^{RGB}_i is the Pearson correlation coefficient between X_{ij} and X_{ik}:

R^{RGB}_i(j, k) = Cov(X_{ij}, X_{ik}) / (σ(X_{ij}) σ(X_{ik}))

The calculation of R^{IR}_i is similar. The correlation matrices, which represent the relationships between variables, are used to synchronise the convergence direction in the CCSCC loss. Extensive experiments have been conducted to prove that the CCSCC loss is effective.
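One concrete interpretation of the CCSCC loss can be sketched in NumPy (assumptions: the correlation-matrix term uses a Frobenius norm, R_i is the Pearson correlation matrix between a class's embeddings, and each class contributes the same number of RGB and IR embeddings to the mini-batch):

```python
import numpy as np

def centre_distance(feats_rgb, feats_ir):
    """L2 distance between the RGB and IR centre vectors of one class."""
    return np.linalg.norm(feats_rgb.mean(axis=0) - feats_ir.mean(axis=0))

def correlation_matrix(feats):
    """Pearson correlation matrix between the embeddings of one class.

    feats: (k, d) array, one row per embedding; returns a (k, k) matrix.
    """
    return np.corrcoef(feats)

def ccscc_loss(rgb_by_class, ir_by_class, lam1=1.0, lam2=1.0):
    """CCSCC loss over a mini-batch grouped by class.

    rgb_by_class / ir_by_class: lists of (k, d) arrays, one per class.
    """
    c = len(rgb_by_class)
    centre = sum(centre_distance(r, i)
                 for r, i in zip(rgb_by_class, ir_by_class)) / c
    corr = sum(np.linalg.norm(correlation_matrix(r) - correlation_matrix(i))
               for r, i in zip(rgb_by_class, ir_by_class)) / c
    return lam1 * centre + lam2 * corr
```

The first term pulls each class's RGB and IR centres together, while the second asks the two modalities to exhibit the same within-class correlation structure, synchronising their convergence.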

Experiments and Results:
To prove the effectiveness of our method, ablation experiments are conducted on two popular cross-modality person re-identification datasets, SYSU-MM01 [1] and RegDB [14]. The SYSU-MM01 dataset contains 287,628 RGB images and 15,792 infrared images. The RegDB dataset contains 4,120 RGB images and 4,120 infrared images. All images are resized to 144 × 288. In our experiments, the optimiser is stochastic gradient descent. The experimental results are shown in Tables 1 and 2. On the RegDB dataset, compared with the baseline, TWCN improves rank-1 accuracy and mAP by 6.99% and 6.71%, respectively. On the SYSU-MM01 dataset, TWCN likewise improves both rank-1 accuracy and mAP over the baseline. Table 3 shows the performance comparison between TWCN and state-of-the-art methods.
Conclusion: We propose a novel and concise RGB-IR Re-ID network named the two-way constraint network (TWCN). TWCN not only extracts and utilises identity-related features but also makes full use of identity-unrelated features to improve matching accuracy. TWCN uses a reverse-triplet loss to extract identity-unrelated features and proposes an orthogonal constraint to remove identity-unrelated information from identity-related features. In addition, a correlation coefficient synergy and central clustering (CCSCC) loss is introduced into TWCN to extract identity-related features effectively. Extensive experiments have proved that our method is effective.