Gait recognition based on sparse linear subspace

Gait recognition has broad application prospects in intelligent security monitoring. However, due to the variability of human walking states and the complexity of external conditions during sample collection, gait recognition still faces many challenges. In particular, gait recognition algorithms based on shallow learning struggle to reach the correct recognition rate required by many applications, while the amount of available gait training data cannot meet the needs of deep learning-based model training. To address these problems, this paper presents a novel gait recognition scheme based on sparse linear subspace. First, frame-by-frame gait energy images (ffGEIs) are extracted as primary gait features, and sparse linear subspace technology is used to represent them for dimension reduction. Second, a new gait classification algorithm based on support vector machine is presented, which adopts Gaussian radial basis function (RBF) kernels to achieve cross-view gait recognition. Finally, the proposed gait recognition approach is evaluated on two open-access gait databases to demonstrate its performance.


INTRODUCTION
Human gait refers to the posture changes of the whole body, especially the legs, during walking. In the last decade, gait recognition and its applications have received extensive attention [1,2]. Compared with other biometric features, such as face, fingerprint and iris, gait has the following advantages: (1) Easy to identify from a long distance. Because gait features are more dynamic, a higher recognition rate can be achieved at longer distances and lower resolutions. (2) Non-invasive to the identified subject. Because the recognition process needs neither the deliberate cooperation of the subject nor physical contact, it can easily be used for identification in public places. (3) Difficult to hide. Because gait features include the posture changes of the whole body during walking, they are difficult to conceal, whereas a face or fingerprint can be covered by a mask or gloves. However, due to the variability of human walking states and the complexity of external conditions during data acquisition, gait recognition still faces many challenges [3,4]. For example, because the relative relationship between the walking direction and the camera axis changes, there are significant differences in the walking posture samples, which results in the cross-view gait recognition problem. Another example is that, when the gait image is directly used as the feature, it is difficult to extract essential gait features with strong distinguishing ability, and the time complexity of the gait recognition algorithm is high.
In addition, because more complex factors must be considered to extract the complete contour information of pedestrians from real scenes, there is still a gap between gait recognition algorithms and their use in video surveillance and other practical applications.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
To extract intrinsic gait features with strong distinguishing ability and reduce the time complexity of gait recognition algorithms, we present a novel gait recognition scheme based on sparse linear subspace. Firstly, we extract frame-by-frame gait energy images (ffGEIs) as primary gait features and use sparse linear subspace technology to represent them for dimension reduction. Secondly, a new gait classification algorithm based on support vector machine is presented, which adopts Gaussian radial basis function (RBF) kernels to achieve cross-view gait recognition. Finally, to evaluate the proposed method, we conducted several experiments on two well-known gait databases. The experimental results show that our method outperforms existing gait recognition methods in terms of correct recognition rates. This paper is structured as follows. In Section 2, we review and discuss the existing gait recognition methods. Then, in Section 3, a novel gait recognition method based on sparse linear subspace is proposed. Next, in Section 4, we describe two experiments to demonstrate the performance of the proposed method. Finally, Section 5 concludes the work presented.

IET Image Process. 2021;1-9. wileyonlinelibrary.com/iet-ipr

RELATED WORK
In terms of gait recognition, a series of research results [5,6] have been achieved. In this section, we will review and discuss them according to the timeline. First of all, in [7], the famous gait representation, named gait energy image (GEI), was proposed, which can comprehensively reflect the time and space characteristics of people walking in a gait cycle. GEI and some of its variants have been widely used in gait recognition research. Zhang et al. [8] proposed an algorithm to learn high-resolution counterparts of test images from a collection of training images to resolve the problem of gait recognition when only low-resolution gait samples are available. Huang et al. [9] combined two gait representations, the shifted energy image and the gait structural profile, to improve the robustness of gait recognition to some classes of structural variations. Xu et al. [10] proposed a patch distribution feature for human gait recognition, which represented GEIs as a set of local augmented Gabor features and learned a global Gaussian mixture model. In [11], by utilizing hidden Markov models and dual discriminative observations, Boulgouris and Huang presented a gait recognition approach which combines holistic and model-based features. Based on exploring relations between two gaits from two different views, Kusakunniran et al. [12] proposed a gait recognition method which carried out motion clustering to partition the most related parts of gaits from different views into the same group. Lai et al. [13] utilized the matrix representation-based human gait recognition and proposed a discriminant subspace learning method, which extended the matrix-representation-based discriminant analysis methods to sparse cases. Li et al. [14] expanded dictionary learning into the multitemporal recovery of quantitative data contaminated by thick clouds and shadows, which could be used in gait recognition. 
In [15], to solve the problems caused by cycle detection failures and inter-cycle phase misalignment, an accelerometer-based gait recognition approach was proposed, which used a type of salient points termed signature points. Guan et al. [16] modelled the effect of covariate factors such as carrying condition, clothing changing etc., as an unknown partial feature corruption problem and used a classifier ensemble approach to deal with this problem.
To solve the problem of large intra-class differences of gait samples when the perspective span is large, Muramatsu et al. [17] combined view transformation models (VTMs) with a score normalization framework with quality measures to address the problem of cross-view gait recognition. Wang et al. [18] employed continuous density hidden Markov models (CD-HMM) to perform gait recognition, and then used an adaptive algorithm to adaptively refine the parameters of each model. Tang et al. [19] proposed a method of gait partial similarity matching by assuming a 3D object shares common view surfaces in significantly different views. Chhatrala et al. [20] studied human gait representation based on the Gabor function and discrete cosine transform of binary silhouettes, which was enhanced using a multilinear Laplacian discriminant analysis. Alotaibi and Mahmood [21] developed a specialized deep CNN architecture for gait recognition, which was less sensitive to several cases of the common variations and could handle relatively small data sets without using any augmentation or fine-tuning techniques. Xu et al. [22] proposed a coupled locality preserving projections (CLPP) method for cross-view gait recognition, which learned coupled projection matrices to project cross-view features into a unified subspace while preserving the essential manifold structure. In [23], a subspace ensemble learning using totally-corrective boosting framework and its kernel extension were proposed for gait recognition, in which multiple subspaces are iteratively learned with different weight distributions on the triplet set using totally-corrective technology. Deng et al. [24] proposed a gait recognition algorithm by combining spatial-temporal and kinematic gait features. In [25], a Gabor wavelets-based gait recognition algorithm was presented, which utilized a two-dimensional principal component analysis method to reduce the extracted gait features.
In [26], a multi-shot gait recognition algorithm based on two-sided Fourier correction motion point estimation was proposed, which was based on temporal-spatial HOG feature template matching.
More recently, Wang et al. [27] presented a gait recognition method based on a self-adaptive hidden Markov model. Deng and Wang [28] used deterministic learning theory to capture the gait dynamics underlying Kinect-based gait parameters. In [29], a gait-related loss function called angle center loss (ACL) was proposed to learn discriminative gait features. Ben et al. [30] presented a cross-view gait recognition method that aligns gait energy images across views with the coupled bilinear discriminant projection, which alleviates the problem that gait features are affected more by views than by identities, especially when there is a significant difference in walking direction between the training gait and the test gait. In [31], a spiderweb graph neural network (SpiderNet) was proposed to solve the cross-view gait recognition problem, which connected the gait data of a single view with that of other views concurrently and constructed an active graph convolutional neural network. Wang and Yan [3] put forward a gait recognition method based on ensemble learning.
To sum up, existing gait recognition research, especially work using deep learning technology, has made great progress in gait feature representation and pre-processing, classifier design and training. However, although a series of gait recognition methods has been proposed to date, gait recognition is still some distance from use in practical applications. One important reason is that gait recognition methods based on shallow learning struggle to reach the correct recognition rate required by practical applications, while the amount of gait training data usually cannot meet the needs of deep learning-based model training. To solve the above problem, we present a novel gait recognition scheme based on sparse linear subspace to extract more discriminative gait features by nonlinear dimensionality reduction. Firstly, we extract frame-by-frame gait energy images (ffGEIs) as primary gait features and use sparse linear subspace technology to represent them for dimension reduction. Then, a new gait classification algorithm based on support vector machine is presented, which adopts Gaussian radial basis function (RBF) kernels to achieve cross-view gait recognition. Finally, to evaluate the proposed scheme, we conducted several experiments on two well-known gait datasets, that is, CASIA Dataset B and the OU-ISIR LP dataset.

FIGURE 1 Flow chart of the proposed gait recognition scheme

METHODOLOGY
Generally, the process of gait classification consists of two stages: gait feature extraction and classifier design. We propose a novel gait classification approach based on manifold learning, as shown in Figure 1. In our method, the gait feature extraction process includes two sequential steps, namely construction of primary gait features and dimension reduction based on sparse linear subspace, while the classifier design stage is further divided into RBF-SVM training and RBF-SVM-based gait classification. In this section, we will discuss each step of gait classification one by one.

Construction of primary gait features
The frame-by-frame gait energy image (ffGEI) is a comprehensive spatiotemporal representation of human gait, which can expand the available dataset and relax the constraints of gait cycle segmentation required by existing gait representations [32]. As a variation of the conventional GEI [7], the ffGEI retains the attributes of the basic GEI, that is, it is less sensitive to silhouette noise than the original silhouette images. Besides, compared with the conventional GEI, the ffGEI only adds historical information to each frame in the original silhouette sequence and almost preserves the number of gait training samples; thus, it is more suitable for training deep neural networks, which usually need massive data to ensure the accuracy of training results.
To construct an ffGEI, as shown in Figure 2, gait silhouette images are processed in three sequential steps: (1) Silhouette Extraction. Segment the silhouette region of each visual object from its background and convert it into a binary image; (2) Normalization Processing. Resize each image to a standard scale and align each silhouette region at the centre of the final image; (3) ffGEI Construction. Combine all the images in a given time span to obtain an ffGEI according to Equation 1, where $I_i(\cdot)$ is the i-th normalized silhouette image, $(u, v)$ is the coordinate value in the normalized silhouette images and ffGEI images, and $2m+1$ is the time span used in ffGEI construction.
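As an illustration of step (3), the following Python sketch builds one ffGEI per frame by averaging the normalized silhouettes inside a window of 2m+1 frames. It assumes that the window is centred on the current frame, that pixel values are averaged as in the conventional GEI, and that the window is truncated at the ends of the sequence; the function name `build_ffgeis` and the boundary handling are illustrative choices rather than details taken from the text.

```python
import numpy as np


def build_ffgeis(silhouettes, m):
    """Return one ffGEI per frame from normalized binary silhouettes.

    silhouettes: array-like of shape (T, H, W) with values in {0, 1}.
    m: half-width of the window, so each ffGEI spans up to 2m+1 frames.

    Assumption: the window is centred on frame t and truncated at the
    sequence boundaries; the original method may treat the ends differently.
    """
    frames = np.asarray(silhouettes, dtype=np.float64)
    num_frames = frames.shape[0]
    ffgeis = np.empty_like(frames)
    for t in range(num_frames):
        lo = max(0, t - m)                # first frame inside the window
        hi = min(num_frames, t + m + 1)   # one past the last frame
        ffgeis[t] = frames[lo:hi].mean(axis=0)
    return ffgeis


# Toy sequence: five 2x2 frames, only the middle frame is foreground.
toy = np.zeros((5, 2, 2))
toy[2] = 1.0
toy_ffgeis = build_ffgeis(toy, m=1)
```

Because every frame yields its own ffGEI, a sequence of T silhouettes produces T samples under this sketch, which is in line with the remark that ffGEIs almost preserve the number of training samples.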

Dimension reduction based on sparse linear subspace
Subspace analysis is an effective method to reduce the dimension and find features by projecting the sample to an optimal subspace. The basic starting point is to project the loosely distributed samples in the high-dimensional space into the low-dimensional subspace through linear or non-linear transformation, so that the samples in the low-dimensional subspace are more closely distributed and separable, and the computational complexity is reduced.
Suppose we have two sets of gait images of the same person, and these two sets can represent different internal and external variables. The samples in the first set represent different acquisition perspectives, while the samples in the second set represent different clothes and load conditions. In order to reduce the dimension of the extracted ffGEIs, this paper first uses the sparse approximation to extract the local linear subspace from  [33].
For each reference point, we use a set of points in the second set to approximate its nearest neighbour on the manifold, as shown in Figure 3. Instead of searching over all points, we use a joint sparse approximation to solve the search problem. The distance between the nearest set of points in the second set and the corresponding reference points in the first set is used to represent the distance between the two sets. Assuming that $R_1$ and $R_2$ are two m-dimensional subspaces, the Frobenius norm-based distance between them is defined by Equation 2, where $B_1$ and $B_2$ are orthogonal bases of $R_1$ and $R_2$, respectively, and $\|\cdot\|_F$ denotes the Frobenius norm. It can be proved that this norm is orthogonally invariant. The Frobenius norm [34] is generally defined as:

$$\|A\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} |a_{i,j}|^2} \tag{3}$$

where M and N are the numbers of rows and columns of matrix A, respectively, and $a_{i,j}$ is the element in the i-th row and j-th column of A. Given a gait feature set $G = [g_1, g_2, \ldots, g_L]$, where each element is a vector obtained by flattening an ffGEI in row-major order, we can construct K subspaces of rank m, where K is computed by Equation 4. For each element $g_i$ ($1 \le i \le L$) in G, we use the other samples in G to reconstruct its coefficient samples and utilize Equation 3 to calculate the reconstruction error between $g_i$ and subspace $R_i$. By minimizing the distance given by Equation 2, a suitable linear subspace is found to achieve dimensionality reduction of gait features. The procedure includes the following four steps:
1) The input gait training data are divided into a normal gait dataset and an abnormal gait dataset. Here, 'normal gait dataset' refers to the gait data collected from individuals who are neither wearing a coat that seriously blocks the leg movement information nor carrying a backpack that seriously affects the human contour information. All other data form the 'abnormal gait dataset'.
2) According to the correlation of data sources and the perspective of data acquisition, the normal and abnormal gait datasets are each grouped, and several local linear subspaces are extracted from each of them.
3) For each local linear subspace of the abnormal gait dataset, the average distance between the abnormal gait dataset and each local linear subspace of the normal gait dataset is calculated by Equation 2.
4) The target subspace is obtained using an idea similar to the k-nearest neighbour method. Specifically, the average distances obtained in the previous step are sorted in ascending order, and the linear subspaces of the normal gait dataset corresponding to the first k average distances are taken as the target subspace. k is set to 5 in our comparative experiments.
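Steps 2) to 4) can be sketched in Python as follows. The sketch assumes that Equation 2 takes the common projection form, $\|B_1 B_1^T - B_2 B_2^T\|_F$, which does not depend on the particular orthonormal basis chosen for each subspace; it also replaces the joint sparse approximation with a plain truncated SVD when fitting each local subspace. All helper names are hypothetical.

```python
import numpy as np


def orthonormal_basis(samples, m):
    """Orthonormal basis (D x m) of a rank-m subspace fitted to the samples.

    samples: array-like of shape (n, D), one flattened ffGEI per row.
    Assumption: a truncated SVD stands in for the sparse approximation.
    """
    data = np.asarray(samples, dtype=np.float64).T  # shape (D, n)
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    return u[:, :m]


def subspace_distance(b1, b2):
    """Frobenius distance between projectors (assumed form of Equation 2)."""
    return np.linalg.norm(b1 @ b1.T - b2 @ b2.T, ord="fro")


def select_target_subspaces(abnormal_bases, normal_bases, k=5):
    """Indices of the k normal subspaces with the smallest average distance
    to the abnormal subspaces (steps 3 and 4)."""
    avg_dists = [
        np.mean([subspace_distance(a, n) for a in abnormal_bases])
        for n in normal_bases
    ]
    order = np.argsort(avg_dists)[:k]  # ascending order, keep the first k
    return [int(i) for i in order]


# Toy example in R^3 with one-dimensional subspaces along the axes.
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
e3 = np.array([[0.0], [0.0], [1.0]])
nearest = select_target_subspaces([e1], [e3, e1, e2], k=1)
```

Comparing projectors rather than the bases themselves means that replacing a basis vector by its negative, or rotating the basis within the subspace, leaves the distance unchanged, which matches the orthogonal invariance mentioned in the text.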

RBF-SVM-based gait classification
SVM is defined as the linear classifier with the largest margin in the feature space, and its learning strategy is to maximize this margin. SVM transforms the classification problem into the solution of a convex quadratic programming problem. The basic SVM classifier is a two-class classification model. When using SVM to deal with gait classification problems, we need to construct a suitable multi-class classifier. The one-versus-one strategy is used in our method. The basic idea is to design an SVM between any two classes of samples, so samples of k categories require k(k − 1)/2 SVMs. When classifying an unknown sample, the category that finally receives the most votes is taken as the category of the unknown sample. On the other hand, for nonlinear classification problems, SVM chooses a kernel function k(⋅) to solve the problem of linear inseparability in the original space by mapping the data to a high-dimensional space. The kernel function computes the inner product of two vectors in the space after the implicit mapping. The advantage of the kernel function is that, although it maps features from a low-dimensional to a high-dimensional space, the computation is carried out in the low-dimensional space, while the substantive classification effect is expressed in the high-dimensional space. Thus, the kernel function avoids complex calculations in the high-dimensional space. After constructing the ffGEIs and using manifold learning technology to reduce their dimension, we use Gaussian RBF-SVM classifiers to achieve gait classification. The Gaussian RBF kernel is defined as:

$$k(g_i, g_j) = \exp\left(-\frac{\|g_i - g_j\|^2}{2\sigma^2}\right)$$

where $g_i$ and $g_j$ are two gait feature vectors from the given set G, and $\sigma$ is a width parameter.
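A small sketch of the two ingredients discussed above, the Gaussian RBF kernel and the one-versus-one voting rule, is given below. It assumes the usual parametrization of the kernel by a width σ, and the function names are illustrative. In scikit-learn, the RBF kernel is written as exp(−gamma‖x − x'‖²), so `gamma` corresponds to 1/(2σ²) under this assumption.

```python
import math
from collections import Counter
from itertools import combinations


def rbf_kernel(g_i, g_j, sigma):
    """Gaussian RBF kernel between two feature vectors (assumed sigma form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(g_i, g_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def num_pairwise_svms(k):
    """Number of binary SVMs needed by one-versus-one for k classes."""
    return k * (k - 1) // 2


def one_vs_one_vote(pairwise_winners):
    """Return the class with the most votes.

    pairwise_winners: dict mapping a class pair (a, b) to the class that the
    corresponding binary SVM predicted for the unknown sample.
    """
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]


# Three classes need three binary classifiers, one per unordered pair.
pairs = list(combinations(range(3), 2))
predicted = one_vs_one_vote({(0, 1): 1, (0, 2): 0, (1, 2): 1})
```

With the 124 subjects of CASIA Dataset B, the one-versus-one strategy would require `num_pairwise_svms(124)` binary classifiers, which gives a sense of the training cost.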
The main advantages of using an RBF kernel-based SVM in gait classification include: (1) Experience shows that the distribution of an SVM based on the RBF kernel function is stable in the mapping space, so after normalization, the average distance between the centroids of the mapped gait samples is the shortest; and (2) the RBF kernel function is a typical local kernel function, so its non-linear output mapping can be approximated locally by a locally exponentially decaying non-linear function (such as the Gaussian function). Furthermore, loss function selection is an important part of SVM classifier training and the SVM-based classification process, and it has a great impact on the classification results. We use the hinge loss function in our method, defined by:

$$\mathcal{L}_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right)$$

where $s_{y_i}$ represents the score corresponding to the real gait class and $s_j$ is the score for each of the other categories.
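For a single sample, a multi-class hinge loss of this kind can be evaluated as in the sketch below. It assumes the widely used form with a unit margin, in which max(0, s_j − s_{y_i} + 1) is summed over all wrong classes j; the margin value and the function name are assumptions rather than details given in the text.

```python
def multiclass_hinge_loss(scores, true_class, margin=1.0):
    """Hinge loss of one sample given per-class scores.

    scores: list of class scores s_j.
    true_class: index y_i of the real gait class.
    margin: required gap between the true score and every other score
            (assumed to be 1 here).
    """
    s_true = scores[true_class]
    return sum(
        max(0.0, s_j - s_true + margin)
        for j, s_j in enumerate(scores)
        if j != true_class
    )


# The true class (index 0) beats class 1 by more than the margin,
# but class 2 is only 0.5 behind, so it contributes 0.5 to the loss.
example_loss = multiclass_hinge_loss([3.0, 1.0, 2.5], true_class=0)
```

The loss is zero only when the score of the real class exceeds every other score by at least the margin, so confidently correct samples do not influence training.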

EXPERIMENTAL RESULTS
In this section, we conduct numerous comparative experiments to evaluate the proposed method. We used the Python 3.6 programming language and the scikit-learn 0.22 toolkit to implement all the experiments on a Dell Precision T7820 with two 5220R CPUs and 256 GB of memory. To quantitatively verify the classification performance of the proposed method, we used the cumulative match characteristic (CMC) curve to compare different gait recognition methods. CMC is a popular evaluation criterion in biometric systems, where recognition performance is measured based on the relative ordering of match scores corresponding to each biometric sample. Seven methods were involved in the comparison: the proposed method and six existing ones, that is, Original GEI [7], VTM [17], CD-HMM [18], (2D)²PCA [25], DCNN [5] and convLSTM [32]. Han et al. [7] proposed the spatio-temporal gait representation, namely GEI, and demonstrated its effectiveness in gait recognition. Muramatsu et al. [17] presented the view transformation model (VTM) that encodes a joint subspace of multi-view gait features. Wang and Yan [18] employed continuous density hidden Markov models (CD-HMM) to perform gait recognition and proposed an adaptive algorithm to adjust the parameters of each gait model. Wang et al. [25] proposed a Gabor wavelets-based gait recognition algorithm, which utilized a two-dimensional principal component analysis ((2D)²PCA) process to reduce the dimensionality of gait feature spaces. Wu et al. [5] studied gait-based human identification via similarity learning by deep convolutional neural networks and proposed the well-known framework Local @ Bottom (LB). Wang et al. [32] presented gait recognition methods based on convolutional long short-term memory.
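To make the CMC criterion concrete, the sketch below computes rank-k recognition rates from a matrix of match scores. It assumes a closed-set setting in which higher scores mean better matches; the function name and the toy labels are hypothetical.

```python
def cmc_curve(score_matrix, probe_labels, gallery_labels, max_rank):
    """Cumulative match characteristic up to max_rank.

    score_matrix[p][g]: match score between probe p and gallery entry g
                        (assumption: larger means more similar).
    Returns a list whose element r-1 is the fraction of probes whose
    correct identity appears among the top r gallery candidates.
    """
    hits = [0] * max_rank
    for scores, true_label in zip(score_matrix, probe_labels):
        # Gallery indices sorted from best to worst match.
        order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
        ranked_labels = [gallery_labels[g] for g in order]
        if true_label in ranked_labels:
            first_hit = ranked_labels.index(true_label)
            # A hit at position first_hit counts for every rank from there on.
            for r in range(first_hit, max_rank):
                hits[r] += 1
    return [h / len(probe_labels) for h in hits]


# Toy example: three probes matched against a gallery of three subjects.
toy_scores = [
    [0.9, 0.1, 0.3],  # probe "a": correct match ranked first
    [0.2, 0.4, 0.8],  # probe "b": correct match ranked second
    [0.5, 0.6, 0.1],  # probe "c": correct match ranked third
]
toy_cmc = cmc_curve(toy_scores, ["a", "b", "c"], ["a", "b", "c"], max_rank=3)
```

The first element of the returned list is the Rank 1 correct recognition rate, and the fifth and tenth elements (when `max_rank` is at least 10) are the Rank 5 and Rank 10 rates reported in the tables.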
Compared with the methods based on shallow learning, the correct recognition rate of the proposed method is significantly improved, reaching or approaching that of gait recognition methods based on deep learning. It has strong practical value in application scenarios where the number of training samples is small and a deep model cannot be fully trained. However, due to its high computational complexity, the proposed method can hardly achieve real-time recognition, and is more suitable for gait-based pedestrian recognition from a large number of video files.

Experiments on CASIA Dataset B
These experiments were conducted on CASIA Dataset B [35], a multi-view dataset collected by the Institute of Automation of the Chinese Academy of Sciences in January 2005. In total, 124 persons are involved in CASIA Dataset B, and the gait samples were captured synchronously from 11 different views. In addition, three variation conditions, namely view angle, clothing and carrying changes, are separately considered. Figure 4 shows the eleven view angle changes and Figure 5 shows the three variation conditions in CASIA Dataset B. Figure 6 shows the CMC curves, and Table 1 shows the Rank 1, Rank 5 and Rank 10 correct recognition rates of these experiments. In these experiments, all the gait samples in CASIA Dataset B are used, split into a training set and a test set at a ratio of 8:2. All the variation conditions, that is, view angle, clothing and carrying changes, are considered.
From Figure 6 and Table 1, we can see that the proposed method greatly outperforms the traditional methods [17,18,25,35] in terms of correct recognition rates. Compared with the deep learning-based methods [5,32], its correct recognition rates are slightly lower than those of the state-of-the-art methods when the rank value is small. When the rank value exceeds 10, its correct recognition rates reach those of the deep learning-based methods. This is mainly because ML-based dimension reduction can extract more discriminative gait characteristics from the traditional gait representations and thus improve the correct recognition rate.

Experiments on OU-ISIR large population dataset
These experiments were conducted on the OU-ISIR Large Population Dataset [36], which has been collected since March 2009 through outreach activity events in Japan. The dataset consists of persons walking on the ground surrounded by two cameras recording at 30 fps with a resolution of 640 by 480 pixels. In total, the OU-ISIR Large Population Dataset includes 4016 subjects with ages ranging from 1 to 94 years. Figure 7 shows an example from the OU-ISIR Large Population Dataset. Figure 8 shows the CMC curves, and Table 2 shows the Rank 1, Rank 5 and Rank 10 correct recognition rates of these experiments. Because the OU-ISIR Large Population Dataset comprises two main subsets, A and B, we directly use subset A as the training set and subset B as the test set. As shown in Figure 8 and Table 2, compared with the traditional methods [17,18,25,35], the proposed method achieves higher correct recognition rates. In addition, when the rank value is large, the proposed method is not inferior even to the state-of-the-art methods using deep learning technology. The main reason for this result is that the proposed method makes full use of manifold learning technology to explore the gait features that best reflect the essential differences in gait between different people.
Furthermore, from Figures 6 and 8, we can also see that all seven methods perform better on the OU-ISIR Large Population Dataset than on CASIA Dataset B. This is partially because the gait samples in CASIA Dataset B include more complex influencing factors than those in the OU-ISIR Large Population Dataset, such as view angle changes and different carrying conditions.

Supplementary experiments
In this section, we conducted supplementary experiments on CASIA Dataset A to verify the performance of ffGEIs. Three methods, that is, (2D)²PCA [25], convLSTM [32] and the proposed method, are compared by inputting ffGEIs and traditional GEIs separately. Dataset A was created on 10 December 2001 and includes 240 image sequences corresponding to 20 subjects, three walking lanes and four different conditions. Each subject has 12 image sequences, 4 sequences for each of the three directions, that is, parallel, 45° and 90° to the sampling direction. The length of each sequence varies with each walker's speed, but it ranges from 37 to 127. Two image sequences in Dataset A are selected as the training set, and the other sequences are treated as the testing set. Figure 9 shows the CMC curves of the three methods. As can be seen from Figure 9, the correct recognition rates of all three methods increase after using ffGEIs as their primary gait features, but the improvement of the proposed method is more obvious. Besides, we also conducted supplementary experiments on CASIA Dataset B to verify the performance of two methods on three test sets with different conditions. In these experiments, four gait sample sequences with normal conditions in CASIA Dataset B are used as the training set, and the other two sample sequences with normal conditions, two sample sequences with clothing changes and two sample sequences with carried bags are used as separate testing sets. As shown in Figure 10, the proposed method outperforms the (2D)²PCA [25] method under the same conditions. This is because the proposed method extracts gait features with stronger discrimination ability through non-linear dimensionality reduction.

FIGURE 9 CMCs of three methods on CASIA Dataset A

CONCLUSION
We propose a novel gait classification method based on sparse linear subspace, which consists of two steps: (1) primary gait feature construction and dimension reduction, and (2) gait classification. In the first step, we make full use of manifold learning techniques to reduce the dimension of the primary gait features, that is, ffGEIs, and obtain a series of low-dimensional gait feature vectors. Then, in the gait classification step, we construct and train RBF-based SVMs to carry out cross-view gait classification. Experiments on two well-known gait databases demonstrate that the proposed gait classification method outperforms existing methods. One limitation of the proposed method is that only gait datasets with simple backgrounds collected in laboratory environments are considered, which have complete and easily extracted gait silhouettes and trajectory information. How to separate complete gait silhouette information from gait videos with complex backgrounds in real environments is part of our future research work.