Surveillance video motion segmentation

Motion segmentation is the first and a crucial step of surveillance video summarisation, and traditional motion segmentation methods usually process all video data, which seriously affects the real-time performance of video synopsis. To address this issue, a novel method of surveillance video motion segmentation based on the progressive spatio-temporal tunnel (STT) flow model is proposed in this letter. Unlike traditional video segmentation methods, the proposed one analyses only the pixels on the circular sampling lines of video frames. Initially, the circular progressive STT is established by sampling pixels progressively. Subsequently, the progressive STT is expanded to form an STT expansion diagram, which is modelled as progressive STT flow. Finally, surveillance video motion fragments are segmented according to the progressive STT flow model. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods in terms of time consumption while achieving comparable motion segmentation precision.

✉ Email: zyz2016@stdu.edu.cn
Introduction: With the rapid construction and development of smart cities, video data captured by surveillance cameras are increasing at an explosive rate, and browsing videos in an effective manner has become an urgent problem. Video summarisation, as an active research topic, enables users to browse massive surveillance videos effectively [1]. Motion segmentation is the first and a crucial step of surveillance video summarisation; therefore, how to extract motion segments efficiently is particularly important.
Existing surveillance video motion segmentation methods work by determining whether there is motion in each frame. For example, Wang et al. [2] used the inter-frame feature difference method to determine whether the current frame contains moving targets. Murtaza et al. [3] computed the energy of motion history images, which provide spatio-temporal information of motion; the lower-energy segments are static segments. Sheng et al. [4] proposed a new trajectory clustering method using submodular optimisation for motion segmentation. Stoffregen et al. [5] segmented the scene into multiple targets to achieve motion segmentation. Zhou et al. [6] proposed a novel multi-mutual consistency induced transfer subspace learning framework for human motion segmentation. Guo et al. [7] improved the optical flow method to extract moving targets more accurately. These methods can achieve high segmentation accuracy, but they need to process the video frame by frame. Moreover, deep learning-based segmentation methods require large amounts of high-quality sample data to train the network and are relatively time consuming.
Traditional video temporal-domain segmentation divides a video into temporal structure units. Surveillance video motion segmentation aims to divide the video into motion segments and static segments, which is similar to traditional temporal-domain segmentation. Based on this observation, this letter extracts motion segments by detecting the boundaries between motion segments and static segments. It only needs to process the pixels on the circular sampling lines of the video to construct the progressive spatio-temporal tunnel (STT) flow model. The proposed method greatly reduces the amount of data to be processed without reducing the accuracy of motion segmentation.
Proposed method: The present letter proposes a novel method for surveillance video motion segmentation. It is based on analysing only the pixels on the circular sampling lines, so the computational cost is very low. By expanding and processing the progressive STT, a progressive STT flow model is constructed to obtain motion segments. Figure 1 shows the flow chart of the proposed method.
Key technology: The video spatio-temporal slice, also known as visual rhythm, is an efficient video analysis method that is widely used in video processing [8, 9]. Existing video spatio-temporal slices are mainly divided into three types: horizontal, vertical, and diagonal, as shown in Figure 2a. When a target moves along a trajectory parallel to the sampling line, as shown in Figure 2b, the target point A will not leave any trace on the spatio-temporal slice.
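The idea of a spatio-temporal slice can be sketched in a few lines: the same line of pixels is taken from every frame and the lines are stacked along the time axis. The following is a minimal illustration for a horizontal slice, assuming frames are greyscale NumPy arrays; the function name is illustrative, not from the letter.

```python
import numpy as np

def horizontal_slice(frames, row):
    """Stack the same pixel row from every frame into a 2-D
    spatio-temporal slice (rows = time, columns = space)."""
    return np.stack([f[row, :] for f in frames], axis=0)

# Toy video: 5 frames of an 8x8 greyscale scene, frame k filled with value k.
video = [np.full((8, 8), k, dtype=np.uint8) for k in range(5)]
slice_img = horizontal_slice(video, row=4)
print(slice_img.shape)  # (5, 8): 5 frames x 8 pixels per row
```

A moving target that crosses the sampled row appears as a slanted streak in `slice_img`; a target moving parallel to the row, as in Figure 2b, leaves no trace, which motivates the circular sampling lines below.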
1. Progressive STT: In order to detect motion inside the surveillance area, the progressive circular sampling lines of all frames are sampled and arranged along the time axis to form a progressive STT. Simultaneously, the surveillance area is divided into multiple sub-surveillance areas, as shown in Figure 3. Since it is almost impossible for a target to move along a perfect circle, the target will inevitably cross a sampling line, whether it moves inside the surveillance area or crosses the boundary of the surveillance area.

2. Progressive STT expansion diagram: In order to obtain the trajectory and direction of the target, the STT needs to be expanded. Suppose the length of a video sequence is L, and the pixel in the ith row and jth column of the kth frame is denoted P^k_{i,j}. Let R and S be the radius and perimeter of a circular sampling line with centre (centreX, centreY). The coordinates of pixel (i_θ, j_θ) on the circular sampling line are

\[ i_\theta = centreX + R\cos\left(\frac{2\pi\theta}{S}\right), \qquad j_\theta = centreY + R\sin\left(\frac{2\pi\theta}{S}\right), \qquad \theta = 1, 2, \ldots, S. \]

The pixels on the circular sampling line with radius R are arranged from top to bottom to form a vector CR_k whose length is the perimeter of the circular sampling line:

\[ CR_k = \left[ P^k_{i_1,j_1},\; P^k_{i_2,j_2},\; \ldots,\; P^k_{i_\theta,j_\theta},\; \ldots,\; P^k_{i_S,j_S} \right]^{\mathrm{T}}. \]

These vectors from successive frames are connected along the timeline to form the STT expansion diagram matrix Tunnel(R):

\[ Tunnel(R) = \begin{bmatrix} P^1_{i_1,j_1} & P^2_{i_1,j_1} & \cdots & P^k_{i_1,j_1} & \cdots & P^L_{i_1,j_1} \\ P^1_{i_2,j_2} & P^2_{i_2,j_2} & \cdots & P^k_{i_2,j_2} & \cdots & P^L_{i_2,j_2} \\ \vdots & \vdots & & \vdots & & \vdots \\ P^1_{i_\theta,j_\theta} & P^2_{i_\theta,j_\theta} & \cdots & P^k_{i_\theta,j_\theta} & \cdots & P^L_{i_\theta,j_\theta} \\ \vdots & \vdots & & \vdots & & \vdots \\ P^1_{i_S,j_S} & P^2_{i_S,j_S} & \cdots & P^k_{i_S,j_S} & \cdots & P^L_{i_S,j_S} \end{bmatrix}. \]

3. Progressive STT flow model: Motion segments are obtained by constructing the progressive STT flow model. As shown in Figure 3, a surveillance area can be divided into multiple sub-surveillance areas.
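The construction of the circular sampling line and its expansion diagram can be sketched as follows. This is a minimal sketch, assuming frames are greyscale NumPy arrays and that the number of sample points equals the rounded perimeter 2πR; function names are illustrative.

```python
import numpy as np

def circular_sampling_coords(centre_x, centre_y, radius):
    """Coordinates (i_theta, j_theta) of the S ~ 2*pi*R points on a
    circular sampling line, following the parametrisation above."""
    S = int(round(2 * np.pi * radius))          # perimeter ~ number of samples
    theta = 2 * np.pi * np.arange(S) / S
    i = np.round(centre_x + radius * np.cos(theta)).astype(int)
    j = np.round(centre_y + radius * np.sin(theta)).astype(int)
    return i, j

def stt_expansion_diagram(frames, centre, radius):
    """Tunnel(R): column k holds the circular sample vector CR_k of frame k,
    so the result is an S x L matrix (S = perimeter, L = video length)."""
    i, j = circular_sampling_coords(centre[0], centre[1], radius)
    return np.stack([frame[i, j] for frame in frames], axis=1)

# Toy video: 10 random 64x64 frames, one sampling circle of radius 10.
frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(10)]
tunnel = stt_expansion_diagram(frames, centre=(32, 32), radius=10)
print(tunnel.shape)  # (63, 10): S = round(2*pi*10) rows, L = 10 columns
```

With several radii, one such matrix per sampling circle is built, which is all the pixel data the method ever touches.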
Assuming there is no moving target in the first frame of the video, a target entering the sub-surveillance area is assigned positive spatio-temporal flow, and a target exiting the sub-surveillance area is assigned negative spatio-temporal flow, as shown in Figure 4.
Sub-sampling lines adjacent to the sampling line inside each sub-surveillance area are used to determine the motion direction. Each column of the STT expansion diagram corresponds to the sampling line at the same position in the video, and each column is taken as an input of a Gaussian mixture background model [10]; as a result, the trajectory of motion in the STT expansion diagram is extracted. The target trajectory in the STT expansion diagram formed by the sub-sampling line can be obtained in the same way. By comparing the order in which targets appear in the two STT expansion diagrams, the moving direction of the targets is obtained. After that, the proposed method detects the target centre and sets the value of the m pixels around the centre to 255, which standardises the moving target size. Figure 5 shows the expansion diagram and its processing.

The spatio-temporal flow can be calculated according to the number of white pixels in a single video frame. The flow of the nth STT expansion diagram at frame k is

\[ f_n(k) = \frac{1}{255\,m} \sum_{i=1}^{S} p_{i,k}, \]

where n denotes the nth STT expansion diagram, k denotes the kth frame of the video sequence and the kth column of the STT expansion diagram, S is the perimeter of the circular sampling line and the height of the STT expansion diagram, and p_{i,k} is the pixel in the ith row and kth column of the STT expansion diagram. Suppose N is the number of sub-surveillance areas. The spatio-temporal flow F_n(f_k) and the accumulative spatio-temporal flow AF_n(f_k) of the nth sub-surveillance area are obtained by assigning positive sign to flow entering the area and negative sign to flow leaving it:

\[ F_n(f_k) = f^{+}_n(k) - f^{-}_n(k), \qquad AF_n(f_k) = \sum_{t=1}^{k} F_n(f_t), \]

where f^{+}_n(k) and f^{-}_n(k) denote the flow entering and exiting the nth sub-surveillance area at frame k. Connecting the spatio-temporal flow and accumulative spatio-temporal flow of each frame yields the spatio-temporal flow curve and the accumulative spatio-temporal flow curve of the sub-surveillance area, as shown in Figure 6.
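The flow computation above can be illustrated numerically. The sketch below assumes the expansion diagrams have already been binarised by the background model and each crossing target standardised to m = 5 white pixels; which boundary counts as entry versus exit is a stand-in for the direction decision made with the sub-sampling lines.

```python
import numpy as np

M = 5  # assumed standardised target size: white pixels per crossing target

def column_flow(binary_diagram, m=M):
    """Per-frame flow magnitude of one binarised STT expansion diagram:
    white-pixel count of each column divided by 255*m, so that one
    crossing target contributes ~1 to the flow."""
    return binary_diagram.sum(axis=0) / (255.0 * m)

def accumulative_flow(flow_in, flow_out):
    """F_n(f_k) = entering flow minus exiting flow;
    AF_n(f_k) = running sum of F_n over frames 1..k."""
    F = flow_in - flow_out
    return F, np.cumsum(F)

# Toy diagrams (S = 20 rows, L = 10 frames): a target crosses the entry
# boundary at frame 2 and the exit boundary at frame 6.
S, L = 20, 10
entry_diag = np.zeros((S, L)); exit_diag = np.zeros((S, L))
entry_diag[:M, 2] = 255
exit_diag[:M, 6] = 255
F, AF = accumulative_flow(column_flow(entry_diag), column_flow(exit_diag))
print(AF)  # 1.0 while the target is inside (frames 2..5), 0.0 afterwards
```

The accumulative curve therefore stays above zero exactly while a target is inside the sub-surveillance area, which is the property the segmentation step exploits.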
4. Motion segments: When the accumulative spatio-temporal flow is greater than 0, there are moving targets in the sub-surveillance area. As described above, the entire surveillance area is divided into N sub-surveillance areas, and the motion segment S(i) of the ith sub-surveillance area is obtained from its accumulative spatio-temporal flow curve. The motion segment S is as follows:

\[ S(i) = \{\, k \mid AF_i(f_k) > 0 \,\}, \qquad S = \bigcup_{i=1}^{N} S(i). \]

Figure 7 shows part of the experimental process of video1. It can be seen that when the car crosses the sampling line, the spatio-temporal flow varies as the car enters or exits sub-surveillance areas. Table 1 shows the average precision, recall, F1, and time consumption of the test methods: the methods in [2], [3], and [7], and the method presented here with one sampling circle (1C), two sampling circles (2C), and three sampling circles (3C). Figure 8 compares precision and time consumption on each video. From Table 1 and Figure 8, it can be seen that the proposed method outperforms the comparison algorithms in terms of segmentation precision and F1, and its time consumption is much lower. Suppose the size of a video sequence is S × L, where S is the size of a video frame and L is the length of the video sequence. When the perimeters of the circular sampling lines are, respectively, s_1, s_2, …, s_n, the computational complexity of the proposed method is O(L × (s_1 + s_2 + … + s_n)), whereas O(L × S) is the computational complexity of the comparison methods. Commonly, (s_1 + s_2 + … + s_n) ≪ S holds, thus the computational efficiency of the proposed method is higher than that of the comparison methods.
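Extracting the motion segments from an accumulative flow curve is a simple thresholding of AF > 0. A minimal sketch, with an illustrative helper name and a toy curve:

```python
import numpy as np

def motion_segments(af):
    """Return [start, end) frame ranges where the accumulative
    spatio-temporal flow AF is greater than 0."""
    active = (np.asarray(af) > 0).astype(int)
    # Pad with zeros so segments touching either end are closed,
    # then locate the rising/falling edges of the active mask.
    edges = np.flatnonzero(np.diff(np.concatenate(([0], active, [0]))))
    return list(zip(edges[::2], edges[1::2]))

# Toy accumulative flow curve: targets present in frames 2..4 and 7..8.
af = [0, 0, 1, 1, 1, 0, 0, 2, 1, 0]
print(motion_segments(af))  # [(2, 5), (7, 9)]
```

Per-area segments obtained this way are merged (a union over the N sub-surveillance areas) to give the motion segments of the whole video.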
Different from existing methods that detect motion frame by frame, this letter proposes a new spatio-temporal slice method, which greatly reduces the amount of data processing. Compared with existing horizontal and vertical slices, the proposed circular sampling better accounts for target movement in all directions. From the last three rows of Table 1, it can be seen that F1 increases with the number of sampling circles. The main reason is that the progressive STT can better detect targets that move only inside the surveillance area.
Conclusion: The present letter proposes a novel method of surveillance video motion segmentation based on the progressive STT flow model. The proposed method analyses the video through a small number of pixels on the circular sampling lines and constructs a progressive STT flow model to segment the surveillance video into motion segments. In future work, we plan to perform motion classification on the extracted motion segments.