CN-115410126-B - Data unbalanced weak supervision video anomaly detection method and system

Abstract

The invention discloses a data-imbalanced weakly supervised video anomaly detection method and system. The method comprises: extracting segment-level video features of a video to be detected; obtaining segment-level anomaly scores from those features through an adversarial (countermeasure) training module; obtaining segment-level anomaly scores from the same features through a focus training module; fusing the two sets of segment-level anomaly scores to obtain fused anomaly scores; and comparing the fused anomaly scores with a threshold, treating all frames in video segments whose score is above the threshold as abnormal and all frames in video segments whose score is below the threshold as normal, thereby achieving data-imbalanced weakly supervised video anomaly detection and temporal localization of anomalies. Because the segment-level anomaly scores obtained by the adversarial training module and the focus training module are fused, the difference between normal and abnormal video segments is easier to learn, abnormal event detection more closely matches real-world conditions, and video anomaly detection accuracy is further improved.
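The final two steps of the abstract (fusing the two segment-level scores, then thresholding and propagating the decision to every frame of a segment) can be sketched as follows. This is a minimal illustration, not the patented implementation: the averaging rule in `mean_fusion`, the function names, the fixed `frames_per_segment`, and the treatment of a score exactly equal to the threshold are all assumptions, since the abstract does not specify them.

```python
from typing import Callable, List, Sequence


def mean_fusion(adv_score: float, focus_score: float) -> float:
    """Hypothetical fusion rule (an assumption): plain average of the two scores."""
    return 0.5 * (adv_score + focus_score)


def localize_anomalies(
    adv_scores: Sequence[float],
    focus_scores: Sequence[float],
    frames_per_segment: int,
    threshold: float,
    fuse: Callable[[float, float], float] = mean_fusion,
) -> List[int]:
    """Return one label per frame: 1 = abnormal, 0 = normal."""
    if len(adv_scores) != len(focus_scores):
        raise ValueError("both modules must score the same segments")
    frame_labels: List[int] = []
    for adv, focus in zip(adv_scores, focus_scores):
        fused = fuse(adv, focus)
        # Scores equal to the threshold are treated as normal here (an assumption).
        label = 1 if fused > threshold else 0
        # Every frame inherits the decision made for its segment.
        frame_labels.extend([label] * frames_per_segment)
    return frame_labels


# Three segments of two frames each; only the middle segment exceeds the threshold.
labels = localize_anomalies(
    adv_scores=[0.1, 0.9, 0.4],
    focus_scores=[0.3, 0.7, 0.2],
    frames_per_segment=2,
    threshold=0.5,
)
print(labels)
```

Passing the fusion rule as a parameter keeps the thresholding step independent of whichever fusion formula is actually used.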

Inventors

HE PING
ZHANG FAN
LI HUIBIN

Assignees

西安交通大学

Dates

Publication Date: 20260508
Application Date: 20220829

Claims (10)

1. A data-imbalanced weakly supervised video anomaly detection method, characterized by comprising the following steps: extracting segment-level video features of a video to be detected; obtaining segment-level anomaly scores from the segment-level video features through an adversarial (countermeasure) training module; obtaining segment-level anomaly scores from the segment-level video features through a focus training module, wherein the focus training module, starting from the cross-entropy loss function, adds a coefficient that controls the weights of abnormal videos and normal videos and adds a factor that controls the weights of hard-to-classify and easy-to-classify abnormal videos, takes the spatio-temporal features of the normal video and of the abnormal video as input, feeds them into a multi-scale temporal network to obtain the corresponding multi-scale temporal features of the normal video and of the abnormal video, sends these to a segment-level classifier to regress segment anomaly scores, and takes a weighted average of the anomaly scores of the top-k video segments to obtain a video-level anomaly score; fusing the segment-level anomaly scores obtained by the adversarial training module with the segment-level anomaly scores obtained by the focus training module to obtain fused anomaly scores; and comparing the fused anomaly scores with a threshold, treating all frames in video segments above the threshold as abnormal and all frames in video segments below the threshold as normal, thereby achieving data-imbalanced weakly supervised video anomaly detection and temporal localization of anomalies.
2. The data-imbalanced weakly supervised video anomaly detection method according to claim 1, wherein the segment-level video features of the video to be detected are extracted through an I3D network.
3. The data-imbalanced weakly supervised video anomaly detection method according to claim 1, wherein extracting the segment-level video features of the video to be detected specifically comprises the following steps: each video is divided into N consecutive, non-overlapping temporal segments, from the first temporal segment to the N-th temporal segment; all segments of a video form, according to the video-level label, a positive bag or a negative bag, a positive bag indicating that the video contains at least one abnormal temporal segment and a negative bag indicating that all temporal segments of the video are normal; each video is then taken as input to the I3D network, whose output is the original spatio-temporal feature, each video corresponding to one original spatio-temporal feature, thereby completing the extraction of the video features; wherein N is the number of video segments, D is the dimension of each segment feature, the original spatio-temporal feature of a video is composed of the original spatio-temporal features corresponding to the first through N-th temporal segments, and it is a real-valued matrix of size N × D.
4. The data-imbalanced weakly supervised video anomaly detection method according to claim 1, wherein the total loss of the adversarial (countermeasure) training module is given by formula (11), whose terms are: the total loss function of the adversarial training module; a parameter that adjusts the weighting between the clean-video loss and the adversarial-sample loss; the total loss function on clean videos; and the total loss function on adversarial samples.
5. The data-imbalanced weakly supervised video anomaly detection method according to claim 4, wherein the total loss function on clean videos is given by formula (9), whose terms are: the total loss function on clean videos; the respective weight parameters; the hinge loss on clean videos; the sparsity loss on clean videos; and the smoothness loss on clean videos.
6. The data-imbalanced weakly supervised video anomaly detection method according to claim 4, wherein the total loss function on adversarial samples is given by formula (10), whose terms are: the total loss function on adversarial samples; the respective weight parameters; the hinge loss on adversarial samples; the sparsity loss on adversarial samples; the smoothness loss on adversarial samples; and the cross-entropy loss on adversarial samples.
7. The data-imbalanced weakly supervised video anomaly detection method according to claim 4, wherein the loss function of the focus training module is given by formula (13), whose terms are: the total loss function of the focus training module; the respective weight parameters; the hinge loss on clean videos; the sparsity loss on clean videos; the smoothness loss on clean videos; and the cost-sensitive loss.
8. The data-imbalanced weakly supervised video anomaly detection method according to claim 7, wherein the cost-sensitive loss function is given by formula (12), whose terms are: the cost-sensitive loss; a weight coefficient that controls the ratio between the losses of abnormal videos and normal videos; and a weight factor that controls the ratio between the losses of hard-to-classify and easy-to-classify abnormal videos.
9. The data-imbalanced weakly supervised video anomaly detection method according to claim 1, wherein the fused anomaly score is calculated by the piecewise formula (14), in which c is a decision threshold between the adversarial training and focus training anomaly scores, and the remaining terms are the fused anomaly score, the segment-level anomaly score obtained through the adversarial (countermeasure) training module, and the segment-level anomaly score obtained through the focus training module.
10. A data-imbalanced weakly supervised video anomaly detection system, comprising: a video feature extraction module, configured to extract segment-level video features of a video to be detected and send them to a first segment-level anomaly score acquisition module and a second segment-level anomaly score acquisition module; the first segment-level anomaly score acquisition module, configured to obtain segment-level anomaly scores from the segment-level video features through an adversarial (countermeasure) training module and send them to a fusion module; the second segment-level anomaly score acquisition module, configured to obtain segment-level anomaly scores from the segment-level video features through a focus training module and send them to the fusion module; the fusion module, configured to fuse the segment-level anomaly scores obtained by the adversarial training module with the segment-level anomaly scores obtained by the focus training module to obtain fused anomaly scores, and send the fused anomaly scores to a comparison module; wherein the focus training module, starting from the cross-entropy loss function, adds a coefficient that controls the weights of abnormal videos and normal videos and adds a factor that controls the weights of hard-to-classify and easy-to-classify abnormal videos, takes the spatio-temporal features of the normal video and of the abnormal video as input, feeds them into a multi-scale temporal network to obtain the corresponding multi-scale temporal features of the normal video and of the abnormal video, sends these to a segment-level classifier to regress segment anomaly scores, and takes a weighted average of the anomaly scores of the top-k video segments to obtain a video-level anomaly score; and the comparison module, configured to compare the fused anomaly scores with a threshold, treating all frames in video segments above the threshold as abnormal and all frames in video segments below the threshold as normal, thereby achieving data-imbalanced weakly supervised video anomaly detection and temporal localization of anomalies.
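The two scoring ingredients named in the claims, a cost-sensitive loss built on cross-entropy and a video-level score taken from the top-k segments, can be illustrated with a short sketch. Formula (12) is not reproduced in this text, so the focal-loss-style form below is an assumption that merely matches the verbal description (one coefficient balancing abnormal against normal videos, one factor down-weighting easy examples). The names `alpha`, `gamma`, and `eps`, their default values, and the use of uniform weights in the top-k average are likewise hypothetical.

```python
import math
from typing import Sequence


def topk_video_score(segment_scores: Sequence[float], k: int) -> float:
    """Video-level score as the mean of the k largest segment scores.

    Uniform weights are an assumption; the claims speak of a weighted average.
    """
    if not 1 <= k <= len(segment_scores):
        raise ValueError("k must be between 1 and the number of segments")
    top = sorted(segment_scores, reverse=True)[:k]
    return sum(top) / k


def cost_sensitive_loss(
    p: float, y: int, alpha: float = 0.75, gamma: float = 2.0, eps: float = 1e-7
) -> float:
    """Focal-loss-style cost-sensitive cross-entropy (assumed form).

    p: predicted probability that the video is abnormal.
    y: video-level label, 1 = abnormal, 0 = normal.
    alpha: weight given to abnormal videos (normal videos get 1 - alpha).
    gamma: larger values shrink the loss of easy, well-classified videos.
    """
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


video_score = topk_video_score([0.1, 0.9, 0.5, 0.7], k=2)
easy_abnormal = cost_sensitive_loss(0.9, 1)  # confident and correct
hard_abnormal = cost_sensitive_loss(0.3, 1)  # abnormal video scored low
print(video_score, easy_abnormal, hard_abnormal)
```

With `gamma = 0` and `alpha = 0.5` the function reduces to half the ordinary cross-entropy, which makes the role of each added term easy to check in isolation.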

Description

Data unbalanced weak supervision video anomaly detection method and system

Technical Field

The invention belongs to the technical field of video processing, and particularly relates to a data-imbalanced weakly supervised video anomaly detection method and system.

Background

Video anomaly detection is applied in autonomous surveillance and is an important means of monitoring for abnormal events. A video anomaly generally refers to abnormal appearance or motion attributes in a video, or to appearance or motion attributes that occur at an abnormal time or place. The task of video anomaly detection is to detect the temporal and spatial anomalies present in a video. The prior art is generally divided into two types according to whether the sample data carry labels. One is unsupervised video anomaly detection, which typically models only the distribution of normal samples; at test time, video frames or video segments that deviate from that distribution are regarded as anomalies. The other is weakly supervised video anomaly detection, which uses only video-level labels during training and is typically treated as a standard multi-instance learning problem: each video is a bag and each segment in the video is an instance, and a video is labeled abnormal if at least one abnormal frame is detected in it and normal otherwise.
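The multi-instance framing described above can be made concrete with a small sketch: a video is cut into consecutive, non-overlapping segments (the instances), and the bag label is positive as soon as any frame is abnormal. The function names and the integer-division splitting rule are illustrative assumptions. In the weakly supervised setting the bag label is supplied directly by the annotator, so `bag_label` only illustrates how that label relates to frame-level ground truth that is unavailable during training.

```python
from typing import List, Sequence, Tuple


def split_into_segments(num_frames: int, num_segments: int) -> List[Tuple[int, int]]:
    """Split frames [0, num_frames) into consecutive, non-overlapping ranges.

    Each range is a half-open (start, end) pair of frame indices.
    """
    if not 1 <= num_segments <= num_frames:
        raise ValueError("need at least one frame per segment")
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def bag_label(frame_is_abnormal: Sequence[bool]) -> int:
    """Video-level (bag) label: 1 if at least one frame is abnormal, else 0."""
    return int(any(frame_is_abnormal))


segments = split_into_segments(num_frames=10, num_segments=4)
print(segments)
print(bag_label([False, True, False]), bag_label([False, False, False]))
```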
Although many advanced weakly supervised video anomaly detection techniques exist, they are all implicitly built on the imbalance between normal and abnormal frames within a video, and they focus on how to find representative video segments or how to capture the temporal relationships among video segments. They therefore overlook a very important problem: the inherent data imbalance between normal videos and abnormal videos. For example, the training set of the commonly used ShanghaiTech dataset contains 238 videos in total, of which 175 are normal and 63 are abnormal, so the abnormal videos number roughly one third of the normal videos, whereas in real life abnormal videos are far rarer than normal videos. While the ShanghaiTech dataset at least weakly reflects the imbalance in number between normal and abnormal videos during training, the common benchmark datasets UCF-Crime and XD-Violence set the ratio of normal to abnormal training videos close to 1:1, which further leads current weakly supervised video anomaly detection methods to ignore the inherent imbalance between videos. With the rapid increase in the number of surveillance cameras in recent years, anomaly detection in video surveillance is receiving more and more attention. Because the massive volume of surveillance video makes labeling difficult, video-level labels are much easier to obtain than frame-level labels, so weakly supervised methods that require only video-level labels have become an important means of anomaly detection.
However, conventional weakly supervised video anomaly detection is implicitly built on the imbalance between normal and abnormal frames within a video and focuses on how to find representative video segments or how to capture the temporal relationships between video segments. It neglects the inherent data imbalance between normal videos and abnormal videos, which degrades the detection of abnormal events.

Disclosure of Invention

In order to overcome the problems in the prior art, the invention aims to provide a data-imbalanced weakly supervised video anomaly detection method and system that augment the abnormal videos, focus attention on them, fuse the two results, and accurately detect abnormal events at the video frame level.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

A data-imbalanced weakly supervised video anomaly detection method comprises the following steps: extracting segment-level video features of a video to be detected; obtaining segment-level anomaly scores from the segment-level video features through an adversarial (countermeasure) training module; obtaining segment-level anomaly scores from the segment-level video features through a focus training module; fusing the segment-level anomaly scores obtained by the adversarial training module with the segment-level anomaly scores obtained by the focus training module to obtain fused anomaly scores; comparing the fused anomaly scores with a threshold value, and regarding all frames in the