
CN-121999533-A - Method, device, equipment and medium for identifying abnormal behaviors of person crossing boundary


Abstract

The invention provides a method, a device, equipment and a medium for identifying abnormal boundary-crossing behaviors of persons. The method comprises: collecting T frames of video images from a target surveillance video, dividing each frame of video image into a plurality of independent image blocks, and converting the independent image blocks into image block feature vectors; generating a boundary distance map from each frame of video image and a spatial boundary model, and encoding the generated boundary distance map into a boundary-aware position embedding vector whose dimension is consistent with that of the image block feature vectors; adding the boundary-aware position embedding vector to the original spatio-temporal position code of each image block to obtain an enhanced position code fusing boundary information; passing the image block feature vectors fused with the enhanced position code through the spatial attention branch of a Divided Attention structure to obtain a feature map with strengthened spatial association information, and through the temporal attention branch of the Divided Attention structure to obtain features fusing temporal sequence and motion information; and identifying boundary-crossing abnormal behaviors from the spatio-temporal fusion features based on an abnormal behavior recognition model.
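The boundary distance map is the distinctive step of the abstract: every pixel's gray value encodes its shortest Euclidean distance to the electronic fence boundary. A minimal NumPy sketch of that computation, assuming the fence is a polygon given by vertex coordinates (the frame size, fence vertices, and function names are illustrative, not from the patent):

```python
import numpy as np

def point_to_segment(p, a, b):
    """Shortest Euclidean distance from point p to segment ab."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def boundary_distance_map(h, w, fence):
    """Per-pixel shortest distance to the fence polygon's edges,
    normalised to a [0, 1] grayscale map (0 = on the boundary)."""
    fence = np.asarray(fence, dtype=float)
    edges = list(zip(fence, np.roll(fence, -1, axis=0)))
    dist = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            p = np.array([x, y], dtype=float)
            dist[y, x] = min(point_to_segment(p, a, b) for a, b in edges)
    return dist / dist.max()

# Hypothetical 32x32 frame with a square fence region.
dmap = boundary_distance_map(32, 32, [(8, 8), (24, 8), (24, 24), (8, 24)])
print(dmap.shape)   # (32, 32)
print(dmap[8, 8])   # 0.0 -- pixel lying on the fence boundary
```

In practice a distance transform (e.g. on a rasterised boundary mask) would replace the per-pixel loop, but the brute-force version makes the claimed "shortest Euclidean distance to the boundary" explicit.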

Inventors

  • XU MENGJIA
  • LI SI
  • LI PEIJI

Assignees

  • Shanghai Dongpu Information Technology Co., Ltd. (上海东普信息科技有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-28

Claims (10)

  1. A method for identifying abnormal boundary-crossing behaviors of persons, characterized by comprising the following steps: acquiring a target surveillance video, collecting T frames of video images from the target surveillance video, dividing each frame of video image into a plurality of independent image blocks, and converting each independent image block into an image block feature vector; constructing a spatial boundary model of an electronic fence, generating a boundary distance map from each frame of video image and the spatial boundary model, encoding the generated boundary distance map into a boundary-aware position embedding vector whose dimension is consistent with that of the image block feature vectors, and adding the boundary-aware position embedding vector to the original spatio-temporal position code of each image block to obtain an enhanced position code fusing boundary information; obtaining image block feature vectors fused with the enhanced position code from the enhanced position code and the image block feature vectors; obtaining a feature map with strengthened spatial association information from the image block feature vectors fused with the enhanced position code, using the spatial attention branch of a Divided Attention structure; obtaining features fusing temporal sequence and motion information from the feature map with strengthened spatial association information, using the temporal attention branch of the Divided Attention structure; concatenating the feature map with strengthened spatial association information and the features fusing temporal sequence and motion information to obtain spatio-temporal fusion features; and identifying boundary-crossing abnormal behaviors from the spatio-temporal fusion features based on an abnormal behavior recognition model.
  2. The method for identifying abnormal boundary-crossing behaviors of persons according to claim 1, wherein acquiring the target surveillance video, collecting T frames of video images from the target surveillance video, dividing each frame of video image into a plurality of independent image blocks, and converting each independent image block into an image block feature vector comprises: acquiring the target surveillance video and collecting T frames of video images with a dynamic sampling strategy; uniformly dividing each frame of video image into a plurality of independent image blocks of a preset size, ensuring that the pixel ranges of the image blocks do not overlap and completely cover the whole frame; and converting each independent image block into a feature vector of fixed dimension through a linear projection operation, thereby mapping the image space information into the vector space.
  3. The method for identifying abnormal boundary-crossing behaviors of persons according to claim 2, wherein the dynamic sampling strategy comprises adaptively adjusting the sampling interval according to the intensity of target motion in the target surveillance video: shortening the sampling interval when the target motion frequency is greater than a preset value, and lengthening the sampling interval when the target motion frequency is less than the preset value.
  4. The method for identifying abnormal boundary-crossing behaviors of persons according to claim 1, wherein constructing the spatial boundary model of the electronic fence, generating the boundary distance map from each frame of video image and the spatial boundary model, encoding the generated boundary distance map into the boundary-aware position embedding vector whose dimension is consistent with that of the image block feature vectors, and adding the boundary-aware position embedding vector to the original spatio-temporal position code of each image block to obtain the enhanced position code fusing boundary information comprises: acquiring area parameters of a preset electronic fence, wherein the area parameters comprise the polygon vertex coordinates of the preset electronic fence area; calculating the shortest Euclidean distance from each pixel in each frame of image to the electronic fence boundary, generating a gray value for the corresponding pixel according to the magnitude of the distance, and thereby forming a boundary distance map of the whole frame, wherein the boundary distance map reflects the spatial relationship between each position in the image and the fence boundary; encoding the boundary distance map into a boundary-aware position embedding vector consistent with the dimension of the image block feature vectors using a learnable mapping function; calculating the original spatio-temporal position code of each image block, wherein the spatial position code is determined by the coordinates of the image block within its frame; and element-wise adding the boundary-aware position embedding vector and the original spatio-temporal position code of the image block to obtain the enhanced position code fusing boundary information.
  5. The method for identifying abnormal boundary-crossing behaviors of persons according to claim 1, wherein obtaining the feature map with strengthened spatial association information from the image block feature vectors fused with the enhanced position code, using the spatial attention branch of the Divided Attention structure, comprises: splitting the image block feature vectors fused with the enhanced position code along the frame dimension with the spatial attention branch of the Divided Attention structure, to obtain the feature set of all image blocks in a single frame; and computing the similarity between each image block feature and the other image block features in that set, normalizing the similarities with a Softmax function to obtain a spatial attention weight matrix, and performing a weighted summation of the spatial attention weight matrix and the feature set of all image blocks in the single frame to output the feature map with strengthened spatial association information.
  6. The method for identifying abnormal boundary-crossing behaviors of persons according to claim 1, wherein obtaining the features fusing temporal sequence and motion information, which are concatenated with the feature map with strengthened spatial association information to obtain the spatio-temporal fusion features, comprises: splitting the feature map with strengthened spatial association information along the image block dimension with the temporal attention branch of the Divided Attention structure, to obtain the temporal feature sequence of each image block across the T-frame sequence; computing the grayscale difference of corresponding pixels between adjacent frames to approximate the motion information of the target; using the motion intensity value as a modulation factor that multiplicatively interacts with the temporal attention weight matrix, to obtain motion-guided temporal attention weights; and performing a weighted summation of the motion-guided temporal attention weights and the temporal feature sequence of each image block across the T-frame sequence, to obtain the features fusing temporal sequence and motion information.
  7. The method for identifying abnormal boundary-crossing behaviors of persons according to claim 1, wherein identifying boundary-crossing abnormal behaviors from the spatio-temporal fusion features based on the abnormal behavior recognition model comprises: inputting the spatio-temporal fusion features into a Transformer encoder, and extracting high-level features through iterative computation of the multi-layer encoder, wherein the high-level features characterize the behavior patterns of persons in the video; mapping, by a boundary-crossing behavior classification head at the end of the Transformer encoder, the high-level features into two-class probabilities through a fully connected layer, and outputting the probability that the person's behavior is boundary-crossing or normal; and processing, by a boundary-crossing behavior regression head at the end of the Transformer encoder, the high-level features through a fully connected layer, and outputting attribute information related to the boundary-crossing behavior.
  8. An apparatus for identifying abnormal boundary-crossing behaviors of persons, comprising: an image block feature vector generation module for acquiring a target surveillance video, collecting T frames of video images from the target surveillance video, dividing each frame of video image into a plurality of independent image blocks, and converting each independent image block into an image block feature vector; an enhanced position code generation module for constructing a spatial boundary model of an electronic fence, generating a boundary distance map from each frame of video image and the spatial boundary model, and encoding the generated boundary distance map into a boundary-aware position embedding vector whose dimension is consistent with that of the image block feature vectors; a fused image block feature vector generation module for obtaining image block feature vectors fused with the enhanced position code from the enhanced position code and the image block feature vectors; a feature map generation module for obtaining a feature map with strengthened spatial association information from the image block feature vectors fused with the enhanced position code, using the spatial attention branch of a Divided Attention structure; a feature generation module for obtaining features fusing temporal sequence and motion information from the feature map with strengthened spatial association information, using the temporal attention branch of the Divided Attention structure; a spatio-temporal fusion feature generation module for concatenating the feature map with strengthened spatial association information and the features fusing temporal sequence and motion information to obtain spatio-temporal fusion features; and an abnormal behavior recognition module for identifying boundary-crossing abnormal behaviors from the spatio-temporal fusion features based on the abnormal behavior recognition model.
  9. An electronic device comprising a memory and at least one processor, the memory having instructions stored therein, wherein the at least one processor invokes the instructions in the memory to cause the electronic device to perform the steps of the method for identifying abnormal boundary-crossing behaviors of persons according to any one of claims 1 to 7.
  10. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the steps of the method for identifying abnormal boundary-crossing behaviors of persons according to any one of claims 1 to 7.
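Claims 1 and 4 center on the boundary-aware position embedding: a learnable mapping turns each image block's boundary-distance value into a vector of the embedding dimension, which is element-wise added to the ordinary spatio-temporal position code and then to the patch features. A minimal NumPy sketch under illustrative assumptions (random stand-in patch features, one distance scalar per block, a single linear layer as the "learnable mapping function", sinusoidal codes for the original positions; none of these specifics are fixed by the claims):

```python
import numpy as np

rng = np.random.default_rng(0)
T, P, D = 4, 16, 32   # frames, image blocks per frame, embedding dim

# Stand-in for the linear-projection patch features of claim 1.
patch_feats = rng.normal(size=(T, P, D))

# Per-block boundary distance (e.g. mean of the distance map over the block).
patch_dist = rng.uniform(size=(T, P, 1))

# "Learnable mapping function": here a single linear layer, scalar -> D dims.
W_b = rng.normal(size=(1, D)) * 0.02
boundary_embed = patch_dist @ W_b              # (T, P, D)

# Original spatio-temporal position code (sinusoidal, per block index).
pos = np.arange(T * P).reshape(T, P, 1)
i = np.arange(D)[None, None, :]
angle = pos / np.power(10000.0, (2 * (i // 2)) / D)
st_pos = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Enhanced position code = element-wise sum (claim 4), then fused
# into the patch features by addition (claim 1).
enhanced_pos = boundary_embed + st_pos
fused = patch_feats + enhanced_pos
print(fused.shape)   # (4, 16, 32)
```

The element-wise addition keeps the tensor shape unchanged, so the fused features drop straight into a standard Transformer pipeline while every token now carries its distance to the fence.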

Description

Method, device, equipment and medium for identifying abnormal behaviors of person crossing boundary

Technical Field

The invention relates to the technical field of abnormal behavior recognition, and in particular to a method, a device, equipment and a medium for recognizing abnormal boundary-crossing behaviors of persons.

Background

Current systems for recognizing abnormal boundary-crossing behaviors of persons mainly rely on two technical routes. The first is video analysis based on traditional 2D/3D convolutional neural networks such as I3D, C3D and SlowFast, which can extract local spatio-temporal features; however, the local receptive fields of convolution operations make it difficult to model long-range spatio-temporal dependencies, so recognition accuracy is low in slow boundary-crossing, occlusion or complex-background scenes. The second is post-processing based on target detection, trajectory tracking and rule judgment, such as YOLO + DeepSORT + electronic fence, which can realize simple boundary-crossing alarms but depends heavily on manually set boundary rules and trajectory integrity, and is prone to large numbers of false alarms or missed detections in real scenes with dense crowds, frequent occlusion, severe illumination changes or camera shake. In addition, when existing Vision Transformer based video models such as TimeSformer are applied directly to boundary-crossing detection, they lack explicit modeling of spatial boundary semantics, are insufficiently sensitive to slow and silent boundary-crossing behaviors, and can hardly meet the requirements of high-accuracy, low-latency recognition in complex scenes.
Therefore, a scheme for identifying abnormal boundary-crossing behaviors of persons that combines recognition accuracy, scene adaptability and real-time performance is needed.

Disclosure of Invention

The invention mainly aims to solve the technical problems of low recognition accuracy and high latency in complex scenes in the prior art. A first aspect of the invention provides a method for identifying abnormal boundary-crossing behaviors of persons, comprising the following steps: acquiring a target surveillance video, collecting T frames of video images from the target surveillance video, dividing each frame of video image into a plurality of independent image blocks, and converting each independent image block into an image block feature vector; constructing a spatial boundary model of an electronic fence, generating a boundary distance map from each frame of video image and the spatial boundary model, encoding the generated boundary distance map into a boundary-aware position embedding vector whose dimension is consistent with that of the image block feature vectors, and adding the boundary-aware position embedding vector to the original spatio-temporal position code of each image block to obtain an enhanced position code fusing boundary information; obtaining image block feature vectors fused with the enhanced position code from the enhanced position code and the image block feature vectors; obtaining a feature map with strengthened spatial association information from the image block feature vectors fused with the enhanced position code, using the spatial attention branch of a Divided Attention structure; obtaining features fusing temporal sequence and motion information from the feature map with strengthened spatial association information, using the temporal attention branch of the Divided Attention structure; concatenating the feature map with strengthened spatial association information and the features fusing temporal sequence and motion information to obtain spatio-temporal fusion features; and identifying boundary-crossing abnormal behaviors from the spatio-temporal fusion features based on an abnormal behavior recognition model. Optionally, in a first implementation of the first aspect of the invention, acquiring the target surveillance video, collecting T frames of video images from the target surveillance video, dividing each frame of video image into a plurality of independent image blocks, and converting each independent image block into an image block feature vector includes: acquiring the target surveillance video and collecting T frames of video images with a dynamic sampling strategy; uniformly dividing each frame of video image into a plurality of independent image blocks of a preset size, ensuring that the pixel ranges of the image blocks do not overlap and completely cover the whole frame; and converting each independent image block into a feature vector of fixed dimension through a linear projection operation, thereby mapping the image space information into the vector space.
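The Divided Attention structure described above can be sketched end to end: a spatial branch that attends over the blocks within each frame, followed by a temporal branch that attends over each block's sequence across the T frames, with the motion intensity (approximated in claim 6 by adjacent-frame grayscale differences) multiplicatively modulating the temporal weights. This is a single-head, unprojected simplification with illustrative shapes, not the patent's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def divided_attention(x, motion):
    """Divided Attention sketch.
    x: (T, P, D) block features fused with enhanced position codes;
    motion: (T, P) motion intensity per block (e.g. from frame differencing).
    Returns (T, P, 2D) spatio-temporal fusion features (concatenation)."""
    T, P, D = x.shape
    # Spatial branch: attention over the P blocks within each frame.
    spatial = np.empty_like(x)
    for t in range(T):
        w = softmax(x[t] @ x[t].T / np.sqrt(D))       # (P, P) weights
        spatial[t] = w @ x[t]
    # Temporal branch: attention over the T frames for each block,
    # with motion intensity as a multiplicative modulation factor.
    temporal = np.empty_like(x)
    for p in range(P):
        seq = spatial[:, p, :]                        # (T, D)
        w = softmax(seq @ seq.T / np.sqrt(D))         # (T, T) weights
        w = w * motion[:, p][None, :]                 # motion-guided weights
        w = w / (w.sum(axis=-1, keepdims=True) + 1e-12)
        temporal[:, p, :] = w @ seq
    return np.concatenate([spatial, temporal], axis=-1)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16, 32))       # T=4 frames, P=16 blocks, D=32
motion = rng.uniform(size=(4, 16))
fused = divided_attention(x, motion)
print(fused.shape)   # (4, 16, 64)
```

Factoring attention into a spatial pass and a temporal pass reduces the cost from O((T·P)²) for joint attention to O(T·P² + P·T²), which is the usual motivation for the divided design in video Transformers.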