CN-122024316-A - Platform jump abnormal behavior identification method, device, equipment and medium
Abstract
The invention provides a method, device, equipment and medium for identifying platform jump abnormal behavior. The method comprises: acquiring continuous video frames and screening candidate segments containing vertical displacement actions from them; performing adaptive frame sampling on the candidate segments to generate an input tensor; performing feature extraction on the input tensor with an optimized Video Swin Transformer to obtain a multi-scale spatio-temporal feature map; fusing the multi-scale spatio-temporal feature map into a comprehensive spatio-temporal feature map; predicting the landing point coordinates of the jump behavior from the comprehensive feature map through a landing-point rationality network; and matching the landing point coordinates against a geometric boundary mask map to judge whether the current landing point lies in a preset dangerous area, thereby identifying abnormal jump behavior.
Inventors
- XU MENGJIA
- LI SI
- LI PEIJI
Assignees
- SHANGHAI DONGPU INFORMATION TECHNOLOGY CO., LTD. (上海东普信息科技有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-28
Claims (10)
- 1. A method for identifying platform jump abnormal behavior, comprising: acquiring continuous video frames, screening candidate segments containing vertical displacement actions based on the acquired continuous video frames, and performing adaptive frame sampling on the candidate segments to generate an input tensor; performing feature extraction on the input tensor using an optimized Video Swin Transformer to obtain a multi-scale spatio-temporal feature map; and performing feature fusion on the obtained multi-scale spatio-temporal feature map to obtain a comprehensive spatio-temporal feature map, predicting landing point coordinates of the jump behavior from the comprehensive spatio-temporal feature map through a landing-point rationality network, matching the landing point coordinates with a geometric boundary mask map, and judging whether the current landing point is located in a preset dangerous area, thereby identifying the jump abnormal behavior.
- 2. The method for identifying platform jump abnormal behavior according to claim 1, wherein acquiring continuous video frames, screening candidate segments containing vertical displacement actions based on the acquired continuous video frames, and performing adaptive frame sampling on the candidate segments to generate an input tensor comprises: acquiring a continuous video stream through a monitoring camera; converting each video frame in the obtained continuous video stream into a standard digital image format meeting preset requirements, wherein each converted frame contains the complete platform scene area; obtaining an inter-frame difference variance value for the converted video frames by an inter-frame difference variance method, comparing the value with a set threshold, judging that a vertical displacement action occurs between two consecutive frames when the value exceeds the threshold, repeating this trigger to obtain a frame sequence containing the vertical displacement action, marking that frame sequence as a candidate segment, and extracting T frame images from the candidate segment by adaptive sampling; and resizing each of the extracted T frame images to H × W, retaining their RGB three-channel information, and finally combining them to generate an input tensor of dimensions T × H × W × 3.
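The sampling-and-stacking step of claim 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes uniform index spacing as the "adaptive" sampling scheme and uses nearest-neighbour resizing to stay dependency-free; the helper name `sample_and_stack` is hypothetical.

```python
import numpy as np

def sample_and_stack(candidate: list, T: int, H: int, W: int) -> np.ndarray:
    """Sample T frames from a candidate segment and stack them into a
    T x H x W x 3 input tensor. Frames are H0 x W0 x 3 RGB uint8 arrays.
    Uniform spacing stands in for the patent's adaptive sampling."""
    # Evenly spaced indices over the segment length.
    idx = np.linspace(0, len(candidate) - 1, T).round().astype(int)
    frames = []
    for i in idx:
        f = candidate[i]
        # Nearest-neighbour resize to H x W, keeping the RGB channels.
        rows = (np.arange(H) * f.shape[0] / H).astype(int)
        cols = (np.arange(W) * f.shape[1] / W).astype(int)
        frames.append(f[rows][:, cols])
    return np.stack(frames)  # shape: (T, H, W, 3)
```

In practice a bilinear resize (e.g. via OpenCV or torchvision) and a motion-aware sampling density would replace the simple choices above.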
- 3. The method for identifying platform jump abnormal behavior according to claim 2, wherein obtaining the inter-frame difference variance value by the inter-frame difference variance method for the converted video frames comprises: selecting two adjacent frames of video images and converting them into grayscale images; calculating the difference of the gray values of corresponding pixels in the two grayscale images; and squaring all the differences and summing them to obtain the inter-frame difference variance value.
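Claim 3's "inter-frame difference variance" is, as described, a sum of squared grayscale differences. A minimal sketch, assuming standard luminance weights for the RGB-to-gray conversion (the patent does not specify the conversion, and the threshold is a scene-dependent tuning parameter):

```python
import numpy as np

def frame_diff_variance(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Sum of squared grayscale differences between two adjacent frames,
    following the claim's description literally. Frames are H x W x 3
    uint8 RGB arrays."""
    # RGB -> grayscale with the common ITU-R BT.601 luminance weights.
    gray_a = frame_a.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    gray_b = frame_b.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    diff = gray_a - gray_b
    return float(np.sum(diff ** 2))

def contains_vertical_motion(frame_a, frame_b, threshold: float) -> bool:
    # Exceeding the set threshold is taken as evidence of displacement
    # between the two consecutive frames.
    return frame_diff_variance(frame_a, frame_b) > threshold
```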
- 4. The method for identifying platform jump abnormal behavior according to claim 1, wherein performing feature extraction on the input tensor using the optimized Video Swin Transformer to obtain the multi-scale spatio-temporal feature map comprises: during the sliding process, moving the window in the temporal and spatial dimensions by a preset stride, dividing the whole input tensor into a plurality of non-overlapping or partially overlapping 3D window feature blocks; based on a jump trajectory template, calculating the similarity between each spatio-temporal position in a block and the corresponding position on the trajectory template, and generating spatio-temporal attention offset values from the similarities; calculating original self-attention weights for all feature vectors in each 3D window feature block, and superposing the attention offset values onto the original self-attention weights to obtain enhanced self-attention weights; performing a weighted sum of the feature vectors in the 3D window feature block based on the enhanced self-attention weights to obtain local spatio-temporal features focused on the jump-airborne-landing process; processing the obtained local spatio-temporal features through layer normalization and fully connected layers to obtain enhanced local spatio-temporal features; and downsampling the enhanced local spatio-temporal features sequentially with an asymmetric spatio-temporal downsampling strategy and a symmetric downsampling strategy to obtain the multi-scale spatio-temporal feature map, which comprises a high-level feature map and a low-level feature map.
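The core of claim 4 — superposing a trajectory-derived offset onto self-attention logits inside one 3D window — can be sketched as below. This is an illustrative simplification, not the patent's network: the Q/K/V projections, multi-head split, and layer normalization are omitted, and the bias values are assumed to come from the trajectory-template similarity step.

```python
import numpy as np

def biased_window_attention(x: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Self-attention over one 3D window with a trajectory-derived bias.

    x:    (N, C) feature vectors of the N spatio-temporal positions
          flattened from one window.
    bias: (N, N) spatio-temporal attention offset values (hypothetical
          values derived from trajectory-template similarity).
    """
    scores = x @ x.T / np.sqrt(x.shape[1])   # original attention logits
    scores = scores + bias                   # superpose the offset values
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)        # enhanced attention weights
    return w @ x                             # weighted sum of window features
```

With a zero bias this reduces to plain windowed self-attention; a strongly positive bias toward trajectory-aligned positions pulls every output toward those positions' features.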
- 5. The method for identifying platform jump abnormal behavior according to claim 4, wherein generating the spatio-temporal jump trajectory template based on a physical parabolic model comprises: combining the gravitational acceleration with a preset initial jump velocity to calculate the complete spatial coordinate curve from the take-off point through the airborne apex to the landing point, and mapping the curve into the pixel coordinate system of the image.
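The parabolic template of claim 5 follows from basic projectile kinematics. A minimal sketch, assuming a purely vertical jump and a simple pixels-per-metre `scale` plus take-off pixel `origin` as a stand-in for the camera calibration that the patent uses to map the curve into the image:

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def jump_trajectory(v0: float, T: int, scale: float, origin):
    """Pixel-space jump trajectory template from a parabolic model.

    v0: preset initial vertical velocity (m/s). The airborne phase lasts
    2*v0/G seconds (take-off -> apex -> landing at the same height).
    Returns (T, 2) pixel coordinates sampled along the trajectory.
    """
    t = np.linspace(0.0, 2.0 * v0 / G, T)
    height = v0 * t - 0.5 * G * t ** 2        # metres above take-off
    ys = origin[1] - height * scale           # image y grows downward
    xs = np.full(T, origin[0], dtype=float)   # purely vertical jump here
    return np.stack([xs, ys], axis=1)
```

A real deployment would replace `scale`/`origin` with the calibrated camera projection and allow a horizontal velocity component.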
- 6. The method for identifying platform jump abnormal behavior according to claim 4, wherein performing feature fusion on the obtained multi-scale spatio-temporal feature map to obtain the comprehensive spatio-temporal feature map comprises: performing element-wise addition or concatenation on the feature maps of the corresponding levels, and integrating them through a convolution layer to obtain the comprehensive spatio-temporal feature map fusing the multi-scale information.
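The fusion step of claim 6 can be sketched as below. This assumes the high- and low-level maps have already been resampled to a common resolution (the patent does not detail this), and models the "convolution layer" as a 1×1 channel mix, which for a 1×1 kernel is exactly a per-pixel matrix multiply:

```python
import numpy as np

def fuse_levels(high: np.ndarray, low: np.ndarray, conv_w: np.ndarray):
    """Element-wise addition of two same-shaped feature maps followed by a
    1x1-convolution-style channel mixing.

    high/low: (T, H, W, C) spatio-temporal feature maps.
    conv_w:   (C, C) learned channel-mixing weights (random in practice
              before training; identity used only for illustration).
    """
    fused = high + low        # element-level addition of corresponding levels
    return fused @ conv_w     # 1x1 convolution == per-position matmul
```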
- 7. The method for identifying platform jump abnormal behavior according to claim 1, wherein predicting the landing point coordinates of the jump behavior from the comprehensive spatio-temporal feature map through the landing-point rationality network, matching the landing point coordinates with the geometric boundary mask map, and judging whether the current landing point is located in a preset dangerous area, thereby identifying the jump abnormal behavior, comprises: acquiring a CAD model or pre-calibrated geometric boundary data of the platform, and constructing the geometric boundary mask map based on it; constructing the landing-point rationality network from convolutional and recurrent neural networks; inputting the comprehensive spatio-temporal feature map into the landing-point rationality network to predict the landing point coordinates of the jump behavior; and applying global average pooling to the comprehensive spatio-temporal feature map to obtain a one-dimensional visual feature vector, converting the landing-point rationality judgment into a numerical feature, concatenating the two and feeding them into a fully connected layer, which computes a confidence score for the jump behavior from the visual feature vector and generates the corresponding danger level by combining whether the landing point is in a danger area with the risk degree of that area.
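The final matching step of claim 7 — checking a predicted landing point against the geometric boundary mask — reduces to an array lookup. A minimal sketch, assuming the mask and a per-pixel risk map share the image's pixel grid (the `risk` layout is a hypothetical stand-in for the patent's "risk degree of the danger area"):

```python
import numpy as np

def classify_landing(point, mask: np.ndarray, risk: np.ndarray):
    """Match a predicted landing point against a geometric boundary mask.

    point: (x, y) predicted landing pixel coordinates.
    mask:  (H, W) bool map, True where the pixel lies in a danger area.
    risk:  (H, W) float map of per-pixel risk degree.
    Returns (is_abnormal, danger_level).
    """
    x, y = int(round(point[0])), int(round(point[1]))
    in_danger = bool(mask[y, x])            # note row = y, column = x
    level = float(risk[y, x]) if in_danger else 0.0
    return in_danger, level
```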
- 8. A platform jump abnormal behavior recognition apparatus, comprising: an input tensor generation module, configured to acquire continuous video frames, screen candidate segments containing vertical displacement actions based on the acquired continuous video frames, and perform adaptive frame sampling on the candidate segments to generate an input tensor; a multi-scale spatio-temporal feature map acquisition module, configured to perform feature extraction on the input tensor using an optimized Video Swin Transformer to obtain a multi-scale spatio-temporal feature map; and a jump abnormal behavior recognition module, configured to perform feature fusion on the obtained multi-scale spatio-temporal feature map to obtain a comprehensive spatio-temporal feature map, predict the landing point coordinates of the jump behavior from the comprehensive spatio-temporal feature map through a landing-point rationality network, match the landing point coordinates with a geometric boundary mask map, and judge whether the current landing point is located in a preset dangerous area, so as to recognize the jump abnormal behavior.
- 9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for identifying platform jump abnormal behavior according to any one of claims 1 to 7.
- 10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the computer program, when executed by the processor, implements the steps of the method for identifying platform jump abnormal behavior according to any one of claims 1 to 7.
Description
Platform jump abnormal behavior identification method, device, equipment and medium
Technical Field
The invention relates to the technical field of anomaly identification, and in particular to a method, device, equipment and medium for identifying platform jump abnormal behavior.
Background
In current industrial scenes, automatic identification of the high-risk abnormal behavior of personnel jumping from platforms is a key requirement of safe production management. Conventional methods based on rules or simple motion detection struggle to distinguish jumps from normal actions; deep learning methods are more capable but commonly suffer from high computational complexity, weak generalization, and a lack of behavioral physical semantics. In recent years, the Vision Transformer has shown strong performance in image recognition, and the Video Swin Transformer has further extended it to the video domain, achieving efficient spatio-temporal modeling through a 3D sliding-window mechanism. However, this general-purpose architecture is not optimized for specific high-risk behaviors, so feature extraction in the key spatio-temporal regions is insufficient, and it is difficult to meet the industrial-grade requirement of high-precision, low-latency abnormality identification.
Disclosure of Invention
The invention mainly aims to solve the technical problems in the prior art that feature extraction of the jump behavior in key spatio-temporal regions is insufficient and the industrial-grade requirement of high-precision, low-latency abnormality identification is difficult to meet.
The first aspect of the present invention provides a method for identifying platform jump abnormal behavior, comprising: acquiring continuous video frames, screening candidate segments containing vertical displacement actions based on the acquired continuous video frames, and performing adaptive frame sampling on the candidate segments to generate an input tensor; performing feature extraction on the input tensor using an optimized Video Swin Transformer to obtain a multi-scale spatio-temporal feature map; and performing feature fusion on the obtained multi-scale spatio-temporal feature map to obtain a comprehensive spatio-temporal feature map, predicting landing point coordinates of the jump behavior from the comprehensive spatio-temporal feature map through a landing-point rationality network, matching the landing point coordinates with a geometric boundary mask map, and judging whether the current landing point is located in a preset dangerous area, thereby identifying the jump abnormal behavior.
Optionally, in a first implementation manner of the first aspect of the present invention, acquiring continuous video frames, screening candidate segments containing vertical displacement actions based on the acquired continuous video frames, and performing adaptive frame sampling on the candidate segments to generate an input tensor includes: acquiring a continuous video stream through a monitoring camera; converting each video frame in the obtained continuous video stream into a standard digital image format meeting preset requirements, wherein each converted frame contains the complete platform scene area; obtaining an inter-frame difference variance value for the converted video frames by an inter-frame difference variance method, comparing the value with a set threshold, judging that a vertical displacement action occurs between two consecutive frames when the value exceeds the threshold, repeating this trigger to obtain a frame sequence containing the vertical displacement action, marking that frame sequence as a candidate segment, and extracting T frame images from the candidate segment by adaptive sampling; and resizing each of the extracted T frame images to H × W, retaining their RGB three-channel information, and finally combining them to generate an input tensor of dimensions T × H × W × 3.
Optionally, in a second implementation manner of the first aspect of the present invention, obtaining the inter-frame difference variance value by the inter-frame difference variance method for the converted video frames includes: selecting two adjacent frames of video images and converting them into grayscale images; calculating the difference of the gray values of corresponding pixels in the two grayscale images; and squaring all the differences and summing them to obtain the inter-frame difference variance value. Optionally, in a third implementation manner of the first aspect of the present invention, the feature extraction