CN-121617044-B - Method and system for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network
Abstract
The invention belongs to the technical field of computer vision and image data processing, and relates to a method and a system for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network. The method first extracts human skeleton key points from a surveillance video and applies scale normalization based on torso length; it then computes a vertical out-of-control deviation value, reflecting the degree of gravity-driven loss of control in the vertical direction, and a trajectory oscillation entropy, reflecting the disorder of the motion trajectory; next, it constructs an adaptive weight function from these two quantities to generate a behavior anomaly weight coefficient; finally, it weights the skeleton features to be recognized with this coefficient and feeds them into a spatio-temporal graph convolutional network for recognition. The invention can effectively distinguish behaviors that are similar in spatial geometry but different in physical nature, reducing the false alarm rate in industrial scenarios.
Inventors
- AN WEILIANG
- ZHOU XIAOBO
- ZENG QINGKANG
- TIAN ZHIYU
- CHEN HAO
- MAO SHENG
- ZHONG JUNLING
Assignees
- 安宇和中科技有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-02
Claims (10)
- 1. A method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network, characterized by comprising the following steps: extracting human skeleton key points from a surveillance video stream, constructing a normalized coordinate system based on a human torso length reference, and converting the original pixel coordinates of the skeleton key points into normalized coordinates; calculating, from the normalized coordinates, the difference between the vertical acceleration of each joint of the human body and a preset standard gravity reference value, and determining a vertical out-of-control deviation value reflecting the degree of gravity-driven loss of control, in combination with the detection confidence; calculating the velocity vector of each joint from the normalized coordinates, and determining a trajectory oscillation entropy reflecting the degree of motion disorder, based on the direction change and the modulus of the velocity vectors at adjacent time instants; and constructing an adaptive weight function from the trajectory oscillation entropy and the vertical out-of-control deviation value, computing a behavior anomaly weight coefficient, and fusing this coefficient with the skeleton feature matrix to be recognized as the input of a spatio-temporal graph convolutional network for behavior recognition.
- 2. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 1, wherein the human torso length reference is determined by calculating the Euclidean distance between the neck joint and the pelvis center point within the same frame.
- 3. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 2, wherein converting the original pixel coordinates of the human skeleton key points into normalized coordinates specifically comprises: taking the pelvis center point of each frame as the coordinate origin, subtracting the original abscissa and ordinate of the pelvis center point from the original abscissa and ordinate of each skeleton key point, respectively, and dividing the results by the human torso length reference to obtain the normalized abscissa and ordinate of the key point.
- 4. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 1, wherein the calculation formula of the vertical out-of-control deviation value (not reproduced in the source text) involves: the vertical out-of-control deviation value of the i-th joint in the t-th frame; the normalized vertical acceleration of that joint in that frame; a preset normalized standard gravity reference value; the detection confidence of that joint in that frame; a first small constant; and the natural constant e.
- 5. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 4, wherein the normalized vertical acceleration is obtained by a second-order difference operation, specifically comprising: performing a second-order difference over the normalized ordinates of the previous frame, the current frame, and the next frame, and dividing the result by the square of the normalized time step to obtain the normalized vertical acceleration.
- 6. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 4, wherein the calculation formula of the trajectory oscillation entropy (not reproduced in the source text) involves: the trajectory oscillation entropy of the i-th joint within a time window; the velocity components of the velocity vector at each time instant along the horizontal and vertical axes; the squared modulus of the velocity vector at each time instant; and a second small constant.
- 7. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 6, wherein the velocity components of the velocity vector along the horizontal and vertical axes are obtained by a first-order difference operation, specifically comprising: subtracting the normalized abscissa of the previous frame from the normalized abscissa of the current frame to obtain the velocity component along the horizontal axis, and likewise subtracting the normalized ordinates to obtain the velocity component along the vertical axis.
- 8. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 6, wherein the calculation formula of the behavior anomaly weight coefficient (not reproduced in the source text) involves: the behavior anomaly weight coefficient of the i-th joint in the t-th frame; and a regulatory factor.
- 9. The method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to claim 8, wherein fusing the behavior anomaly weight coefficient with the skeleton feature matrix to be recognized comprises: multiplying the behavior anomaly weight coefficient element-wise with the skeleton feature matrix to be recognized to generate a weighted skeleton feature matrix.
- 10. A system for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network, characterized by comprising a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, implements the method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network according to any one of claims 1 to 9.
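The torso-length normalization (claims 2-3) and the finite-difference vertical acceleration (claim 5) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the joint indices `neck_idx` and `pelvis_idx` and the small constant `eps` are hypothetical and depend on the pose estimator's keypoint layout.

```python
import numpy as np

def normalize_skeleton(joints, neck_idx=1, pelvis_idx=8, eps=1e-6):
    """Torso-length normalization sketch (claims 2-3).

    joints: (T, J, 2) array of pixel coordinates per frame.
    The pelvis center becomes the per-frame origin; all coordinates
    are divided by the neck-to-pelvis Euclidean distance.
    """
    pelvis = joints[:, pelvis_idx:pelvis_idx + 1, :]          # (T, 1, 2) per-frame origin
    torso = np.linalg.norm(joints[:, neck_idx] - joints[:, pelvis_idx],
                           axis=-1)                            # (T,) torso length reference
    return (joints - pelvis) / (torso[:, None, None] + eps)

def vertical_acceleration(y, dt=1.0):
    """Second-order central difference of the normalized ordinate (claim 5).

    y: (T,) normalized vertical coordinate of one joint; boundary
    frames, which lack a neighbor, are left at zero here.
    """
    a = np.zeros_like(y)
    a[1:-1] = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dt**2
    return a
```

For a free fall the central difference recovers a constant acceleration, which is exactly the signal the deviation value of claim 4 compares against the gravity reference.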
Description
Method and system for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network
Technical Field
The invention belongs to the technical field of computer vision and image data processing, and particularly relates to a method and a system for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network.
Background
With the rapid development of artificial intelligence technology, intelligent video surveillance systems have been widely deployed in industrial safety production and public safety. Among the many motion recognition technologies, analysis based on human skeleton key points has gradually become a main technical route in the field, owing to its strong robustness to illumination changes and background complexity. At present, skeleton-based motion recognition mainly relies on spatio-temporal graph convolutional networks: the core pipeline extracts human joint coordinates with a pose estimation algorithm, constructs a spatio-temporal graph, and aggregates features along the spatial and temporal dimensions with convolution kernels to classify and recognize human motions. Although the prior art functions adequately in general scenarios, it faces serious challenges in complex industrial safety monitoring applications. The main difficulty is that high-risk abnormal behaviors and everyday normal behaviors often have extremely high similarity in spatial geometry. For example, a worker's actively controlled squatting or bowing motion during a job is nearly identical, in its final resting posture, to a sudden out-of-control fall caused by syncope. This high degree of visual confusion makes it difficult for methods that rely solely on spatial skeleton geometry to distinguish the two effectively.
The root cause of this recognition difficulty is that existing spatio-temporal graph convolutional network models distribute feature-extraction weights relatively evenly over the temporal sequence and focus mainly on the geometric position changes of the joints, while ignoring the underlying physical mechanics of the action. From a kinematic perspective, falling is a gravity-dominated, uncontrolled acceleration process accompanied by severe changes in vertical velocity and a disordered motion trajectory, whereas squatting is a controlled deceleration process in which the muscles work against gravity, with clear motion logic and regularity. The lack of explicit modeling and analysis of such force states and motion-control logic in the prior art makes systems extremely prone to false alarms when handling behaviors that look similar but differ in physical nature, severely impacting the practicality and reliability of industrial monitoring systems.
Disclosure of Invention
The invention aims to provide a method and a system for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network, to solve the technical problem that prior behavior recognition based on spatio-temporal graph convolutional networks, by neglecting the underlying physical kinematics, cannot effectively distinguish behaviors that are similar in spatial geometry but different in physical nature.
To solve the above problems, the technical scheme of the method for extracting behavior features from surveillance video based on a spatio-temporal graph convolutional network provided by the invention is as follows: the method comprises the following steps: extracting human skeleton key points from a surveillance video stream, constructing a normalized coordinate system based on a human torso length reference, and converting the original pixel coordinates of the skeleton key points into normalized coordinates; calculating, from the normalized coordinates, the difference between the vertical acceleration of each joint of the human body and a preset standard gravity reference value, and determining a vertical out-of-control deviation value reflecting the degree of gravity-driven loss of control, in combination with the detection confidence; calculating the velocity vector of each joint from the normalized coordinates, and determining a trajectory oscillation entropy reflecting the degree of motion disorder, based on the direction change and the modulus of the velocity vectors at adjacent time instants; and constructing an adaptive weight function from the trajectory oscillation entropy and the vertical out-of-control deviation value, computing a behavior anomaly weight coefficient, and fusing this coefficient with the skeleton feature matrix to be recognized as the input of a spatio-temporal graph convolutional network for behavior recognition.
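The entropy, weighting, and fusion steps above can be sketched as follows. Since the patent's exact formulas are not reproduced in the source text, `trajectory_oscillation_entropy` is only one plausible instantiation (Shannon entropy of the squared-speed distribution over a time window, using the second small constant of claim 6), and `anomaly_weight` is a hypothetical combination in which `lam` plays the role of the regulatory factor of claim 8; only the first-order difference (claim 7) and the element-wise fusion (claim 9) follow the claims directly.

```python
import numpy as np

def joint_velocity(coords, dt=1.0):
    """First-order difference of normalized coordinates (claim 7).

    coords: (T, 2) normalized (abscissa, ordinate) of one joint.
    Returns (T-1, 2) velocity vectors.
    """
    return (coords[1:] - coords[:-1]) / dt

def trajectory_oscillation_entropy(vx, vy, eps=1e-6):
    """Plausible reading of the trajectory oscillation entropy (claim 6):
    entropy of the squared-speed distribution within a time window.
    A uniform speed profile yields high entropy; a single burst, low.
    """
    sq = vx**2 + vy**2                        # squared modulus per time step
    p = sq / (sq.sum() + eps)                 # normalize to a distribution
    return float(-(p * np.log(p + eps)).sum())

def anomaly_weight(deviation, entropy, lam=1.0):
    """Hypothetical adaptive weight combining both cues (claim 8)."""
    return 1.0 + lam * deviation * entropy

def fuse(weights, features):
    """Element-wise weighting of the skeleton feature matrix (claim 9)."""
    return weights * features
```

With `deviation == 0` (controlled motion) the weight collapses to 1, leaving the skeleton features unchanged, which matches the design intent of amplifying only gravity-driven, disordered motion before the graph convolution.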