CN-122024138-A - Two-stage fall detection method based on pose estimation and spatio-temporal graph convolution

Abstract

A two-stage fall detection method based on pose estimation and spatio-temporal graph convolution, belonging to the technical field of computer vision and artificial intelligence. The method addresses the problems that fall-like behaviors are difficult to distinguish from falls, that limited fall samples lead to insufficient generalization, and that the fall decision is degraded as a result. A pose estimation network incorporating deformable convolution, an attention mechanism, and a Slim-Neck structure outputs key point coordinates and confidences; corrected key point confidences are obtained through task-guided evaluation; a dynamic mask is generated to screen out invalid key points; a skeleton spatio-temporal graph is constructed, normalized, and input into a spatio-temporal graph convolutional network for feature extraction; and the fall detection result is output through average pooling and fully connected layer classification. The method is suitable for fields such as intelligent video monitoring for elderly care.
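
The screening of invalid key points by a dynamic mask can be illustrated with a minimal sketch. It assumes the corrected confidences are already available as a list of floats, uses the threshold of 0.5 stated in claim 6, and keeps joint indices stable by replacing masked key points with `None`. The function names `dynamic_mask` and `apply_mask` and the sample values are hypothetical and are not taken from the patent.

```python
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float]  # (x, y) image coordinates


def dynamic_mask(corrected_conf: List[float], threshold: float = 0.5) -> List[bool]:
    """Return True for valid key points and False for key points to be masked.

    A key point is invalid only when its corrected confidence is strictly
    below the threshold, so a confidence of exactly 0.5 stays valid.
    """
    return [c >= threshold for c in corrected_conf]


def apply_mask(coords: List[Keypoint], mask: List[bool]) -> List[Optional[Keypoint]]:
    """Replace masked key points with None so joint indices do not shift."""
    return [pt if keep else None for pt, keep in zip(coords, mask)]


# Illustrative values only (assumed, not from the patent).
coords = [(120.0, 80.0), (118.0, 140.0), (90.0, 200.0), (150.0, 205.0)]
corrected = [0.92, 0.50, 0.31, 0.77]

mask = dynamic_mask(corrected)
valid = apply_mask(coords, mask)
print(mask)
print(valid)
```

Keeping a placeholder for each masked joint, rather than deleting it, is one possible design choice: it lets later stages address joints by a fixed index in every frame.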

Inventors

WANG PENG
ZHU YING
JI ZHOUWEI
CHU ZHICHAO
WANG CHUANG
ZHU WEIHAO
ZHANG QI
SHEN FANGZHEN
MU LINGSEN

Assignees

Changzhou Institute of Technology (常州工学院)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-28

Claims (10)

1. A two-stage fall detection method based on pose estimation and spatio-temporal graph convolution, characterized by comprising the following steps:
S1, acquiring a video sequence to be detected;
S2, performing human pose estimation on the video sequence with a pose estimation network and outputting a human key point set, wherein the human key point set comprises key point coordinates and corresponding key point confidences, and the pose estimation network incorporates a deformable convolution, an attention mechanism, and a Slim-Neck structure;
S3, performing task-oriented evaluation of key point quality based on the fall detection task to obtain corrected key point confidences;
S4, generating a dynamic mask based on the corrected key point confidences and screening the human key point set according to the dynamic mask, wherein key points whose corrected confidence is lower than a preset threshold are determined to be invalid key points and are masked out in subsequent processing, yielding a valid key point set;
S5, constructing a skeleton spatio-temporal graph for fall detection based on the valid key point set;
S6, normalizing the skeleton spatio-temporal graph; and
S7, inputting the normalized skeleton spatio-temporal graph into a spatio-temporal graph convolutional network for feature extraction, average-pooling the output of the network to obtain a sequence feature vector of the video sequence, inputting the sequence feature vector into a fully connected layer for classification, and outputting a fall detection result for the video sequence.
2. The method of claim 1, wherein in S2 the pose estimation network modifies the C3k2 module in at least one preset feature extraction block of the backbone network by introducing DCNv4 deformable convolution to form a C3k2_DCNv4 module, in which the convolution in the Bottleneck structure of the C3k2 module is replaced with a DCNv4 convolution having a 3 × 3 kernel, forming a Bottleneck_DCNv4 structure.
3. The method of claim 1, wherein in S2 the pose estimation network places a DAttention attention mechanism at at least one preset level of the backbone network.
4. The method of claim 1, wherein in S2 the neck network of the pose estimation network adopts a Slim-Neck structure, and the standard convolution in at least one convolution structure of the neck network is replaced with the lightweight convolution GSConv.
5. The method according to claim 1, wherein S3 comprises: determining a target key point set used for fall detection and discrimination; setting a first evaluation parameter for the target key point set and a second evaluation parameter for the other key points, wherein the first evaluation parameter is smaller than the second evaluation parameter; and calculating a key point quality score for each key point based on the first and second evaluation parameters, and correcting the key point confidence according to the key point quality score to obtain the corrected key point confidence.
6. The method of claim 1, wherein in S4 the preset threshold is 0.5, and when the corrected key point confidence is below 0.5, the corresponding key point is determined to be an invalid key point and is masked in subsequent processing by the dynamic mask.
7. The method according to claim 1, wherein S5 comprises: constructing a node set based on the valid key point set; constructing a spatial edge set representing the connections between key points within the same frame; constructing a temporal edge set representing the association of key points between adjacent frames; and forming the skeleton spatio-temporal graph from the node set, the spatial edge set, and the temporal edge set.
8. The method of claim 1, wherein in S7 the spatio-temporal graph convolutional network is an ST-GCN network; the ST-GCN network performs spatio-temporal feature extraction on the input skeleton spatio-temporal graph, obtains the sequence feature vector through average pooling, and outputs the fall detection result through a fully connected layer; and the ST-GCN network reduces network complexity and improves inference efficiency by deleting or skipping at least one spatio-temporal graph convolution unit.
9. A computer comprising a processor and a storage medium, characterized in that the computer performs the method according to any one of claims 1-8 when the processor reads a computer program stored in the storage medium.
10. A computer program product comprising a computer program, characterized in that the method according to any one of claims 1-8 is implemented when the computer program is read.
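
As an illustration of the graph construction recited in claim 7, the following minimal sketch builds a node set, a spatial edge set, and a temporal edge set from per-frame validity masks. The five-joint skeleton in `BONES`, the function name `build_skeleton_graph`, and the decision to drop masked key points from the node set entirely are assumptions made for this example; the claims do not specify a skeleton layout or how masked key points are represented in the graph.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]    # (frame index, joint index)
Edge = Tuple[Node, Node]

# Hypothetical 5-joint skeleton for illustration only:
# 0 head, 1 neck, 2 hip, 3 left knee, 4 right knee.
BONES: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3), (2, 4)]


def build_skeleton_graph(
    masks: List[List[bool]],
    bones: List[Tuple[int, int]] = BONES,
) -> Tuple[Set[Node], List[Edge], List[Edge]]:
    """Build (nodes, spatial edges, temporal edges) from validity masks.

    masks[t][j] is True when joint j is a valid key point in frame t.
    """
    # Node set: one node per valid key point per frame.
    nodes: Set[Node] = {
        (t, j)
        for t, frame in enumerate(masks)
        for j, ok in enumerate(frame)
        if ok
    }
    # Spatial edges: bones whose two joints are both valid in the same frame.
    spatial: List[Edge] = [
        ((t, a), (t, b))
        for t in range(len(masks))
        for a, b in bones
        if (t, a) in nodes and (t, b) in nodes
    ]
    # Temporal edges: the same joint, valid in two adjacent frames.
    temporal: List[Edge] = [
        ((t, j), (t + 1, j))
        for (t, j) in sorted(nodes)
        if (t + 1, j) in nodes
    ]
    return nodes, spatial, temporal


# Two frames; the left knee (joint 3) is masked in the second frame.
masks = [
    [True, True, True, True, True],
    [True, True, True, False, True],
]
nodes, spatial, temporal = build_skeleton_graph(masks)
print(len(nodes), len(spatial), len(temporal))
```

In this sketch a masked key point removes its node together with every spatial and temporal edge that touches it. An alternative, equally compatible with the claim wording, would keep a fixed graph topology and zero out the features of masked nodes.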

Description

Two-stage fall detection method based on pose estimation and spatio-temporal graph convolution

Technical Field

The invention belongs to the technical field of measurement and testing, relates in particular to safety monitoring and fall event detection based on video information, and more particularly relates to a two-stage fall detection method based on human pose estimation and spatio-temporal graph convolution.

Background

Global population aging is accelerating, and falls among the elderly have become a high-frequency risk in public health. A fall may cause direct injuries such as fractures, and delayed treatment may lead to loss of mobility and various complications, further increasing the medical burden on families and society. Detecting falls of elderly people in daily-life scenes and raising an alarm in time, so as to reduce the secondary injuries caused by falls, is therefore of practical importance.

To meet the need for fall event detection, the prior art has formed three main routes: wearable sensor detection, ambient sensor detection, and detection based on video monitoring. Wearable sensor schemes can detect fall events effectively, but engineering constraints such as wearing habits and battery life must always be considered, and long-term use is easily affected. Ambient sensor schemes are susceptible to environmental noise in actual deployment and offer limited stability. By comparison, fall detection based on video monitoring has become an important direction for research and application because it is contactless, covers a large area, and is easy to combine with existing monitoring equipment.

In the video monitoring route, the common practice is to classify the behavior in video frames or clips to determine whether a fall has occurred. Such methods attend only to the human posture features of the current frame, so falls are difficult to distinguish from fall-like behaviors such as squatting, sitting, and lying down, and the risk of false detection is high. In addition, the number of fall samples in fall detection data sets is usually limited, so the model struggles to learn diverse fall characteristics, generalizes poorly, and shows marked performance fluctuation across different populations and scenes.

Furthermore, a growing number of methods use a pose estimation model to detect human key points and then make the fall decision from the key point sequence. Insufficient key point localization accuracy directly affects the subsequent fall decision and lowers detection accuracy. At the same time, the training of the pose estimation model lacks an effective link to the fall detection task, so more accurate key point detection does not necessarily make the fall decision more reliable. Against this background, although a more complex model may improve key point localization accuracy, the actual benefit of that complexity for the fall detection task is difficult to evaluate, and it tends to introduce structural redundancy and reduce inference efficiency. A fall event detection scheme that balances key point quality, discrimination robustness, and detection efficiency is therefore needed.

Disclosure of Invention

To solve the problems in the prior art that fall detection based on video monitoring has difficulty distinguishing fall-like behaviors, that limited fall samples lead to insufficient generalization, that unstable key point localization affects detection accuracy, that the link between pose estimation and the fall detection task is weak, and that the benefit of model complexity is difficult to evaluate and prone to redundancy, the invention provides the following scheme:

A two-stage fall detection method based on pose estimation and spatio-temporal graph convolution comprises the following steps:
S1, acquiring a video sequence to be detected;
S2, performing human pose estimation on the video sequence with a pose estimation network and outputting a human key point set, wherein the human key point set comprises key point coordinates and corresponding key point confidences, and the pose estimation network incorporates a deformable convolution, an attention mechanism, and a Slim-Neck structure;
S3, performing task-oriented evaluation of key point quality based on the fall detection task to obtain corrected key point confidences;
S4, generating a dynamic mask based on the corrected key point confidences, screening the human key point set according to the dynamic mask, judging t