CN-121281099-B - Human body posture recognition method, system, intelligent terminal and storage medium


Abstract

The application provides a human body posture recognition method, a system, an intelligent terminal and a storage medium, relating to the technical field of human body posture recognition. The method comprises: obtaining a human body detection image; controlling a preset instance segmentation model to segment the human body detection image so as to generate a human body region binarization mask; analyzing the human body region image according to a preset depth estimation algorithm so as to generate image depth information; analyzing the human body detection image, the human body region binarization mask and the image depth information so as to generate multi-modal input data; inputting the multi-modal input data into a preset classification model for classification so as to generate a human body posture type; and judging whether the human body posture type matches a preset abnormal posture type. If not, human body detection images continue to be obtained for cyclic recognition; if so, a prompt is issued according to preset abnormal posture prompt information. The application has the effect of improving the accuracy of human body posture recognition.

Inventors

WANG ZHONG
WANG SHIWANG

Assignees

上海源控自动化技术有限公司

Dates

Publication Date: 20260512
Application Date: 20251209

Claims (8)

1. A human body posture recognition method, characterized by comprising: acquiring a human body detection image; controlling a preset instance segmentation model to segment the human body detection image so as to generate a human body region binarization mask; analyzing the human body region image according to a preset depth estimation algorithm to generate image depth information; analyzing the human body detection image, the human body region binarization mask and the image depth information to generate multi-modal input data; inputting the multi-modal input data into a preset classification model for classification to generate a human body posture type; judging whether the human body posture type matches a preset abnormal posture type; if not, continuing to acquire human body detection images for cyclic recognition; if so, prompting according to preset abnormal posture prompt information; wherein the step of analyzing the human body detection image, the human body region binarization mask and the image depth information to generate multi-modal input data comprises: performing feature fusion on the human body detection image, the human body region binarization mask and the image depth information to generate multi-modal base data; analyzing the multi-modal base data to generate consecutive frame images; processing the consecutive frame images according to a preset RAFT algorithm to generate a bidirectional optical flow map; performing temporal feature processing on the multi-modal base data and the bidirectional optical flow map to generate a temporal fusion feature; acquiring a scene auxiliary feature; and performing weighted fusion of the temporal fusion feature and the scene auxiliary feature to generate the multi-modal input data; and wherein the step of acquiring the scene auxiliary feature comprises: analyzing the human body region binarization mask and the image depth information to generate a ground region depth variance; analyzing the human body detection image to generate an image brightness mean; analyzing the human body region binarization mask, the image depth information and a preset statistical range to generate a number of depth mutation regions; and mapping the ground region depth variance, the image brightness mean and the number of depth mutation regions into a vector with the same dimension as the temporal fusion feature so as to generate the scene auxiliary feature.
2. The method of claim 1, wherein the step of performing feature fusion on the human body detection image, the human body region binarization mask and the image depth information to generate the multi-modal base data comprises: performing channel concatenation of the three channels of the human body detection image, the one channel of the human body region binarization mask and the one channel of the image depth information to generate the multi-modal base data; the multi-modal base data is expressed as $X = \mathrm{Concat}(I, M, D)$, where $X$ denotes the multi-modal base data, $\mathrm{Concat}(\cdot)$ denotes concatenation along the channel dimension, $I$ denotes the human body detection image, $M$ denotes the human body region binarization mask, and $D$ denotes the image depth information.
3. The method of claim 1, wherein the step of performing feature fusion on the human body detection image, the human body region binarization mask and the image depth information to generate the multi-modal base data comprises: combining the human body region binarization mask and the image depth information to generate a condition feature map; analyzing the human body detection image, the condition feature map, a preset query weight matrix and a preset key weight matrix to generate attention weights; and analyzing the human body detection image, the attention weights and a preset value weight matrix to generate the multi-modal base data.
4. The method of claim 1, wherein the classification model includes a backbone network and a squeeze-and-excitation module, and the step of inputting the multi-modal input data into the preset classification model for classification to generate the human body posture type comprises: inputting the multi-modal input data into the backbone network for processing to generate an intermediate feature map; inputting the intermediate feature map into the squeeze-and-excitation module for processing to generate a recalibrated feature map; and inputting the recalibrated feature map into the backbone network for classification to generate the human body posture type.
5. The method of claim 4, wherein the step of inputting the intermediate feature map into the squeeze-and-excitation module for processing to generate the recalibrated feature map comprises: performing global average pooling on the intermediate feature map to generate channel global features; analyzing the channel global features to generate channel weights; and analyzing the channel weights and the intermediate feature map to generate the recalibrated feature map.
6. A human body posture recognition system, comprising: an acquisition module for acquiring a human body detection image; a memory for storing a program of the human body posture recognition method according to any one of claims 1 to 5; and a processor, wherein the program in the memory can be loaded by the processor to implement the human body posture recognition method according to any one of claims 1 to 5.
7. An intelligent terminal comprising a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to perform the human body posture recognition method according to any one of claims 1 to 5.
8. A computer-readable storage medium, characterized by storing a computer program that can be loaded by a processor to execute the human body posture recognition method according to any one of claims 1 to 5.
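The scene auxiliary feature recited in claim 1 can be illustrated with a minimal Python sketch. The claim names the three statistics (ground region depth variance, image brightness mean, number of depth mutation regions) but not how each is computed, so everything below is an assumption made for illustration: all non-human pixels are treated as ground, a depth mutation (abrupt depth change) region is taken to be a 4-connected group of background pixels whose depth lies outside the preset statistical range, and the mapping to the dimension of the temporal (time sequence) fusion feature is a plain linear projection. The names `scene_scalars`, `project` and `weights` are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def scene_scalars(gray: Grid, mask: List[List[int]], depth: Grid,
                  depth_range: Tuple[float, float]) -> Tuple[float, float, int]:
    """Return (ground depth variance, brightness mean, depth mutation region count)."""
    h, w = len(depth), len(depth[0])
    # Assumption: every pixel outside the human mask counts as "ground".
    background = [(y, x) for y in range(h) for x in range(w) if mask[y][x] == 0]
    ground_var = pvariance([depth[y][x] for y, x in background])
    brightness = mean(v for row in gray for v in row)

    # Assumption: a mutation region is a 4-connected group of background
    # pixels whose depth falls outside the preset statistical range.
    lo, hi = depth_range
    outliers = {(y, x) for y, x in background if not lo <= depth[y][x] <= hi}
    regions = 0
    while outliers:
        regions += 1
        stack = [outliers.pop()]
        while stack:  # flood fill one connected region
            y, x = stack.pop()
            for nb in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if nb in outliers:
                    outliers.remove(nb)
                    stack.append(nb)
    return ground_var, brightness, regions


def project(scalars: Sequence[float],
            weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear map from the three scalars to one output entry per weight row."""
    return [sum(w * s for w, s in zip(row, scalars)) for row in weights]
```

Calling `project(scene_scalars(gray, mask, depth, (lo, hi)), weights)` with a `weights` matrix of d rows and 3 columns yields a d-dimensional vector, which is what allows the scene auxiliary feature to be fused element-wise with a d-dimensional temporal fusion feature. The sketch expects at least one background pixel; `pvariance` raises an error on empty input.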

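The channel recalibration recited in claims 4 and 5 (global average pooling, channel weights, reweighted feature map) follows the pattern commonly known as squeeze-and-excitation. The sketch below is one hedged reading of it on a feature map stored as nested lists. The claims only say that the channel global features are "analyzed" to produce channel weights; the two-layer gate with ReLU and sigmoid, and the names `squeeze`, `excite`, `recalibrate`, `w1` and `w2`, are assumptions taken from the usual squeeze-and-excitation design rather than from the claims.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Matrix = List[List[float]]


def squeeze(fmap: FeatureMap) -> List[float]:
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def excite(z: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Assumed gate: ReLU bottleneck layer, then a sigmoid weight per channel."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def recalibrate(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Scale every channel of the intermediate feature map by its weight."""
    weights = excite(squeeze(fmap), w1, w2)
    return [[[s * v for v in row] for row in ch] for ch, s in zip(fmap, weights)]
```

Here `w1` has one row per hidden unit and one column per channel, and `w2` has one row per channel and one column per hidden unit. In a trained classifier these would be learned parameters; the weights only rescale channels, so the recalibrated feature map keeps the shape of the intermediate feature map and can be passed back to the backbone network.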
Description

Human body posture recognition method, system, intelligent terminal and storage medium

Technical Field

The application relates to the technical field of human body posture recognition, in particular to a human body posture recognition method, a human body posture recognition system, an intelligent terminal and a storage medium.

Background

Human body posture recognition mainly refers to detecting and locating key nodes of a human body (such as joints and bones) from images or videos, and estimating the posture and actions of the human body.

In the related art, a joint point detection method (such as OpenPose) is generally adopted for human body posture recognition. OpenPose uses a convolutional neural network to detect the key points (such as joints) of human bodies and simultaneously perform limb association, so that the key points are assigned to different individuals and a skeleton diagram of each person is obtained. Classification features are then extracted from the skeleton diagram, and a classification model infers the human body posture from those features.

With respect to the related art, the joint point detection method relies on locating human body joint points. When joint points are occluded, its accuracy drops greatly, making human body posture recognition inaccurate, so there is still room for improvement.

Disclosure of Invention

In order to improve the accuracy of human body posture recognition, the application provides a human body posture recognition method, a system, an intelligent terminal and a storage medium.
In a first aspect, the present application provides a human body posture recognition method, which adopts the following technical scheme: a human body posture recognition method comprising: acquiring a human body detection image; controlling a preset instance segmentation model to segment the human body detection image so as to generate a human body region binarization mask; analyzing the human body region image according to a preset depth estimation algorithm to generate image depth information; analyzing the human body detection image, the human body region binarization mask and the image depth information to generate multi-modal input data; inputting the multi-modal input data into a preset classification model for classification to generate a human body posture type; judging whether the human body posture type matches a preset abnormal posture type; if not, continuing to acquire human body detection images for cyclic recognition; if so, prompting according to preset abnormal posture prompt information. By adopting this technical scheme, the multi-modal input data is obtained by analyzing the human body detection image, the human body region binarization mask and the image depth information, so that the classification model can identify the human body posture type from the multi-modal input data. On the one hand, background interference is reduced; on the other hand, the classification model focuses on the person to be recognized. Dependence on human body joint points is eliminated, the influence of occlusion is greatly reduced, and the accuracy of human body posture recognition is improved.
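The control flow of the first aspect can be sketched in Python as a loop over incoming frames. This is only an outline under stated assumptions: the segmentation model, depth estimator, fusion step, classifier and prompt are passed in as hypothetical callables, the posture labels in `ABNORMAL_POSTURES` are invented placeholders, the depth estimator is given both the image and the mask because the text does not settle which one it consumes, and stopping after the first prompt is a choice made here rather than something the text specifies.

```python
from typing import Callable, Iterable, Optional

# Hypothetical preset abnormal posture types; the real set is application-specific.
ABNORMAL_POSTURES = {"fallen", "crouching"}


def recognition_loop(
    frames: Iterable[object],
    segment: Callable[[object], object],
    estimate_depth: Callable[[object, object], object],
    build_input: Callable[[object, object, object], object],
    classify: Callable[[object], str],
    prompt: Callable[[str], None],
) -> Optional[str]:
    """Cycle through acquire, segment, depth, fuse, classify until an abnormal posture."""
    for image in frames:                               # acquire a human body detection image
        mask = segment(image)                          # human body region binarization mask
        depth = estimate_depth(image, mask)            # image depth information
        model_input = build_input(image, mask, depth)  # multi-modal input data
        posture = classify(model_input)                # human body posture type
        if posture in ABNORMAL_POSTURES:
            prompt(posture)                            # preset abnormal posture prompt
            return posture                             # assumption: stop after prompting
        # Not abnormal: fall through and acquire the next image (cyclic recognition).
    return None                                        # frame source ended with no alert
```

Passing the stages as callables keeps the loop independent of any particular segmentation, depth or classification model, so each stage can be replaced by a stub when testing the control flow on its own.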
Optionally, the step of analyzing the human body detection image, the human body region binarization mask and the image depth information to generate the multi-modal input data includes: performing feature fusion on the human body detection image, the human body region binarization mask and the image depth information to generate multi-modal base data; analyzing the multi-modal base data to generate consecutive frame images; processing the consecutive frame images according to a preset RAFT algorithm to generate a bidirectional optical flow map; performing temporal feature processing on the multi-modal base data and the bidirectional optical flow map to generate a temporal fusion feature; acquiring a scene auxiliary feature; and performing weighted fusion of the temporal fusion feature and the scene auxiliary feature to generate the multi-modal input data. By adopting this technical scheme, temporal feature processing of the multi-modal base data and the bidirectional optical flow map yields the temporal fusion feature, which gives the classifier two bases for judgment, static features and dynamic trends, and improves recognition accuracy for similar posture types. Weighted fusion of the scene auxiliary feature with the temporal fusion feature allows the judgment threshold of each posture type to be adjusted to the scene, which improves the accuracy of human body posture recognition. Optionally, the step of pe