
CN-122024336-A - Behavior action recognition method based on image analysis

CN 122024336 A

Abstract

The invention relates to the fields of image analysis and video processing and discloses a behavior action recognition method based on image analysis. The method obtains a video stream to be recognized, detects human body targets in the video frames, and associates them across frames to form target human body tracks. Human body key points are extracted, and the reliability of each key point is determined from its detection confidence, occlusion ratio, displacement continuity deviation, local image sharpness index, and skeleton geometric constraint deviation, generating a part reliability vector. Low-reliability key points are corrected to obtain a corrected key point sequence. Candidate action segments are then divided based on the corrected key point sequence, the centroid movement track, the motion energy changes of the body parts, the distance changes between the human body and neighboring objects, and scene region switching information. A scene-constrained action candidate set is generated by combining the scene region type, neighboring object type, human posture state, and human displacement mode, and action recognition is performed within that set.

Inventors

  • XIE XIAOYING
  • ZHAO JUN

Assignees

  • 成都知云仓科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-16

Claims (10)

  1. A behavior action recognition method based on image analysis, characterized by comprising the following steps:
     S1, obtaining a video stream to be recognized, performing human body target detection and cross-frame association on the video frames in the video stream to obtain the human body track of a target human body across consecutive video frames, determining the scene region type where the target human body is located based on the video frame images corresponding to the human body track, and determining the neighboring object types satisfying a preset spatial distance condition with respect to the target human body;
     S2, extracting human body key points from the video frames corresponding to the human body track, obtaining for each key point its detection confidence, occlusion ratio, adjacent-frame displacement continuity deviation, local image sharpness index, and skeleton geometric constraint deviation, determining the key point reliability of each key point based on these five indicators, and generating a part reliability vector;
     S3, for each key point to be corrected whose reliability is lower than a first threshold, determining a first candidate position based on the trajectory trend of the key point in historical frames and its skeleton geometric constraint relations with adjacent key points, determining a local image region based on the first candidate position, extracting local secondary key points from the local image region to determine a second candidate position, performing weighted fusion of the first and second candidate positions according to the adjacent-frame displacement continuity deviation of the key point and the local image sharpness index of the local image region, determining the corrected key point position, obtaining a corrected key point sequence, and updating the part reliability vector;
     S4, calculating action boundary scores and boundary uncertainty based on the corrected key point sequence, the centroid movement track of the target human body, the motion energy changes of the body parts, the distance changes between the target human body and neighboring objects, and scene region switching information, and segmenting the human body track into candidate action segments according to the action boundary scores;
     S5, for each candidate action segment, generating a scene-constrained action candidate set for the segment based on a pre-established scene constraint mapping rule, according to the scene region type, neighboring object type, human posture state, and human displacement mode corresponding to the segment, and performing first behavior action recognition within the candidate set to obtain a first recognition result and a first recognition confidence;
     S6, determining a candidate action segment as a segment to be rechecked when the first recognition confidence falls into a preset gray area, or any part reliability in the part reliability vector is lower than a second threshold, or the boundary uncertainty is greater than a third threshold; performing high-density frame extraction and local feature re-extraction only for the local image regions corresponding to body parts whose part reliability is lower than the second threshold in the segment and for the adjacent video frames centered on the video frame with the lowest first recognition confidence in the segment; and performing second behavior action recognition within the corresponding scene-constrained action candidate set to obtain a second recognition result and a second recognition confidence;
     S7, performing a consistency check on the first and second recognition results, outputting the final behavior action category, corresponding time interval, and final recognition confidence of the target human body according to the second recognition result when the check passes, and determining them based on the first and second recognition confidences when the check fails.
  2. The behavior action recognition method based on image analysis according to claim 1, wherein determining the scene region type where the target human body is located and determining the neighboring object types satisfying the preset spatial distance condition with respect to the target human body comprises: dividing the scene in a video frame into a plurality of preset scene regions; acquiring the human body region of the target human body in the current video frame and calculating the overlap degree between the human body region and each preset scene region; determining the region type of the preset scene region with the largest overlap degree as the scene region type of the target human body; detecting objects in the current video frame to obtain their object types and object positions; and determining the neighboring object types satisfying the preset spatial distance condition according to the spatial distance between each object and the target human body and the number of consecutive frames for which the condition is satisfied.
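As an illustrative sketch (not part of the patent text), the region-overlap assignment and the "consecutive frames within distance" test described in this claim could be implemented as follows; the region layout, distance threshold, and frame count are hypothetical:

```python
def box_overlap(a, b):
    """Overlap area between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def scene_region_type(human_box, regions):
    """Pick the type of the preset scene region overlapping the human box most."""
    return max(regions, key=lambda r: box_overlap(human_box, r["box"]))["type"]

def neighboring_objects(distances_per_frame, max_dist, min_frames):
    """Object types whose distance to the human stays within max_dist for at
    least min_frames consecutive frames."""
    result = []
    for obj, dists in distances_per_frame.items():
        run = best = 0
        for d in dists:
            run = run + 1 if d <= max_dist else 0
            best = max(best, run)
        if best >= min_frames:
            result.append(obj)
    return result
```

The consecutive-frame requirement filters out objects that only pass by the human momentarily.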
  3. The behavior action recognition method based on image analysis according to claim 2, wherein determining the key point reliability of each key point based on the detection confidence, occlusion ratio, adjacent-frame displacement continuity deviation, local image sharpness index, and skeleton geometric constraint deviation, and generating the part reliability vector for the head, trunk, upper limbs, and lower limbs, comprises: normalizing the five indicators of each key point; multiplying the normalized detection confidence and local image sharpness index by corresponding forward weights; multiplying the normalized occlusion ratio, adjacent-frame displacement continuity deviation, and skeleton geometric constraint deviation by corresponding reverse weights; determining the key point reliability of each key point from the forward and reverse weighting results; dividing the key points into head, trunk, upper limb, and lower limb key point sets according to the human body structural connection relations; averaging the reliabilities of all key points in each set to obtain the head, trunk, upper limb, and lower limb reliabilities; and arranging these in the order head, trunk, upper limbs, lower limbs to generate the part reliability vector.
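The forward/reverse weighting and per-part averaging in this claim can be sketched as below; the weight values are hypothetical, and all five indicators are assumed to be pre-normalized to [0, 1]:

```python
def keypoint_reliability(conf, occlusion, disp_dev, sharpness, skel_dev,
                         w_fwd=(0.3, 0.2), w_rev=(0.2, 0.15, 0.15)):
    """Forward-weighted terms (confidence, sharpness) raise reliability;
    reverse-weighted terms (occlusion, displacement deviation, skeleton
    deviation) lower it.  The weights here are illustrative only."""
    forward = w_fwd[0] * conf + w_fwd[1] * sharpness
    reverse = w_rev[0] * occlusion + w_rev[1] * disp_dev + w_rev[2] * skel_dev
    return forward - reverse

def part_reliability_vector(reliabilities, part_sets):
    """Average key point reliability per body part, in the fixed order
    head, trunk, upper limbs, lower limbs."""
    return [sum(reliabilities[i] for i in idx) / len(idx)
            for idx in (part_sets["head"], part_sets["trunk"],
                        part_sets["upper_limb"], part_sets["lower_limb"])]
```

A fully confident, sharp, unoccluded key point scores the sum of the forward weights; heavy occlusion or geometric violations push the score toward (or below) zero.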
  4. The behavior action recognition method based on image analysis according to claim 3, wherein correcting a key point to be corrected whose reliability is lower than the first threshold comprises: extracting the position sequence of the key point in a preset number of historical frames; determining its displacement direction and displacement amplitude from the position sequence; determining a constraint position from the positions of the key points adjacent to the key point together with preset skeleton length and joint angle constraints; determining the first candidate position from the displacement direction, displacement amplitude, and constraint position; determining a local image region centered on the first candidate position and extracting local secondary key points from it to obtain the second candidate position; and determining weighting coefficients for the first and second candidate positions according to the adjacent-frame displacement continuity deviation of the key point and the local image sharpness index of the local image region, and determining the corrected key point position according to the weighting coefficients.
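A minimal sketch of the two-candidate correction in this claim, assuming simple linear extrapolation for the trajectory trend and a hypothetical weighting scheme (the patent does not fix the exact formulas):

```python
def predict_first_candidate(history, neighbor, bone_len):
    """Extrapolate from the last two history positions, then pull the result
    back onto the preset bone length measured from the neighboring key point."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    px, py = 2 * x1 - x0, 2 * y1 - y0          # linear extrapolation
    dx, dy = px - neighbor[0], py - neighbor[1]
    d = (dx * dx + dy * dy) ** 0.5 or 1.0      # avoid divide-by-zero
    return (neighbor[0] + dx / d * bone_len, neighbor[1] + dy / d * bone_len)

def fuse_candidates(p1, p2, disp_dev, sharpness):
    """Weighted fusion: high displacement deviation lowers trust in the
    predicted candidate p1; high local sharpness raises trust in the
    re-detected candidate p2.  Weighting scheme is illustrative only."""
    w1 = max(1.0 - disp_dev, 0.0)
    w2 = max(sharpness, 0.0)
    if w1 + w2 == 0:                           # fall back to the midpoint
        w1 = w2 = 0.5
    s = w1 + w2
    return ((w1 * p1[0] + w2 * p2[0]) / s, (w1 * p1[1] + w2 * p2[1]) / s)
```

Projecting the extrapolated point back onto the preset bone length is one way to honor the skeleton constraint; the claim leaves the exact constraint handling open.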
  5. The behavior action recognition method based on image analysis according to claim 4, wherein calculating the action boundary scores and boundary uncertainty comprises: determining the centroid movement variation according to the speed and direction changes of the target human body's centroid between adjacent video frames; determining the motion energy variation of each body part according to the displacement changes of the head, trunk, upper limbs, and lower limbs between adjacent video frames; determining the human-object distance variation according to the distance change between the target human body and the neighboring objects; determining the scene region switching amount according to whether the scene region type changes between adjacent video frames; calculating the action boundary score from the centroid movement variation, the motion energy variations of the body parts, the human-object distance variation, and the scene region switching amount; calculating the boundary uncertainty from the difference between the maximum and minimum action boundary scores within a preset time window and the variation of the part reliability vector between adjacent video frames; and determining video frames whose action boundary score is greater than a fourth threshold as candidate boundary frames and segmenting the human body track into candidate action segments according to the candidate boundary frames.
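The boundary score and uncertainty computation could be sketched as follows; the weights and the use of a plain weighted sum are assumptions, since the claim only names the inputs:

```python
def boundary_score(centroid_delta, part_energy_deltas, obj_dist_delta,
                   scene_switch, weights=(0.3, 0.3, 0.2, 0.2)):
    """Hypothetical weighted sum of the four change quantities named in
    the claim; scene_switch is a boolean 'did the region type change'."""
    return (weights[0] * centroid_delta
            + weights[1] * sum(part_energy_deltas) / len(part_energy_deltas)
            + weights[2] * abs(obj_dist_delta)
            + weights[3] * (1.0 if scene_switch else 0.0))

def boundary_uncertainty(scores_in_window, reliability_delta):
    """Spread of boundary scores in the window plus the fluctuation of the
    part reliability vector between adjacent frames."""
    return (max(scores_in_window) - min(scores_in_window)) + reliability_delta

def candidate_boundaries(scores, threshold):
    """Frame indices whose action boundary score exceeds the fourth threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]
```

Consecutive frames between two candidate boundary frames would then form one candidate action segment.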
  6. The behavior action recognition method based on image analysis according to claim 5, wherein generating the scene-constrained action candidate set of a candidate action segment based on the pre-established scene constraint mapping rule comprises: pre-establishing a correspondence table among scene region types, neighboring object types, human posture states, human displacement modes, and action categories, wherein the human posture states comprise standing, sitting, squatting, and prone states, and the human displacement modes comprise static, linear movement, turning movement, and reciprocating movement modes; extracting the scene region type, neighboring object type, human posture state, and human displacement mode corresponding to each candidate action segment; and looking up the action categories corresponding to the extracted results in the correspondence table and determining the found action categories as the scene-constrained action candidate set of the segment.
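The correspondence table lookup in this claim amounts to a keyed map; the table entries below are hypothetical examples:

```python
# Hypothetical correspondence table:
# (scene region, neighboring object, posture, displacement mode) -> action set
CONSTRAINT_TABLE = {
    ("shelf_area", "cargo_box", "standing",  "static"): {"picking", "inspecting"},
    ("shelf_area", "cargo_box", "squatting", "static"): {"picking"},
    ("aisle",      "cart",      "standing",  "linear"): {"transporting"},
}

def scene_candidate_set(scene, obj, posture, displacement):
    """Look up the scene-constrained action candidate set for a candidate
    action segment; empty set if no rule matches."""
    return CONSTRAINT_TABLE.get((scene, obj, posture, displacement), set())
```

Restricting later classification to this set is what lets the same posture map to different action semantics in different scene regions.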
  7. The behavior action recognition method based on image analysis according to claim 6, wherein performing the first behavior action recognition on a candidate action segment to obtain the first recognition result and first recognition confidence comprises: first performing coarse classification on the segment to obtain a coarse classification result; then performing fine action classification on the segment within the action category range corresponding to the coarse classification result inside the scene-constrained action candidate set; and determining the fine classification result as the first recognition result and the corresponding classification score as the first recognition confidence.
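The coarse-then-fine classification restricted to the scene-constrained candidate set could look like this sketch; the label names and scores are hypothetical:

```python
def fine_classify(coarse_to_fine, coarse_label, constrained_set, scores):
    """Restrict the fine labels to those that belong both to the coarse
    class's fine set and to the scene-constrained candidate set, then take
    the highest-scoring label and its score as the first recognition
    result/confidence."""
    allowed = coarse_to_fine[coarse_label] & constrained_set
    best = max(allowed, key=lambda a: scores.get(a, 0.0))
    return best, scores.get(best, 0.0)
```

Note that "placing" below is excluded despite having the highest raw score, because it is not in the scene-constrained set.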
  8. The behavior action recognition method based on image analysis according to claim 7, wherein the candidate action segment is determined as a segment to be rechecked when any of the following holds: the first recognition confidence is greater than or equal to a fifth threshold and less than or equal to a sixth threshold, the fifth threshold being less than the sixth threshold; at least one part reliability in the part reliability vector is lower than the second threshold; or the boundary uncertainty is greater than the third threshold.
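The three re-check trigger conditions combine with a logical OR, as in this sketch; all threshold values are hypothetical:

```python
def needs_recheck(confidence, part_reliabilities, boundary_unc,
                  t_low=0.4, t_high=0.7, t_rel=0.5, t_unc=0.6):
    """A segment is rechecked if ANY of the three conditions holds.
    t_low < t_high bound the confidence 'gray area' (fifth and sixth
    thresholds); t_rel and t_unc stand in for the second and third."""
    in_gray = t_low <= confidence <= t_high
    low_part = any(r < t_rel for r in part_reliabilities)
    return in_gray or low_part or boundary_unc > t_unc
```

A clearly confident result with reliable parts and a stable boundary skips the re-check entirely, which is what limits the extra computation to ambiguous segments.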
  9. The behavior action recognition method based on image analysis according to claim 8, wherein performing high-density frame extraction and local feature re-extraction only for the local image regions corresponding to body parts whose part reliability is lower than the second threshold in the segment to be rechecked and for the adjacent video frames centered on the video frame with the lowest first recognition confidence comprises: taking the circumscribed key point region of each body part whose part reliability is lower than the second threshold as a local image region; selecting a preset number of video frames before and after the video frame with the lowest first recognition confidence in the segment as the adjacent video frames; performing high-density frame extraction on the adjacent video frames at a first preset frame extraction interval, the first preset interval being smaller than the second preset frame extraction interval used for the segment during the first behavior action recognition; and re-extracting local features from the local image regions in the densely extracted video frames and using the re-extracted local features for the second behavior action recognition.
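The dense re-sampling around the lowest-confidence frame could be sketched as follows; the radius and interval values are hypothetical:

```python
def dense_frame_indices(segment_frames, center, radius, dense_interval):
    """Frame indices sampled at the dense interval within +/- radius of the
    lowest-confidence frame, clipped to the segment.  dense_interval must be
    smaller than the normal extraction interval for this to add frames."""
    lo = max(center - radius, segment_frames[0])
    hi = min(center + radius, segment_frames[-1])
    return list(range(lo, hi + 1, dense_interval))
```

For example, with interval 2 instead of 1 the window around frame 50 is sampled at every other frame.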
  10. The behavior action recognition method based on image analysis according to claim 9, wherein performing the consistency check on the first and second recognition results and determining the final behavior action category, corresponding time interval, and final recognition confidence of the target human body comprises: judging whether the action category of the first recognition result is the same as that of the second recognition result; if the categories are the same, further judging whether the second recognition confidence is greater than or equal to a seventh threshold; when both conditions are satisfied, determining that the consistency check passes and outputting the final behavior action category, corresponding time interval, and final recognition confidence according to the second recognition result; when either condition is not satisfied, determining that the consistency check fails; when the check fails and the second recognition confidence is greater than the first, determining the action category and time interval of the second recognition result as final and the second recognition confidence as the final recognition confidence; and when the check fails and the second recognition confidence is less than or equal to the first, determining the action category and time interval of the first recognition result as final and the first recognition confidence as the final recognition confidence.
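The consistency check and fallback arbitration in this claim reduce to a small decision function; the seventh-threshold value is hypothetical:

```python
def resolve(first, second, t_pass=0.8):
    """first/second are (category, interval, confidence) tuples.
    The check passes only if the categories match AND the second confidence
    reaches t_pass (the seventh threshold); otherwise the higher-confidence
    result wins, with ties going to the first recognition."""
    same = first[0] == second[0]
    if same and second[2] >= t_pass:
        return second
    return second if second[2] > first[2] else first
```

This keeps the re-examined result authoritative when it is both consistent and confident, while never letting a weak second pass overwrite a stronger first one.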

Description

Behavior action recognition method based on image analysis

Technical Field

The invention relates to the fields of image analysis and video processing, and in particular to a behavior action recognition method based on image analysis.

Background

With the development of video monitoring equipment and image processing technology, automatic recognition of personnel behavior based on image analysis is gradually being applied in scenes such as warehouse operation supervision, production site inspection, equipment operation monitoring, and public area traffic management, so that personnel behavior states can be obtained from continuous video in a timely manner to assist site management. In the prior art, human body targets in video frames are first detected and tracked; human body key points, local image features, or temporal motion features are then extracted; and actions such as walking, carrying, operating, and staying are classified by combining consecutive-frame information. Some schemes also use scene region or neighboring object information to assist in judging the recognition result. However, the prior art still classifies actions directly from raw key point results or fixed-length video clips. In continuous video, human bodies are frequently occluded, blurred, turning, bending, or transitioning between motions, so key points tend to jitter, drop out, or become locally distorted, and action boundaries are divided unstably. When action boundaries are unstable, the same posture can correspond to different action semantics depending on the scene region and neighboring objects, and the recognition results are therefore easily confused and unstable.

Disclosure of the Invention

Aiming at the defects of the prior art, the invention provides a behavior action recognition method based on image analysis to solve the above technical problems.
The technical aim of the invention is realized by the following technical scheme. A behavior action recognition method based on image analysis comprises the following steps:
S1, obtaining a video stream to be recognized, performing human body target detection and cross-frame association on the video frames in the video stream to obtain the human body track of a target human body across consecutive video frames, determining the scene region type where the target human body is located based on the video frame images corresponding to the human body track, and determining the neighboring object types satisfying a preset spatial distance condition with respect to the target human body;
S2, extracting human body key points from the video frames corresponding to the human body track, obtaining for each key point its detection confidence, occlusion ratio, adjacent-frame displacement continuity deviation, local image sharpness index, and skeleton geometric constraint deviation, and determining the key point reliability of each key point based on these five indicators;
S3, determining a first candidate position of a key point to be corrected based on the trajectory trend of the key point in historical frames and its skeleton geometric constraint relations with adjacent key points, determining a local image region based on the first candidate position, extracting local secondary key points from the local image region to determine a second candidate position, performing weighted fusion of the first and second candidate positions according to the adjacent-frame displacement continuity deviation of the key point and the local image sharpness index of the local image region, determining the corrected key point position, obtaining a corrected key point sequence, and updating the part reliability vector;
S4, calculating action boundary scores and boundary uncertainty based on the corrected key point sequence, the centroid movement track of the target human body, the motion energy changes of the body parts, the distance changes between the target human body and neighboring objects, and scene region switching information;
S5, for each candidate action segment, generating a scene-constrained action candidate set for the segment based on a pre-established scene constraint mapping rule, according to the scene region type, neighboring object type, human posture state, and human displacement mode corresponding to the segment;
S6, determining the corresponding candidate action segment as a segment to be rechecked when the first