CN-121999532-A - Intelligent evaluation method and device for Taiji boxing actions and computer program

Abstract

The invention provides an intelligent evaluation method, device and computer program for Taiji boxing actions. A composite scoring mechanism that combines key point position change with joint chain angle change, together with a dynamically adjusted time threshold, accurately captures the action transition points in both fast and slow rhythms. The system adopts a two-dimensional to three-dimensional pose conversion model that integrates temporal, spatial and frequency-domain features: a temporal Transformer models action continuity, a spatial graph attention network analyzes the topological relations between joints, and a frequency-domain enhancement module filters noise to highlight the dominant frequency features. Combining this multi-feature fusion strategy with a data enhancement method improves the accuracy and robustness of three-dimensional reconstruction from monocular video. Adaptive key frame processing improves efficiency, and high-precision three-dimensional reconstruction with comprehensive quantification of action details allows the closed loop from capture through reconstruction to evaluation to run on common hardware, giving the invention both professional precision and practical value for popularization.

Inventors

ZHANG FENGQUAN
ZHAO YIXIONG

Assignees

Beijing University of Posts and Telecommunications (北京邮电大学)

Dates

Publication Date: 20260508
Application Date: 20260128

Claims (10)

1. An intelligent evaluation method for Taiji boxing actions, characterized by comprising the following steps: adjusting the Taiji boxing action video to be analyzed to a preset resolution, and inputting it into a pre-trained two-dimensional pose detection model to output the two-dimensional key point coordinates of each frame; comparing each frame with historical frames, calculating a comprehensive motion change score by combining the distance change between key point coordinates with the angle change of the joint chains, and calculating an importance score based on the average change rate and the pose complexity; selecting as key frames the frames whose comprehensive motion change score is higher than a dynamic threshold, whose importance score is higher than a set threshold, and whose time interval from the previous key frame is within a preset time range, wherein the dynamic threshold is calculated by introducing a dynamic time interval term to the importance score; inputting the key point coordinates of each key frame, as an original feature sequence, into a pre-trained two-dimensional to three-dimensional pose conversion model to output a three-dimensional pose representation of the key frame containing depth information, wherein the two-dimensional to three-dimensional pose conversion model performs the following operations: centering and normalizing the key point coordinates in the original feature sequence, introducing temporal position encoding, and inputting the result into a multi-layer attention module and a feed-forward network in a temporal Transformer to output temporal features; constructing the key point coordinates in the original feature sequence as a feature matrix, introducing an adjacency matrix describing the connections between key points to construct graph-structured data, and inputting the graph-structured data into a graph attention network in a spatial Transformer to output spatial features; inputting the original feature sequence into a frequency-domain enhancement module, which transforms it to the frequency domain by discrete cosine transform, applies low-pass filtering, and fuses the result with the original feature sequence to obtain enhanced frequency-domain features; and inputting the temporal features, the spatial features and the enhanced frequency-domain features into a depth reasoning module, which outputs the original three-dimensional coordinates of each key frame through a multi-layer perceptron and a feed-forward network; and comparing the similarity between the target three-dimensional coordinates of each key frame and preset standard three-dimensional coordinates of the Taiji boxing action, and calculating the action fluency from the target three-dimensional coordinates, to obtain an evaluation result.
2. The intelligent evaluation method for Taiji boxing actions according to claim 1, wherein, after adjusting the Taiji boxing action video to be analyzed to a preset resolution and inputting it into the pre-trained two-dimensional pose detection model to output the two-dimensional key point coordinates of each frame, the method further comprises: filtering and temporally smoothing the key point coordinates to reduce jitter and outliers; the two-dimensional pose detection model is an Hourglass, AlphaPose, OpenPose or YOLO-Pose model; and the key point coordinates are collected for the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles.
3. The intelligent evaluation method for Taiji boxing actions according to claim 1, wherein comparing each frame with historical frames and calculating a comprehensive motion change score by combining the distance change between key point coordinates with the angle change of the joint chains comprises the following steps: calculating the Euclidean distance sum between the key point coordinates of the current frame and those of the previous frame, the calculation formula being expressed in terms of the Euclidean distance sum, the coordinates of the i-th key point in the current frame t, the coordinates of the i-th key point in the previous frame t-1, the weight of the coordinates of the i-th key point, and the number N of key points; calculating the angle change of the joint chains, the calculation formula being expressed in terms of the angle change, the set of joint chains, the two bone vectors of each joint chain, and the weight of the joint chain formed by key points i, j and k; and, after normalizing the Euclidean distance sum and the angle change, calculating the comprehensive motion change score, the calculation formula being expressed in terms of the comprehensive motion change score and two weight coefficients.
4. The intelligent evaluation method for Taiji boxing actions according to claim 3, wherein calculating an importance score based on the average change rate and the pose complexity comprises the following steps: calculating the average change rate of the key point coordinates, the calculation formula being expressed in terms of the average change rate, the coordinates of the i-th key point in the current frame t, the coordinates of the i-th key point in the previous frame t-1, the weight of the coordinates of the i-th key point, and the number N of key points; calculating the pose complexity, the calculation formula being expressed in terms of the pose complexity, the set of joint chains, the two bone vectors of each joint chain, and the weight of the joint chain formed by key points i, j and k; and calculating the importance score, the calculation formula being expressed in terms of the importance score and two weight coefficients; and wherein the dynamic threshold is calculated by a formula expressed in terms of the dynamic threshold corresponding to the current time t, a base threshold, a time decay coefficient, the time interval between the current key frame and the previous key frame, and the maximum allowed time interval.
5. The intelligent evaluation method for Taiji boxing actions according to claim 1, wherein the pre-training of the two-dimensional to three-dimensional pose conversion model comprises: acquiring a training data set containing a plurality of samples, wherein the samples are continuous video frames containing one or more human bodies, and sample key point coordinates are added as labels to the human bodies in each video frame; training, with the training data set, an initial pose conversion model comprising a temporal Transformer, a spatial Transformer, a frequency-domain enhancement module, a depth reasoning module and a data enhancement fusion module, by: centering and normalizing the sample key point coordinates in the video frames, introducing temporal position encoding, and inputting the result into a multi-layer attention module and a feed-forward network in the temporal Transformer to output sample temporal features; constructing the sample key point coordinates in the video frames as a feature matrix, introducing an adjacency matrix describing the connections between the sample key points to construct graph-structured data, and inputting the graph-structured data into a graph attention network in the spatial Transformer to output sample spatial features; inputting the sample key point coordinate sequence of the video frames into the frequency-domain enhancement module, which applies low-pass filtering and fuses the result with the original sample key point coordinate sequence to obtain sample enhanced frequency-domain features; inputting the sample temporal features, the sample spatial features and the sample enhanced frequency-domain features into the depth reasoning module to output original three-dimensional coordinates through a multi-layer perceptron and a Transformer decoder; inverting the sample key point coordinates of the video frames to obtain inverted three-dimensional coordinates, and fusing the original three-dimensional coordinates with the inverted three-dimensional coordinates in the data enhancement fusion module to obtain target sample three-dimensional coordinates; and constructing a loss from the deviation between the target sample three-dimensional coordinates and the labels, and updating the parameters of the initial pose conversion model to obtain the two-dimensional to three-dimensional pose conversion model.
6. The intelligent evaluation method for Taiji boxing actions according to claim 1, wherein the data enhancement fusion module fuses the original three-dimensional coordinates and the inverted three-dimensional coordinates by weighted aggregation, the calculation formula being expressed in terms of two weight coefficients, the fused three-dimensional coordinates of the i-th key point, the original three-dimensional coordinates of the i-th key point, and the inverted three-dimensional coordinates of the i-th key point.
7. The intelligent evaluation method for Taiji boxing actions according to claim 4, wherein comparing the similarity between the target three-dimensional coordinates of each key frame and the preset standard three-dimensional coordinates of the Taiji boxing action comprises: calculating the similarity between the target three-dimensional coordinates and the preset standard three-dimensional coordinates, the calculation formula being expressed in terms of the set of target three-dimensional coordinates, the set of preset standard three-dimensional coordinates, the weight of the i-th key point, the importance score, the distance between the target three-dimensional coordinates of the i-th key point and the corresponding preset standard three-dimensional coordinates, and the number N of key points; and wherein calculating the action fluency comprises: calculating the difference between the key point coordinates of adjacent key frames, the calculation formula being expressed in terms of the difference of the i-th key point coordinates between the current frame t and the previous frame t-1, the coordinates of the i-th key point in the current frame t, and the coordinates of the i-th key point in the previous frame t-1; and taking a weighted average of the differences of all key point coordinates over all key frames to obtain the action fluency, the calculation formula being expressed in terms of the action fluency, the number N of key points, the total number T of key frames, and the weight of the i-th key point coordinates.
8. An intelligent evaluation device for Taiji boxing actions, comprising a processor, a memory, and a computer program or instructions stored on the memory, wherein the processor is configured to execute the computer program or instructions, which, when executed, implement the steps of the method of any one of claims 1 to 7.
9. A computer-readable storage medium, on which a computer program or instructions is stored, which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A computer program product comprising a computer program or instructions which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
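
The key frame selection of claims 1, 3 and 4 can be illustrated with a minimal Python sketch. The claims name the quantities involved, but this text does not carry the formulas themselves, so every functional form below is an assumption chosen only to be consistent with those quantities: a weighted sum of Euclidean distances, a weighted sum of absolute joint angle changes, a linear combination of the two normalized terms, an importance score that mixes the per-key-point change rate with a joint-bend measure, and an exponentially decaying threshold. All function and parameter names (`select_key_frames`, `d_scale`, `gamma` and so on) are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle in radians at key point b between bone vectors b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))

def motion_terms(cur, prev, w, chains, chain_w):
    """Weighted Euclidean distance sum and weighted joint angle change (assumed forms)."""
    d = sum(wi * math.dist(p, q) for wi, p, q in zip(w, cur, prev))
    a = sum(cw * abs(joint_angle(cur[i], cur[j], cur[k])
                     - joint_angle(prev[i], prev[j], prev[k]))
            for cw, (i, j, k) in zip(chain_w, chains))
    return d, a

def pose_complexity(cur, chains, chain_w):
    """Assumed measure: weighted sum of how far each joint is bent from straight."""
    return sum(cw * (math.pi - joint_angle(cur[i], cur[j], cur[k]))
               for cw, (i, j, k) in zip(chain_w, chains))

def dynamic_threshold(base, gamma, dt, dt_max):
    """Assumed form: the base threshold decays as the gap since the last key frame grows."""
    return base * math.exp(-gamma * min(dt, dt_max) / dt_max)

def select_key_frames(frames, w, chains, chain_w, *, alpha=0.5, beta=0.5,
                      lam1=0.5, lam2=0.5, d_scale=1.0, base=0.3, gamma=1.0,
                      imp_thr=0.1, dt_min=1, dt_max=10):
    """Return indices of key frames; frame 0 is taken as the first key frame."""
    keys = [0]
    for t in range(1, len(frames)):
        cur, prev = frames[t], frames[t - 1]
        d, a = motion_terms(cur, prev, w, chains, chain_w)
        # Normalize both terms to [0, 1] before combining them (assumed scaling).
        score = alpha * min(d / d_scale, 1.0) + beta * min(a / math.pi, 1.0)
        importance = lam1 * d / len(cur) + lam2 * pose_complexity(cur, chains, chain_w)
        dt = t - keys[-1]  # gap to the previous key frame, in frames
        # Literal reading of claim 1: all three conditions must hold.
        if (score > dynamic_threshold(base, gamma, dt, dt_max)
                and importance > imp_thr
                and dt_min <= dt <= dt_max):
            keys.append(t)
    return keys

# Shoulder, elbow, wrist: a straight arm that bends to 90 degrees at frame 4.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
frames = [straight] * 4 + [bent] * 3
print(select_key_frames(frames, [1.0, 1.0, 1.0], [(0, 1, 2)], [1.0]))  # [0, 4]
```

In this toy sequence only the frame where the elbow bends passes all three tests; the static frames before and after it score zero on motion change and are skipped.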
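
The frequency-domain enhancement step of claim 1 (discrete cosine transform, low-pass filtering, fusion with the original sequence) can be sketched in a few lines. This is a sketch under stated assumptions, not the patented module: it treats one coordinate of one key point across the key frames as a one-dimensional track, uses an orthonormal DCT-II written out by hand, keeps the first `keep` coefficients as the low-pass filter, and fuses by a linear blend with weight `mix`. The names `enhance`, `keep` and `mix` are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                 for k in range(1, n))
        out.append(s)
    return out

def enhance(track, keep, mix=0.5):
    """Low-pass the track in the DCT domain, then blend it with the original.

    keep: number of low-frequency coefficients retained (assumed filter shape).
    mix:  blend weight of the filtered track (assumed fusion rule).
    """
    coeffs = dct(track)
    low = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    smooth = idct(low)
    return [mix * s + (1.0 - mix) * x for s, x in zip(smooth, track)]

# A jittery track: keeping only the lowest coefficient leaves its mean.
jitter = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(enhance(jitter, keep=1, mix=1.0))
```

Blending with the original track rather than replacing it keeps genuine fast transitions partly visible while damping frame-to-frame detector jitter; how strongly to filter would be a tuning choice.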
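
The weighted fusion of claim 6 and the similarity and fluency measures of claim 7 can be sketched in the same spirit. The forms below are assumptions: the fusion is a two-weight linear combination, and the inverted prediction is assumed to have already been mapped back into the original coordinate frame; the similarity is taken as `1 / (1 + importance * mean weighted distance)`, so identical poses score 1.0; the fluency is the weighted displacement averaged over all key points and all adjacent key frame pairs. The function names and the default weights are hypothetical.

```python
import math

def fuse(original, inverted, a=0.5, b=0.5):
    """Weighted aggregation of two 3D predictions for the same key points."""
    return [tuple(a * o + b * v for o, v in zip(po, pv))
            for po, pv in zip(original, inverted)]

def similarity(target, standard, w, importance=1.0):
    """Assumed similarity: 1.0 for identical poses, falling as the distance grows."""
    n = len(target)
    mean_d = sum(wi * math.dist(p, q) for wi, p, q in zip(w, target, standard)) / n
    return 1.0 / (1.0 + importance * mean_d)

def fluency(key_frames, w):
    """Weighted average displacement of key points between adjacent key frames."""
    if len(key_frames) < 2:
        return 0.0
    n = len(key_frames[0])
    total = 0.0
    for prev, cur in zip(key_frames, key_frames[1:]):
        total += sum(wi * math.dist(p, q) for wi, p, q in zip(w, cur, prev))
    # Assumption: normalize by the number of key points and adjacent pairs.
    return total / (n * (len(key_frames) - 1))

fused = fuse([(0.0, 0.0, 0.0)], [(2.0, 2.0, 2.0)])
print(fused)                                   # [(1.0, 1.0, 1.0)]
print(similarity(fused, fused, [1.0]))         # 1.0
```

Under this reading a lower fluency value means smaller jumps between key frames; whether the evaluation treats low or high values as better is not fixed by the sketch.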

Description

Intelligent evaluation method and device for Taiji boxing actions and computer program

Technical Field

The invention relates to the technical field of image processing, and in particular to an intelligent evaluation method, device and computer program for Taiji boxing actions.

Background

Taiji boxing is a traditional form of exercise with deep philosophical roots. Its movements are continuous, light and stable, alternating between fast and slow, and it has notable benefits for physical and mental health. However, the conventional teaching mode depends heavily on the coach's visual observation and personal experience, and has inherent limitations such as strong subjectivity, difficulty of quantification and non-uniform teaching standards. As a result, learners struggle to obtain accurate and consistent feedback and, in particular, cannot perceive and correct subtle deviations in their own actions. With the progress of computer vision and artificial intelligence, automatic evaluation systems based on pose estimation bring new possibilities for Taiji boxing teaching. The prior art includes action recognition based on a hybrid cascade architecture that processes electromyographic signals, two-dimensional pose estimation based on fully convolutional networks, three-dimensional human body model generation from single-view images, and scoring methods that combine an attention mechanism with a long short-term memory network.
Although these techniques achieve digital analysis of motion to some extent, several key deficiencies remain. First, most systems are based on two-dimensional pose analysis and lack the ability to perceive depth information, so they cannot accurately evaluate posture requirements that depend on three-dimensional space, such as hollowing the chest and raising the back. Second, key frame extraction mostly uses a fixed time interval or a simple displacement threshold, which makes it difficult to capture the key transition points in the variable rhythm of Taiji boxing and leads to omissions or redundancy in evaluation. Third, conventional methods focus on point-wise evaluation of discrete static poses and cannot evaluate a continuous flow of actions dynamically and continuously. Finally, these systems generally lack dynamic adaptability: they cannot adjust the evaluation strategy to rapid changes in action speed, so neither slow, fine actions nor fast transitions are evaluated accurately. The prior art therefore cannot meet the need for accurate, three-dimensional, coherent and adaptive intelligent evaluation of Taiji boxing actions, and a new scheme is needed to overcome these deficiencies.

Disclosure of Invention

In view of this, embodiments of the invention provide an intelligent evaluation method, device and computer program for Taiji boxing actions, to solve the problems that the prior art lacks three-dimensional perception, extracts key frames inaccurately, evaluates incoherently, and cannot dynamically adapt to changes in action rhythm, which make it difficult to evaluate Taiji boxing actions accurately, comprehensively, intelligently and in real time.
One aspect of the present invention provides an intelligent evaluation method for Taiji boxing actions, which includes the following steps: adjusting the Taiji boxing action video to be analyzed to a preset resolution, and inputting it into a pre-trained two-dimensional pose detection model to output the two-dimensional key point coordinates of each frame; comparing each frame with historical frames, calculating a comprehensive motion change score by combining the distance change between key point coordinates with the angle change of the joint chains, and calculating an importance score based on the average change rate and the pose complexity; selecting as key frames the frames whose comprehensive motion change score is higher than a dynamic threshold, whose importance score is higher than a set threshold, and whose time interval from the previous key frame is within a preset time range, wherein the dynamic threshold is calculated by introducing a dynamic time interval term to the importance score; inputting the key point coordinates of each key frame, as an original feature sequence, into a pre-trained two-dimensional to three-dimensional pose conversion model to output a three-dimensional pose representation of the key frame containing depth information, wherein the two-dimensional to three-dimensional pose conversion model performs the following operations: centering and normalizing the key point coordinates in the original feature sequence, introducing temporal position encoding, and inputting the result into a multi-layer attention module and a feed-forward network in a temporal Transf