CN-122024123-A - Motion quality assessment method based on fine-grained spatio-temporal alignment
Abstract
The invention discloses a motion quality assessment method based on fine-grained spatio-temporal alignment, relating to the technical field of intelligent motion quality assessment. The invention markedly improves the interpretability, fairness, and practicality of motion quality assessment, realizes the leap from a black-box total score to a transparent diagnostic report, improves scoring precision, can provide specific and actionable improvement suggestions for athletes and coaches, and meets the high-level application requirements of athletic training and rehabilitation guidance.
Inventors
- LIN FEI
- HUANG ZUJIAN
- ZHANG CONG
Assignees
- Hangzhou Dianzi University (杭州电子科技大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20251224
Claims (10)
- 1. A motion quality assessment method based on fine-grained spatio-temporal alignment, characterized by comprising the following steps: taking a motion sequence to be evaluated as a query input, selecting a reference motion sequence of the same type as an alignment benchmark, and jointly inputting the query sequence and the reference sequence into a shared spatio-temporal feature extractor for processing; establishing a fine-grained spatio-temporal alignment module, wherein the fine-grained spatio-temporal alignment module is used for parsing the query sequence and the reference sequence with a temporal action segmentation network and establishing a spatio-temporal correspondence; and calculating fine-grained difference features based on the processed query sequence and reference sequence, inputting the fine-grained difference features into a multi-task evaluation network, and outputting a motion quality score, thereby completing the motion quality evaluation.
- 2. The motion quality assessment method based on fine-grained spatio-temporal alignment according to claim 1, wherein the extractor uses I3D (Inception-3D) or Video Swin Transformer as a backbone network and adopts multi-scale output: the extractor outputs not only the last-layer features but also multi-level feature maps, where shallow features retain high spatial resolution for capturing fine displacements of limb extremities (such as the wrist and ankle), and deep features carry stronger semantic information for representing the overall action structure.
- 3. The motion quality assessment method based on fine-grained spatio-temporal alignment according to claim 2, wherein the fine-grained spatio-temporal alignment module comprises a differentiable dynamic time warping network and a cross-modal attention mechanism; the differentiable dynamic time warping network is used to align, in the temporal dimension, key action frames of semantically corresponding sub-action phases in the query sequence and the reference sequence, resolving inconsistencies in execution rhythm between the query sequence to be evaluated and the reference sequence; the cross-modal attention mechanism is used to align, in the spatial dimension, semantically corresponding key body parts in the query sequence and the reference sequence, resolving viewpoint differences in posture caused by differences in orientation or by the diversity of pose expression during movement.
- 4. The motion quality assessment method based on fine-grained spatio-temporal alignment according to claim 3, wherein establishing the spatio-temporal correspondence comprises learning, under the guidance of sub-action phases, to establish the spatio-temporal correspondence between the query sequence and the reference sequence from local to global through the differentiable dynamic time warping network and the cross-modal attention mechanism; the sub-action phases comprise the sub-action phases in the query sequence and the corresponding semantic sub-action phases in the reference sequence.
- 5. The motion quality assessment method based on fine-grained spatio-temporal alignment according to claim 4, wherein calculating the fine-grained difference features comprises calculating, based on the multi-scale spatio-temporal feature maps, the fine-grained difference features of the query sequence and the reference sequence within each sub-action phase; the multi-task evaluation network predicts the relative quality deviation between each sub-action phase in the query sequence and the corresponding semantic sub-action phase in the reference sequence; each sub-action phase in the query sequence is obtained by parsing the motion sequence to be evaluated with the temporal action segmentation network; the corresponding semantic sub-action phases in the reference sequence are obtained by segmenting the reference motion sequence of the same type with the same temporal action segmentation network.
- 6. The motion quality assessment method based on fine-grained spatio-temporal alignment according to claim 5, wherein inputting the fine-grained difference features into the multi-task evaluation network comprises performing end-to-end training of the motion quality assessment with a joint loss function; the joint loss function is calculated as L_total = L_align + L_seg + L_score, wherein L_align is the alignment loss of the spatio-temporal alignment module, L_seg is the cross-entropy loss of the temporal action segmentation, and L_score is the mean absolute error loss of the quality score prediction; during end-to-end training of the motion quality assessment, the parameters of the temporal action segmentation network, the fine-grained spatio-temporal alignment module, and the multi-task evaluation network are jointly optimized through backpropagation until the total loss function converges.
- 7. The motion quality assessment method based on fine-grained spatio-temporal alignment according to claim 6, wherein said calculating fine-grained difference features further comprises the following steps: for each semantically aligned sub-action phase, extracting from the spatio-temporally aligned multi-scale feature maps the local pose embedding vectors and global motion context vectors of the query sequence and the reference sequence at the same semantic moment; calculating a pose deviation metric for the key body parts based on the local pose embedding vectors, and calculating a consistency score based on the global motion context vectors; and performing weighted fusion of the pose deviation metric and the consistency score to generate a structured fine-grained difference feature vector for the sub-action phase, which serves as the input to the multi-task evaluation network.
- 8. A motion quality evaluation system based on fine-grained spatio-temporal alignment, characterized by comprising: an input and feature extraction module, which takes a motion sequence to be evaluated as a query input, selects a reference motion sequence of the same type as an alignment benchmark, and jointly inputs the query sequence and the reference sequence into a shared spatio-temporal feature extractor for processing; a fine-grained spatio-temporal alignment module, which is used to parse the query sequence and the reference sequence with a temporal action segmentation network and establish a spatio-temporal correspondence; and a difference calculation and multi-task evaluation module, which is used to calculate fine-grained difference features based on the processed query sequence and reference sequence, input the fine-grained difference features into a multi-task evaluation network, and output a motion quality score, thereby completing the motion quality evaluation.
- 9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the motion quality assessment method based on fine-grained spatio-temporal alignment of any one of claims 1 to 7.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the motion quality assessment method based on fine-grained spatio-temporal alignment of any one of claims 1 to 7.
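The differentiable dynamic time warping network of claim 3 is not specified in detail in the text. As a hedged illustration, the pure-Python sketch below implements soft-DTW, one common differentiable relaxation of DTW that replaces the hard minimum in the alignment recurrence with a temperature-controlled soft minimum; the function names and the choice of soft-DTW are illustrative assumptions, not the patent's implementation.

```python
import math

def soft_min(values, gamma):
    # Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably
    # by shifting with the hard minimum. As gamma -> 0 this recovers min().
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(cost, gamma=0.1):
    """Soft-DTW accumulated alignment cost over a pairwise cost matrix.

    cost[i][j] is the frame-wise distance between query frame i and
    reference frame j (e.g. a distance between pose embeddings).
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    # R[i][j] = soft-minimal cost of aligning the first i query frames
    # with the first j reference frames.
    R = [[INF] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = cost[i - 1][j - 1] + soft_min(
                [R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma)
    return R[n][m]
```

Because every operation is smooth, the accumulated cost is differentiable with respect to the frame-wise costs, which is what allows an alignment module of this kind to be trained end-to-end by backpropagation together with the rest of the network.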
Description
Motion quality assessment method based on fine-grained spatio-temporal alignment

Technical Field

The invention relates to the technical field of intelligent motion quality assessment, and in particular to a motion quality assessment method based on fine-grained spatio-temporal alignment.

Background

At the intersection of computer vision and sports science, deep-learning-based motion quality assessment has developed rapidly in recent years and is widely applied in scenarios such as competitive sports scoring, athlete training assistance, rehabilitation medical guidance, and intelligent human-computer interaction. With the continuous growth of video data scale and model capacity, automated assessment systems increasingly pursue higher precision, stronger generalization, and judgment logic closer to that of human experts. Current mainstream methods generally adopt end-to-end neural networks that capture spatial pose information with a feature extractor, learn temporal dynamics with a sequence model, and finally regress an overall quality score. However, although this black-box evaluation paradigm achieves preliminary results on some tasks, it struggles to meet the urgent practical demands for process transparency, diagnostic feedback, and individualized fairness.
At present, the prior art has several key defects. First, it lacks process-awareness: most models treat the entire action sequence as a single whole for feature aggregation and cannot decompose complex actions into semantically well-defined sub-phases (such as take-off, flight, and water entry), so the evaluation result is only an overall score and specific erroneous links cannot be located. Second, the spatio-temporal alignment mechanism is coarse: existing methods either rely on absolute feature regression or directly compare different action sequences in an unaligned state, ignoring natural individual differences in action rhythm and execution speed, making fair comparison at semantically equivalent moments difficult. Third, the coupling between feature representation and evaluation strategy is insufficient: most methods use single-scale features and absolute score prediction, which can account for neither local details nor the global structure, lack the ability to model deviations relative to the standard action, and yield evaluation results that are unstable and weakly interpretable.

Disclosure of Invention

The present invention has been developed in view of the problems in existing motion quality assessment methods. The problem to be solved by the invention is that existing methods lack process interpretability, are inaccurate in spatio-temporal alignment, and cannot provide fine-grained diagnostic feedback based on sub-action phases, so that the assessment results are coarse, unfair, and difficult to use for guiding improvement.
To solve the above technical problems, the invention provides the following technical scheme. In a first aspect, an embodiment of the present invention provides a motion quality assessment method based on fine-grained spatio-temporal alignment, which comprises: taking a motion sequence to be assessed as a query input, selecting a reference motion sequence of the same type as an alignment benchmark, and jointly inputting the query sequence and the reference sequence into a shared spatio-temporal feature extractor for processing; constructing a fine-grained spatio-temporal alignment module, wherein the module parses the query sequence and the reference sequence with a temporal action segmentation network and establishes a spatio-temporal correspondence; and calculating fine-grained difference features based on the processed query sequence and reference sequence, inputting them into a multi-task assessment network, and outputting a motion quality score, thereby completing the motion quality assessment. As a preferred scheme of the motion quality assessment method based on fine-grained spatio-temporal alignment, the spatio-temporal feature extractor adopts I3D (Inception-3D) or Video Swin Transformer as the backbone network.
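Claim 7 specifies that each aligned sub-action phase yields a pose deviation metric, a consistency score, and their weighted fusion, but does not fix the concrete metrics. A minimal sketch of one plausible instantiation is shown below; the use of Euclidean distance for pose deviation, cosine similarity for consistency, and a single scalar fusion weight `w` are all illustrative assumptions.

```python
import math

def pose_deviation(q_local, r_local):
    # Euclidean distance between the local pose embedding vectors of the
    # query and reference sequences at the same semantic moment.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q_local, r_local)))

def consistency_score(q_global, r_global):
    # Cosine similarity between the global motion context vectors.
    dot = sum(a * b for a, b in zip(q_global, r_global))
    nq = math.sqrt(sum(a * a for a in q_global))
    nr = math.sqrt(sum(b * b for b in r_global))
    return dot / (nq * nr)

def stage_difference_feature(q_local, r_local, q_global, r_global, w=0.5):
    # Weighted fusion of the two signals into a structured per-phase
    # difference feature vector, fed to the multi-task evaluation network.
    dev = pose_deviation(q_local, r_local)
    cons = consistency_score(q_global, r_global)
    return [w * dev, (1 - w) * cons]
```

A perfectly executed phase (identical embeddings) yields zero pose deviation and maximal consistency, so the per-phase vector directly exposes which sub-actions deviate from the reference rather than hiding them inside one overall score.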
The backbone network produces shallow features and deep features: the shallow features capture local motion details, while the deep features represent global structural information. The multi-scale spatio-temporal feature maps are aligned and enhanced across levels by a feature pyramid fusion mechanism: within each sub-action phase, the shallow features are up-sampled to the spatio-temporal resolution of the deep features, and the shallow detail responses and deep semantic responses are dynamically weighted and fused through a gated attention mechanism.
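The gated fusion step can be illustrated with a minimal sketch. The text does not specify the form of the gate, so the per-position sigmoid gate, its parameters `w_s`, `w_d`, `b`, and the nearest-neighbour resampling helper below are illustrative assumptions operating on 1-D feature sequences rather than full spatio-temporal feature maps.

```python
import math

def upsample_nearest(x, factor):
    # Nearest-neighbour resampling of a 1-D feature sequence, standing in
    # for bringing one feature level to the resolution of another.
    return [v for v in x for _ in range(factor)]

def gated_fuse(shallow, deep, w_s=1.0, w_d=1.0, b=0.0):
    # Per-position gate g = sigmoid(w_s*s + w_d*d + b); the fused response
    # g*s + (1-g)*d dynamically weights shallow detail against deep
    # semantics, as in the gated attention fusion described above.
    fused = []
    for s, d in zip(shallow, deep):
        g = 1.0 / (1.0 + math.exp(-(w_s * s + w_d * d + b)))
        fused.append(g * s + (1.0 - g) * d)
    return fused
```

In a trained network the gate parameters would be learned jointly with the rest of the model, letting each position decide how much fine spatial detail versus global semantic context to keep.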