
CN-121305619-B - Interactive self-adaptive intelligent teaching method and system based on motion capture

CN121305619B

Abstract

The application discloses an interactive, self-adaptive intelligent teaching method and system based on motion capture, belonging to the technical field of digital intelligent teaching. The method comprises: collecting video of a student in motion in real time, splitting the video frames into two-dimensional images, and extracting human-body target images; performing three-dimensional reconstruction on the human-body target images, generating key points with a neural network model, and splicing the key points to construct a human skeleton; performing posture analysis on the skeleton and identifying the student's actions based on a matching degree; and comparing the identified actions against preset standard actions in a knowledge graph and correcting the non-standard actions found. The method and system realize normative assessment of student actions, improve posture-recognition precision and the accuracy of action-deviation judgment, and enhance the practicality of intelligent teaching systems in real teaching scenarios.

Inventors

  • WU ZHONGYU
  • LIANG GUOJI
  • XU XUFENG

Assignees

  • 杭州睿数科技有限公司
  • 杭州钛数科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-11-13

Claims (6)

  1. An interactive self-adaptive intelligent teaching method based on motion capture, characterized by comprising the following steps: collecting video of a student in motion in real time, splitting the video frames into two-dimensional images, and extracting a human-body target image; performing three-dimensional reconstruction on the human-body target image, generating key points with a neural network model, and splicing the key points to construct a human skeleton; performing posture analysis on the human skeleton and identifying the student's actions based on a matching degree, wherein the matching degree is obtained by matching the posture-analysis result against a preset action-processing strategy; and comparing the student's actions against preset standard actions in a knowledge graph and correcting the recognized non-standard actions; wherein collecting video of the student in motion in real time, splitting the video frames into two-dimensional images, and extracting the human-body target image comprises: acquiring video of the student in motion in real time and splitting it frame by frame into a continuous sequence of two-dimensional images; dividing each two-dimensional image into regions to generate initial candidate regions; aggregating the initial candidate regions through pixel-similarity analysis to form a candidate-region set; computing the texture-feature similarity between adjacent candidate regions to construct a similarity set; sorting the candidate regions by similarity from high to low and merging them; and extracting the human-body target image from the merged region; wherein performing posture analysis on the human skeleton and identifying the student's actions based on the matching degree comprises: performing posture analysis on the complete human-body structure model and extracting key action parameters, the key action parameters comprising head posture, shoulder inclination angle, limb positional relations, and body center-of-gravity offset; matching the key action parameters against a preset action-processing strategy to obtain the matching degree, the action-processing strategy setting parameter intervals and action thresholds based on the human-body structure model; and judging whether the matching degree lies within a preset threshold range, marking the corresponding action as abnormal if it exceeds the range and as normal otherwise; wherein matching the key action parameters against the preset action-processing strategy to obtain the matching degree comprises: constructing a multidimensional parameter-space vector of the action from the parameter ranges and time-sequence curves of the standard action template; projecting the key action parameters, in vector form, onto the multidimensional parameter-space vector and computing the projection distance in each dimension to obtain per-dimension deviation values; and aggregating the multidimensional deviation values with weights to obtain the matching degree; and wherein comparing the student's actions against the preset standard actions in the knowledge graph and correcting the recognized non-standard actions comprises: comparing the similarity of a normal action with a standard action template stored in the knowledge graph, the standard action template comprising a structured action model and a corresponding parameter vector; setting a standard threshold and judging that the action does not meet the standard if the similarity comparison result falls below it; and recording and correcting the deviation items of abnormal actions and of actions that do not meet the standard.
  2. The interactive self-adaptive intelligent teaching method based on motion capture of claim 1, wherein performing three-dimensional reconstruction on the human-body target image, generating key points with a neural network model, and splicing the key points to construct the human skeleton comprises: inputting the human-body target image into a pre-trained three-dimensional reconstruction model to reconstruct a three-dimensional human-body structure image; extracting features from the reconstructed three-dimensional structure image to obtain an initial feature map; inputting the initial feature map into a multi-stage neural network model, wherein the first stage takes the initial feature map as input and each subsequent stage takes the previous stage's output superimposed on its input; extracting key points from the final output of the multi-stage neural network model, the final output comprising a key-point confidence map and an associated vector-field map; and connecting the key points according to their spatial relations to form a complete human-body structure model.
  3. The interactive self-adaptive intelligent teaching method based on motion capture of claim 2, wherein extracting key points from the final output of the multi-stage neural network model comprises: convolving the final output of the multi-stage neural network model with a preset human-body key-point template to generate a confidence map for each key point, the confidence map representing the spatial distribution probability of the corresponding key point in the image; applying non-maximum suppression to the confidence map and extracting the pixels with maximal response values as candidate key-point positions; filtering the candidate positions with a threshold, retaining as valid key points those coordinates whose confidence exceeds the set threshold; and numbering and classifying all valid key points to construct a structured key-point set.
  4. The interactive self-adaptive intelligent teaching method based on motion capture of claim 3, wherein connecting the key points according to their spatial relations to form a complete human-body structure model comprises: determining all key-point pairings to be connected according to the preset connection relations of human-body key points; extracting from the feature map the associated vector-field map corresponding to each key-point pairing, the vector-field map representing the directed spatial relation between the two key points; for each pairing, combining candidate key points pairwise from the candidate set and computing the response integral of each combination along the connecting path in the associated vector-field map; selecting the best-matching combination for each pairing based on the magnitude and directional consistency of the response integral; and assembling all validly connected key-point combinations according to the human-skeleton topology to generate a complete human-body structure model.
  5. An interactive self-adaptive intelligent teaching system based on motion capture for implementing the interactive self-adaptive intelligent teaching method based on motion capture of any one of claims 1-4, characterized by comprising an image acquisition and processing module, a key-point construction module, a motion capture module, and an action teaching module; the image acquisition and processing module is used for collecting video of a student in motion in real time, splitting the video frames into two-dimensional images, and extracting a human-body target image; the key-point construction module is used for performing three-dimensional reconstruction on the human-body target image, generating key points with a neural network model, and splicing the key points to construct a human skeleton; the motion capture module is used for performing posture analysis on the human skeleton and identifying the student's actions based on the matching degree; and the action teaching module is used for comparing the student's actions against preset standard actions in the knowledge graph and correcting the recognized non-standard actions.
  6. The interactive self-adaptive intelligent teaching system based on motion capture of claim 5, wherein the motion capture module comprises a posture analysis unit and an anomaly determination unit; the posture analysis unit is used for performing posture analysis on the complete human-body structure model and extracting key action parameters; and the anomaly determination unit is used for matching the key action parameters against the preset action-processing strategy and determining whether the matching degree lies within the preset threshold range.
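The weighted-aggregation matching computation described in claim 1 (per-dimension projection distances combined into a single matching degree) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and parameter names (`matching_degree`, `template_tol`, the equal weights) are hypothetical, and the per-dimension "projection distance" is read here as a tolerance-normalized absolute deviation from the standard template.

```python
import numpy as np

def matching_degree(params, template_mean, template_tol, weights):
    """Illustrative matching computation: normalize each key action
    parameter's deviation from the standard-template value by that
    dimension's tolerance, then aggregate with weights."""
    deviation = np.abs(np.asarray(params, float) - np.asarray(template_mean, float))
    deviation /= np.asarray(template_tol, float)
    return float(np.dot(weights, deviation))

# Hypothetical dimensions: head pitch, shoulder tilt, limb angle,
# center-of-gravity offset (values and tolerances are made up).
score = matching_degree(
    params=[5.0, 12.0, 40.0, 0.08],
    template_mean=[0.0, 10.0, 45.0, 0.05],
    template_tol=[10.0, 5.0, 15.0, 0.05],
    weights=[0.25, 0.25, 0.25, 0.25],
)
is_abnormal = score > 1.0  # exceeds the preset threshold range -> abnormal
```

A deviation score above the threshold would mark the action abnormal; the threshold of 1.0 is likewise an assumption for the sketch.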
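The stage-wise scheme of claim 2, in which the first stage sees only the initial feature map and each later stage sees that map concatenated with the previous stage's output, can be sketched generically. The toy "stages" below are stand-ins for real network heads producing confidence maps and vector fields; all names are illustrative.

```python
import numpy as np

def run_stages(initial_features, stages):
    """Stage 1 takes the initial feature map; every subsequent stage
    takes the initial features stacked with the previous stage's
    output, refining the prediction (the claim-2 wiring)."""
    out = stages[0](initial_features)
    for stage in stages[1:]:
        out = stage(np.concatenate([initial_features, out], axis=0))
    return out

# Toy stand-in stages: each reduces the channel stack to one map.
stage1 = lambda x: x.mean(axis=0, keepdims=True)
stage2 = lambda x: x.max(axis=0, keepdims=True)
feats = np.arange(12, dtype=float).reshape(3, 2, 2)  # (channels, H, W)
heatmap = run_stages(feats, [stage1, stage2])
```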
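Claim 3's keypoint extraction (non-maximum suppression on a confidence map followed by confidence thresholding) might look like this sketch, assuming a single-channel map and a 4-neighbourhood peak test; the function name and threshold are illustrative.

```python
import numpy as np

def extract_keypoints(confidence_map, threshold=0.3):
    """Keep pixels that are strict local maxima of the confidence map
    (simple non-maximum suppression) and exceed the set threshold;
    return (row, col, confidence) candidates, per claim 3."""
    padded = np.pad(confidence_map, 1, constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center > padded[:-2, 1:-1]) & (center > padded[2:, 1:-1]) &
        (center > padded[1:-1, :-2]) & (center > padded[1:-1, 2:]) &
        (center > threshold)
    )
    ys, xs = np.nonzero(is_peak)
    return [(int(y), int(x), float(confidence_map[y, x])) for y, x in zip(ys, xs)]

# Toy confidence map: one strong peak, one below the threshold.
conf_map = np.zeros((5, 5))
conf_map[1, 1] = 0.9
conf_map[3, 3] = 0.2
peaks = extract_keypoints(conf_map, threshold=0.3)
```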
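The response integral of claim 4 resembles the line-integral scoring used with part-affinity-field style vector maps: sample the field along the segment joining two candidate key points and accumulate its agreement with the segment direction. A rough sketch, assuming the field is stored as two arrays `paf_x`, `paf_y` indexed by (row, column); all names and the sample count are assumptions.

```python
import numpy as np

def paf_score(paf_x, paf_y, p_a, p_b, n_samples=10):
    """Approximate the claim-4 response integral: average, over points
    sampled on the segment p_a -> p_b, the dot product of the field
    vector with the segment's unit direction."""
    p_a, p_b = np.asarray(p_a, float), np.asarray(p_b, float)
    seg = p_b - p_a
    norm = np.linalg.norm(seg)
    if norm == 0:
        return 0.0
    unit = seg / norm  # (d_row, d_col)
    total = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        y, x = np.round(p_a + t * seg).astype(int)
        total += paf_x[y, x] * unit[1] + paf_y[y, x] * unit[0]
    return total / n_samples

# Toy field pointing uniformly in the +x (column) direction: a
# horizontal candidate pair scores high, a vertical one scores zero.
paf_x, paf_y = np.ones((5, 5)), np.zeros((5, 5))
along = paf_score(paf_x, paf_y, (2, 0), (2, 4))
across = paf_score(paf_x, paf_y, (0, 2), (4, 2))
```

For each key-point pairing, the combination with the largest, direction-consistent score would be kept, as the claim describes.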

Description

Interactive self-adaptive intelligent teaching method and system based on motion capture

Technical Field

The invention belongs to the technical field of digital intelligent teaching, and particularly relates to an interactive self-adaptive intelligent teaching method and system based on motion capture.

Background

Motion capture technology is gradually being applied in education, digitally modeling human motion through video acquisition, posture recognition, skeleton reconstruction, and similar means. However, most existing motion-capture teaching systems adopt a one-way feedback mechanism: they lack dynamic matching and adaptive guidance based on a knowledge base or standard model, and cannot provide differentiated, real-time intelligent feedback according to the degree of deviation of an individual student's motions. Most systems also fail to build a complete action-quality evaluation model and teaching knowledge graph, and lack the ability to structurally compare incorrect actions with standard teaching-material content. In addition, during motion recognition the prior art relies mainly on static images or low-dimensional features and cannot fully exploit three-dimensional reconstruction and temporal information, so recognition accuracy is low and fine errors such as slight shoulder lifting or waist tilting cannot be accurately distinguished, limiting the application of intelligent teaching in high-accuracy education scenarios.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides an interactive self-adaptive intelligent teaching method and system based on motion capture.
In order to achieve the above purpose, the present invention provides the following technical solutions. An interactive self-adaptive intelligent teaching method based on motion capture comprises: collecting video of a student in motion in real time, splitting the video frames into two-dimensional images, and extracting a human-body target image; performing three-dimensional reconstruction on the human-body target image, generating key points with a neural network model, and splicing the key points to construct a human skeleton; performing posture analysis on the human skeleton and identifying the student's actions based on the matching degree; and comparing the student's actions against preset standard actions in a knowledge graph and correcting the recognized non-standard actions. Specifically, collecting video of the student in motion in real time, splitting the video frames into two-dimensional images, and extracting the human-body target image includes: acquiring video of the student in motion in real time and splitting it frame by frame into a continuous sequence of two-dimensional images; dividing each two-dimensional image into regions to generate initial candidate regions; aggregating the initial candidate regions through pixel-similarity analysis to form a candidate-region set; computing the texture-feature similarity between adjacent candidate regions to construct a similarity set; sorting the candidate regions by similarity from high to low and merging them; and extracting the human-body target image from the merged region.
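The similarity-sorted region merging described above can be sketched with a greedy union-find pass. This is an illustrative reading only: `regions`, `similarity`, and the threshold are hypothetical stand-ins for the candidate-region set and texture-similarity set of the method.

```python
def merge_regions(regions, similarity, threshold=0.5):
    """Greedy merge sketch: visit candidate-region pairs in order of
    texture similarity, high to low, and union-merge every pair whose
    similarity meets the threshold. `regions` maps a region id to its
    pixel set; `similarity` maps an (id, id) pair to a score."""
    parent = {r: r for r in regions}

    def find(r):  # union-find root lookup with path compression
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), sim in sorted(similarity.items(), key=lambda kv: -kv[1]):
        if sim < threshold:
            break  # remaining pairs are all less similar
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    merged = {}
    for r, pixels in regions.items():
        merged.setdefault(find(r), set()).update(pixels)
    return list(merged.values())

# Toy data: regions 1 and 2 are highly similar, region 3 is not.
regions = {1: {(0, 0)}, 2: {(0, 1)}, 3: {(5, 5)}}
similarity = {(1, 2): 0.9, (2, 3): 0.1}
merged = merge_regions(regions, similarity, threshold=0.5)
```

The human-body target image would then be cropped from the merged foreground region, as the passage above describes.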
Specifically, performing three-dimensional reconstruction on the human-body target image, generating key points with a neural network model, and splicing the key points to construct the human skeleton comprises the following steps: inputting the human-body target image into a pre-trained three-dimensional reconstruction model to reconstruct a three-dimensional human-body structure image; extracting features from the reconstructed three-dimensional structure image to obtain an initial feature map; inputting the initial feature map into a multi-stage neural network model, wherein the first stage takes the initial feature map as input and each subsequent stage takes the previous stage's output superimposed on its input; extracting key points from the final output of the multi-stage neural network model, the final output comprising a key-point confidence map and an associated vector-field map; and connecting the key points according to their spatial relations to form a complete human-body structure model. Specifically, extracting key points from the final output of the multi-stage neural network model includes: convolving the final output of the multi-stage neural network model with a preset human-body key-point template to generate a confidence map corresponding to the