CN-121148609-B - Cognitive diagnosis method and model based on emotion state
Abstract
The application relates to the technical field of intelligent education, and in particular to a cognitive diagnosis method and model based on emotion state, wherein the model comprises a cascaded multi-modal information feature extraction module, a feature alignment layer, a modality selection layer, an adaptive super-modal learning layer and a cross-modal fusion conversion layer. The method comprises the steps of: extracting emotion-related multi-modal features from a teaching video and obtaining the knowledge point information corresponding to those features; projecting the extracted multi-modal features into a unified low-dimensional space so that the data share the same dimension in preparation for subsequent fusion; adaptively selecting a main modality and auxiliary modalities; forming an emotion-semantic representation through information interaction between the main modality and the auxiliary modalities; and fusing the main modal state with the super-modal state to generate a unified super-modal state representing the student's emotion-cognition state. The application addresses the problem that cognitive diagnosis in the prior art is not comprehensive enough, and offers high accuracy and applicability.
Inventors
- WU WENYAN
- HU QINTAI
- SHEN XINGBO
- ZHENG XIAONA
- ZHANG ZHENWEI
Assignees
- Guangdong University of Technology (广东工业大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-07-23
Claims (8)
- 1. A cognitive diagnosis model based on emotion state, characterized by comprising a cascaded multi-modal information feature extraction module, a feature alignment layer, a modality selection layer, an adaptive super-modal learning layer and a cross-modal fusion conversion layer; the multi-modal information feature extraction module is used for extracting emotion-related multi-modal features from a teaching video and obtaining the knowledge point information corresponding to those features; the feature alignment layer is used for projecting the extracted multi-modal features into a unified low-dimensional space so that the data share the same dimension in preparation for subsequent fusion; the modality selection layer is used for adaptively selecting a main modality and auxiliary modalities based on the information quantity of each modality among the multi-modal features; the modality selection layer further performs modality weight allocation based on uncertainty estimation: specifically, a random discarding (dropout) layer is attached to the features of each modality, the variance of the prediction results over multiple forward passes in the training stage is computed to quantify the uncertainty of each modality, and the modality weights are determined according to a preset uncertainty threshold or a weighting formula combined with the information entropy; the adaptive super-modal learning layer is used for recursively updating and forming a super-modal state expressing emotion semantics through information interaction between the main modality and the auxiliary modalities; the cross-modal fusion conversion layer is used for deeply fusing the main modal state with the super-modal state and generating a unified super-modal state representing the student's emotion-cognition state for cognitive diagnosis based on the emotion state.
- 2. The cognitive diagnosis model of claim 1, wherein the multi-modal information feature extraction module introduces a self-supervised contrastive learning mechanism to pretrain and optimize the raw modal data before feature extraction; by constructing positive and negative sample pairs, the mechanism guides the model to learn more discriminative embedded representations of each modality, providing robust initial features for subsequent modeling.
- 3. The cognitive diagnosis model of claim 1, wherein the multi-modal information feature extraction module specifically comprises a BERT pre-trained language model for extracting text semantic features, a visual behavior analyzer based on the OpenFace system, and an audio feature extractor employing the Librosa tool library.
- 4. The cognitive diagnosis model of claim 1, wherein the feature alignment layer comprises a cross-modal semantic distillation module that uses a pre-trained multi-modal model as a teacher model to generate inter-modality-aligned soft labels as supervisory signals, and wherein a knowledge distillation loss constrains the consistency of the probability distributions of each modal feature of the student model after projection into the low-dimensional space, so as to promote semantic consistency in feature alignment.
- 5. The cognitive diagnosis model of claim 1, wherein the adaptive super-modal learning layer comprises at least two Transformer encoding modules and a plurality of super-modal information fusion layers, wherein the Transformer encoding modules are used for hierarchically extracting deep semantic features of the main-modality information, and the super-modal information fusion layers are used for guiding information interaction between the auxiliary modalities and the main modality so as to form a unified super-modal representation.
- 6. The cognitive diagnosis model of claim 1, wherein the cross-modal fusion conversion layer introduces a learnable additional marker and a position code between a source modal sequence and a target modal sequence, completes the deep fusion of multi-modal information by means of a cross-attention mechanism, and finally extracts the output at a specific position of the fused sequence as an emotion-cognition state vector of uniform dimension.
- 7. The cognitive diagnosis model of claim 1, further comprising an emotion-state-based cognitive diagnosis module for dynamically predicting a student's mastery level of each knowledge point by combining the student's emotion-cognition state with the knowledge point information, wherein the emotion-state-based cognitive diagnosis module further constructs a knowledge point relation graph whose nodes are knowledge points and whose edges are prior dependency relations or data-driven co-occurrence relations among the knowledge points; the emotion-bearing state vector output by a bidirectional recurrent neural network serves as the initial feature of each knowledge point node, knowledge point neighborhood information is aggregated through a graph attention network (GAT) to update the node representations, and finally the mastery probability of each knowledge point is generated.
- 8. A cognitive diagnosis method based on emotion state, characterized in that it is based on a cognitive diagnosis model according to any one of claims 1-7, and comprises the following steps: extracting emotion-related multi-modal features from a teaching video and acquiring the corresponding knowledge point information; projecting the extracted multi-modal features into a unified low-dimensional space so that the data share the same dimension in preparation for subsequent fusion; adaptively selecting a main modality and auxiliary modalities based on the information quantity of each modality among the multi-modal features; recursively updating and forming a super-modal state expressing emotion semantics through information interaction between the main modality and the auxiliary modalities; and deeply fusing the main modal state with the super-modal state to generate a unified representation of the student's emotion-cognition state for cognitive diagnosis based on the emotion state.
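As a rough illustration of the uncertainty-based modality weighting described in claim 1, the following sketch simulates several stochastic forward passes through a random discarding (dropout) layer, measures the variance of the predictions per modality, and converts lower variance into higher weight. All names, dimensions and the linear predictor here are hypothetical stand-ins, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_passes(features, weight, n_passes=20, p_drop=0.3):
    """Run several stochastic forward passes with dropout to estimate
    predictive uncertainty, as in claim 1's variance-based scheme."""
    preds = []
    for _ in range(n_passes):
        mask = rng.random(features.shape) > p_drop   # random discarding layer
        dropped = features * mask / (1.0 - p_drop)   # inverted-dropout scaling
        preds.append(dropped @ weight)               # toy linear predictor
    return np.stack(preds)                           # (n_passes, out_dim)

def modality_weights(uncertainties):
    """Lower variance -> higher weight, normalised with a softmax over
    negative uncertainty (one plausible form of the weighting formula)."""
    scores = -np.asarray(uncertainties)
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Toy example: three modalities with made-up feature vectors.
modalities = {name: rng.normal(size=16) for name in ("text", "visual", "audio")}
head = rng.normal(size=(16, 4))

uncert = {name: mc_dropout_passes(f, head).var(axis=0).mean()
          for name, f in modalities.items()}
weights = modality_weights(list(uncert.values()))

print(dict(zip(uncert, np.round(weights, 3))))   # weights sum to 1
```

In this form the modality with the smallest Monte-Carlo prediction variance receives the largest weight; a preset uncertainty threshold, as the claim alternatively allows, would simply zero out modalities whose variance exceeds it before normalisation.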
Description
Cognitive diagnosis method and model based on emotion state
Technical Field
The application relates to the technical field of intelligent education, in particular to a cognitive diagnosis method and model based on emotion state.
Background
The cognitive diagnosis model (CDM, Cognitive Diagnosis Model) is an important component of intelligent education systems and is widely applied in scenarios such as evaluating students' knowledge mastery levels, recommending personalized learning paths, and feeding back learning effects. Traditional cognitive diagnosis methods mostly depend on students' answering behavior data and infer the students' degree of mastery of each knowledge point by combining the relations between questions and knowledge points. However, the existing methods share the following disadvantages. On the one hand, diagnosis based on a single response result can hardly reflect students' real cognitive state during learning comprehensively; in particular, when facing student groups with complex emotion fluctuations or varying attention, traditional methods often fail to capture latent cognitive deviations accurately, so the diagnosis results are one-sided. On the other hand, although multi-modal information processing has advanced in recent years, and some research attempts to assist teaching analysis by combining language, facial expression or voice data, multi-modal information is strongly heterogeneous, and a mature, unified technical scheme for effectively aligning, fusing and utilizing each modality's information so as to improve the accuracy and practicability of cognitive diagnosis models is still lacking.
In addition, students' emotional state is closely related to their cognitive performance; a large body of educational psychology research shows that emotional changes directly affect students' learning engagement, comprehension ability and knowledge absorption. If students' multi-modal emotion information can be obtained and fused in real time, the comprehensiveness and accuracy of cognitive diagnosis can be improved. Therefore, a new technical scheme is needed that can integrate multi-modal information, adaptively capture changes in students' emotion and cognitive states, dynamically predict students' mastery of each knowledge point, and improve the accuracy, interpretability and applicability of cognitive diagnosis systems.
Disclosure of Invention
In view of the above, it is necessary to provide a cognitive diagnosis method and model based on emotion state, with high accuracy and applicability, addressing the technical problem that cognitive diagnosis is not comprehensive enough.
In order to achieve the above purpose of the present invention, the following technical scheme is adopted. A cognitive diagnosis method based on emotion state comprises the following steps: extracting emotion-related multi-modal features from a teaching video and acquiring the corresponding knowledge point information; projecting the extracted multi-modal features into a unified low-dimensional space so that the data share the same dimension in preparation for subsequent fusion; adaptively selecting a main modality and auxiliary modalities based on the information quantity of each modality among the multi-modal features; recursively updating and forming a super-modal state expressing emotion semantics through information interaction between the main modality and the auxiliary modalities; and deeply fusing the main modal state with the super-modal state to generate a unified representation of the student's emotion-cognition state for cognitive diagnosis based on the emotion state.
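The five steps above can be sketched end to end as follows. This is a minimal toy pipeline under stated assumptions: the per-modality feature dimensions, the random projections standing in for the feature alignment layer, the softmax-entropy proxy for "information quantity", and the gated recursive update standing in for the super-modal learning layer are all illustrative choices, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1 (stand-in): assume per-modality features were already extracted
# from the teaching video; the dimensions here are arbitrary.
raw = {"text": rng.normal(size=768), "visual": rng.normal(size=128),
       "audio": rng.normal(size=40)}

# Step 2: project every modality into one shared low-dimensional space.
d = 32
proj = {m: rng.normal(size=(f.size, d)) / np.sqrt(f.size) for m, f in raw.items()}
aligned = {m: raw[m] @ proj[m] for m in raw}

# Step 3: rank modalities by information quantity; the entropy of a
# softmax over the aligned feature serves as a simple proxy here.
def entropy(v):
    p = np.exp(v - v.max()); p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum()

ranked = sorted(aligned, key=lambda m: entropy(aligned[m]), reverse=True)
main, auxiliaries = ranked[0], ranked[1:]

# Step 4: recursively update a super-modal state through interaction
# between the main modality and each auxiliary (gated residual mixing).
state = aligned[main]
for m in auxiliaries:
    gate = 1.0 / (1.0 + np.exp(-(state * aligned[m])))  # element-wise gate
    state = state + gate * aligned[m]                   # recursive update

# Step 5: fuse the main-modality state with the super-modal state into
# one unified emotion-cognition representation.
fused = np.tanh(np.concatenate([aligned[main], state]))
print(fused.shape)   # (64,)
```

The resulting `fused` vector plays the role of the unified emotion-cognition state that downstream cognitive diagnosis (e.g. the knowledge point graph of claim 7) would consume.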
A cognitive diagnosis model based on emotion state comprises a cascaded multi-modal information feature extraction module, a feature alignment layer, a modality selection layer, an adaptive super-modal learning layer and a cross-modal fusion conversion layer; the multi-modal information feature extraction module is used for extracting emotion-related multi-modal features from a teaching video and obtaining the knowledge point information corresponding to those features; the feature alignment layer is used for projecting the extracted multi-modal features into a unified low-dimensional space so that the data share the same dimension in preparation for subsequent fusion; the modality selection layer is used for adaptively selecting a main modality and auxiliary modalities based on the information quantity of each modality among the multi-modal features; the adaptive super-modal learning layer is used for recursively updating and forming a super-modal state expressing emotion semantics through information interaction between the main modality and the auxiliary modalities; and the cross-modal fusion conversion layer is used for deeply fusing the main modal state with the super-modal state and generating a unified super-modal state representing the student's emotion-cognition state for cognitive diagnosis based on the emotion state.
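The cross-modal fusion conversion layer's mechanism (claim 6: a learnable additional marker, a position code, cross-attention, and readout at a specific position) can be sketched as below. Everything here is a simplified single-head illustration with hypothetical shapes; the real layer would add learned query/key/value projections and train the marker jointly with the rest of the model.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16

def pos_code(n, dim):
    """Standard sinusoidal position code."""
    pos = np.arange(n)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def cross_attention(q_seq, kv_seq):
    """Single-head scaled dot-product cross-attention (projections
    omitted to keep the sketch short)."""
    scores = q_seq @ kv_seq.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv_seq

source = rng.normal(size=(6, d))         # source-modality sequence
target = rng.normal(size=(4, d))         # target-modality sequence
fusion_token = rng.normal(size=(1, d))   # learnable additional marker

# Prepend the marker, add position codes, then attend over the source.
query = np.vstack([fusion_token, target]) + pos_code(5, d)
fused_seq = cross_attention(query, source + pos_code(6, d))

# The output at the marker's position is read out as the unified
# emotion-cognition state vector of uniform dimension.
state_vec = fused_seq[0]
print(state_vec.shape)   # (16,)
```

Reading out a prepended token after cross-attention mirrors the common [CLS]-token pattern: regardless of how long the source or target sequences are, the fused state is always a single vector of fixed dimension.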