CN-122023080-A - Multi-mode classroom interaction effect evaluation method for online education
Abstract
The application provides a multi-modal classroom interaction effect evaluation method for online education. Multi-modal data such as voice questions, text input and video pictures are collected in the online classroom, and their respective timestamp information is obtained to produce a data stream with time-sequence marks. A time-sequence alignment algorithm processes the time offsets between the voice questions and the video pictures according to the time-sequence-marked data stream, determining a synchronized multi-channel data sequence. Associated feature vectors are input into a multi-modal fusion model, which processes the feature associations to capture confusion points in the learning state and determines the behavior patterns in student interaction. The matching degree between these behavior patterns and the teaching rhythm is then analyzed, and if the matching degree is lower than a preset threshold, the effect evaluation parameters are adjusted to obtain an optimized learning state portrait.
Inventors
- LIU FENGPING
Assignees
- 刘凤萍
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-03
Claims (7)
- 1. A multi-modal classroom interaction effect assessment method for online education, comprising: collecting voice questioning data, text input data and video picture data in an online classroom, and obtaining the respective timestamp information of the voice questioning data, the text input data and the video picture data to obtain a data stream with time-sequence marks; processing the time offset between the voice questioning data and the video picture data with a time-sequence alignment algorithm according to the time-sequence-marked data stream, and determining a synchronized multi-channel data sequence; extracting intonation features of the voice questioning data, keyword features of the text input data, and confusing expression features and limb action features of the video picture data from the synchronized multi-channel data sequence to obtain a heterogeneous feature set; for the heterogeneous feature set, calculating a weight relation between the intonation features of the voice questioning data and the confusing expression features of the video picture data with an attention mechanism, and fusing the intonation features and the confusing expression features when the weight is higher than a preset threshold to obtain an associated feature vector; inputting the associated feature vector into a multi-modal fusion model, processing the associated feature vector to capture confusion points in student interaction, and determining behavior patterns in the student interaction; analyzing the matching degree between the behavior patterns and the teaching rhythm according to the behavior patterns in the student interaction, and adjusting the effect evaluation parameters when the matching degree is lower than a preset threshold to obtain an optimized learning state portrait; and extracting key indicators from the optimized learning state portrait, verifying the coordination consistency of the voice questioning data, the text input data and the video picture data with the time-sequence alignment algorithm, and outputting an interaction effect evaluation result when the consistency meets a preset condition.
- 2. The multi-modal classroom interaction effect assessment method for online education according to claim 1, wherein extracting the intonation features of the voice questioning data, the keyword features of the text input data, and the confusing expression features and limb action features of the video picture data from the synchronized multi-channel data sequence comprises: performing spectral analysis on the voice questioning data to extract pitch variation, volume intensity and speech-rate characteristics as the intonation features; performing word segmentation and word-vector conversion on the text input data to extract question keywords and emotional tendency words as the keyword features; and performing facial key-point detection and posture estimation on the video picture data to extract confusing expression features such as raised eyebrows and downturned mouth corners, and limb action features such as leaning the body forward and frequent gestures.
- 3. The multi-modal classroom interaction effect assessment method for online education according to claim 1, wherein inputting the associated feature vector into the multi-modal fusion model and processing the associated feature vector to capture the confusion points in student interaction comprises: inputting the associated feature vector into a multi-modal fusion model comprising a convolutional layer and a recurrent layer; extracting local association patterns in the associated feature vector through the convolutional layer; and processing the local association patterns through the recurrent layer to capture time-sequence dependencies, determining the distribution and duration of student confusion points in different periods of a class.
- 4. The multi-modal classroom interaction effect assessment method for online education according to claim 1, wherein analyzing the matching degree between the behavior patterns and the teaching rhythm according to the behavior patterns in the student interaction comprises: extracting the time-sequence distribution of the knowledge-point explanation stages and the questioning stages in the teaching rhythm; and comparing the degree of overlap between the periods in which confusion points occur in the behavior patterns and the knowledge-point explanation stages, and calculating a matching degree score.
- 5. The multi-modal classroom interaction effect assessment method for online education according to claim 2, wherein calculating the weight relation between the intonation features of the voice questioning data and the confusing expression features of the video picture data with an attention mechanism for the heterogeneous feature set comprises: inputting the intonation features and the confusing expression features into an attention layer, and calculating a cross-attention score between the intonation features and the confusing expression features as the weight relation.
- 6. The multi-modal classroom interaction effect assessment method for online education according to claim 3, wherein extracting the key indicators from the optimized learning state portrait comprises: extracting the frequency of confusion points, the duration of confusion, and the number of behavior pattern switches as the key indicators.
- 7. The multi-modal classroom interaction effect assessment method for online education according to claim 1, wherein verifying the coordination consistency of the voice questioning data, the text input data and the video picture data with the time-sequence alignment algorithm comprises: recalibrating the time-sequence deviations among the three data streams with the timestamp information as a reference, and calculating a cross-modal consistency score.
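The matching-degree comparison described in claim 4 amounts to an interval-overlap computation between confusion periods and explanation stages. The following is a minimal Python sketch; the interval representation and the exact scoring formula (overlap time divided by total confusion time) are illustrative assumptions, not taken from the patent text:

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def matching_degree(confusion_periods, explanation_stages):
    """Matching-degree score in [0, 1]: the fraction of total confusion time
    that falls inside knowledge-point explanation stages (claim 4).
    Assumes stages do not overlap one another; the formula itself is an
    illustrative assumption."""
    total = sum(end - start for start, end in confusion_periods)
    if total == 0:
        # No confusion observed; treat as fully matched (assumption).
        return 1.0
    overlapped = sum(overlap(c, stage)
                     for c in confusion_periods
                     for stage in explanation_stages)
    return overlapped / total
```

For example, with confusion periods at 120–150 s and 300–320 s and an explanation stage spanning 100–200 s, 30 of the 50 confusion seconds overlap the stage, giving a score of 0.6; a score below the preset threshold would trigger the parameter adjustment of claim 1.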
Description
Multi-mode classroom interaction effect evaluation method for online education
Technical Field
The invention relates to the field of information technology, and in particular to a multi-modal classroom interaction effect evaluation method for online education.
Background
Online education, as a learning mode that breaks through the limits of time and space, has become an indispensable component of the modern education system. Its importance lies in providing high-quality resources at scale, diversifying classroom interaction, and optimizing the teaching process through real-time effect evaluation, thereby markedly improving learners' participation and the efficiency of knowledge internalization. With technical progress, students in online classes interact through voice questioning, text input, and video pictures showing facial expressions, limb actions and other modalities, and this multi-channel behavioral data provides a rich basis for deeply understanding learning states. However, many methods struggle to synchronize information from different channels when processing this interaction data, so their grasp of students' real-time state is biased. For example, a query in speech may occur at the same moment as a brief confused expression in the video, but without an effective time coordination mechanism these signals cannot be correlated in time, hindering the immediate capture of learning confusion. This lack of time coordination further exacerbates the difficulty of associating features across channels.
Because voice, text and visual data are heterogeneous in form, each carries unique information: text reveals knowledge blind spots, expressions display emotional fluctuation, and actions reflect changes in attention. Without deep association at the feature level, the system can hardly construct a complete portrait of a student's learning state. In an actual online classroom, when a student submits a question through text input accompanied by frowning and leaning forward, these signals should jointly point to a specific confusion point; but because the association mechanism is imperfect, teachers can rely only on single-channel feedback and miss the opportunity to adjust the teaching rhythm, so that interaction effect assessment remains superficial. Therefore, how to effectively achieve the time coordination and feature association of multi-modal data in online education, and thereby accurately identify learning behaviors in student classroom interaction, has become a key problem for improving the accuracy of effect evaluation.
Disclosure of Invention
The invention provides a multi-modal classroom interaction effect evaluation method for online education, which mainly comprises the following steps. Multi-modal data such as voice questions, text input and video pictures are collected in the online classroom, and their respective timestamp information is obtained to produce a data stream with time-sequence marks. The time offset between the voice questions and the video pictures is processed with a time-sequence alignment algorithm according to the time-sequence-marked data stream, determining a synchronized multi-channel data sequence.
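The time-offset processing step above can be sketched as a nearest-neighbor match over offset-corrected timestamps. Below is a minimal Python illustration; the clock-offset and tolerance parameters, and the (timestamp, payload) event representation, are assumptions for the sketch rather than details specified in the patent:

```python
import bisect

def align_streams(voice_events, video_frames, video_offset=0.0, tolerance=0.5):
    """Pair each voice event with the nearest video frame in time.

    voice_events: list of (timestamp, payload) from the audio channel.
    video_frames: list of (timestamp, payload); timestamps are shifted by
        video_offset to compensate for a known clock skew before matching.
    Returns a synchronized sequence of (t, voice_payload, video_payload),
    where video_payload is None if no frame lies within `tolerance` seconds.
    """
    shifted = sorted((t + video_offset, p) for t, p in video_frames)
    times = [t for t, _ in shifted]
    out = []
    for t, payload in sorted(voice_events):
        i = bisect.bisect_left(times, t)
        best = None
        # Only the two neighbors around the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(times) and abs(times[j] - t) <= tolerance:
                if best is None or abs(times[j] - t) < abs(times[best] - t):
                    best = j
        out.append((t, payload, shifted[best][1] if best is not None else None))
    return out
```

A voice question at t = 1.0 s would thus be paired with a video frame captured at 0.8 s on a camera clock running 0.3 s behind, while frames farther away than the tolerance are left unpaired rather than wrongly associated.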
Intonation features of the voice questions, keyword features of the text input, and confusing expression and limb action features of the video pictures are extracted from the synchronized multi-channel data sequence to obtain a heterogeneous feature set. For the heterogeneous feature set, an attention mechanism is used to calculate the weight relation between the voice questioning features and the confusing expression features, and if the weight is higher than a preset threshold, the features are fused to obtain an associated feature vector. The associated feature vector is input into a multi-modal fusion model, which processes the feature associations to capture confusion points in the learning state and determines the behavior patterns in student interaction. According to the behavior patterns in student interaction, the matching degree with the teaching rhythm is analyzed, and if the matching degree is lower than a preset threshold, the effect evaluation parameters are adjusted to obtain an optimized learning state portrait. Key indicators are extracted from the optimized learning state portrait, the coordination consistency of the multi-modal data is re-verified with the time-sequence alignment algorithm, and a final recognition result is output if the consistency meets the condition. Further, a synchronized multi-channel data sequence is determined. Further, if the weight is higher than a preset threshold, the features are fused to obtain an associated feature vector. Further, behavior patterns in student interaction are determined. Further, if the matching degree is lower than a preset threshold, the effect evaluation parameters are adjusted.
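The attention-based fusion step can be sketched as a scaled dot-product score between the intonation and confusing-expression feature vectors, with fusion gated by a threshold. This is a minimal Python illustration under stated assumptions: the sigmoid-squashed score, the 0.7 threshold, and the weighted-concatenation fusion are all illustrative choices; the patent only specifies that an attention mechanism yields a weight relation and that fusion occurs above a preset threshold.

```python
import math

def cross_attention_weight(intonation, expression):
    """Scaled dot-product attention score between two equal-length feature
    vectors, squashed to (0, 1) with a sigmoid so it can be compared
    against a threshold. The scoring form is an illustrative assumption."""
    d = len(intonation)
    score = sum(a * b for a, b in zip(intonation, expression)) / math.sqrt(d)
    return 1.0 / (1.0 + math.exp(-score))

def fuse_if_salient(intonation, expression, threshold=0.7):
    """Build the associated feature vector only when the attention weight
    exceeds the threshold; here fusion is a weight-scaled concatenation
    (an assumption), and None signals 'do not fuse'."""
    w = cross_attention_weight(intonation, expression)
    if w > threshold:
        return [w * x for x in intonation] + [w * x for x in expression]
    return None
```

With strongly co-occurring signals (e.g. a rising-pitch question aligned with a confused expression, modeled as similar vectors) the weight exceeds the threshold and a fused vector is produced; orthogonal features yield a weight of 0.5 and no fusion, matching the gating behavior described in claim 1.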