CN-121980515-A - Self-adaptive learning AI system based on multi-mode fusion driving

Abstract

The invention provides an adaptive learning AI system based on multi-modal fusion driving, comprising a modal quality assessment module, a state change determination module, a credibility determination module, a multi-modal fusion module, and a learning feedback generation module. The modal quality assessment module calculates, for each of the input image modal features, voice modal features, and text modal features, a modal quality vector representing acquisition quality. Because a modal quality vector reflecting acquisition reliability is calculated separately for the image, voice, and text modalities, and because the credibility determination stage uses the modal quality vector and the change determination result together as a joint decision basis, the system can distinguish feature anomalies caused by deteriorating acquisition conditions from stage-wise state changes caused by the learning process. This avoids suppressing or rejecting a modality solely because its features deviate from the historical distribution, and improves the accuracy with which multi-modal features are judged during fusion.

Inventors

YANG BAI

Assignees

爻象科技(广州)有限公司

Dates

Publication Date: 20260505
Application Date: 20260130

Claims (10)

1. An adaptive learning AI system based on multi-modal fusion driving, characterized by comprising a modal quality assessment module, a state change determination module, a credibility determination module, a multi-modal fusion module, and a learning feedback generation module; the modal quality assessment module calculates, for each of the input image modal features, voice modal features, and text modal features, a modal quality vector representing acquisition quality; the state change determination module calculates, for each of the image modal features, the voice modal features, and the text modal features, change features relative to the historical feature distribution, and outputs a change determination result that distinguishes a learning state change from acquisition degradation; the credibility determination module receives the modal quality vector and the change determination result, and outputs a suppression or rejection instruction for a modality when the modal quality vector of that modality is lower than a preset quality threshold and the corresponding change determination result is acquisition degradation; the multi-modal fusion module performs weighted fusion on the modal features of the retained modalities under the constraint of the credibility weights and the suppression or rejection instruction to generate a fusion characterization; and the learning feedback generation module generates learning feedback based on the fusion characterization.
2. The adaptive learning AI system based on multi-modal fusion driving of claim 1, further comprising a multi-modal acquisition module and a data alignment module; the multi-modal acquisition module collects image data, voice data, and text data of a learner and tags the image data, the voice data, and the text data with a learner identifier and a timestamp; and the data alignment module performs cross-modal time alignment on the three types of data according to the timestamps and then outputs the image modal features, the voice modal features, and the text modal features.
3. The adaptive learning AI system based on multi-modal fusion driving of claim 2, wherein the data alignment module uses a sliding-window alignment mechanism to map the image frame sequence, the voice frame sequence, and the text sequence into windows of the same time span, and writes an interaction round identifier or a learning stage identifier for each window.
4. The adaptive learning AI system based on multi-modal fusion driving of claim 1, wherein the image modal quality vector calculated by the modal quality assessment module includes at least a sharpness index, an occlusion ratio index, and a key point availability index, wherein the sharpness index is obtained from image gradient energy or Laplacian variance, the occlusion ratio index is obtained from the proportion of the target region that is missing, and the key point availability index is obtained from the ratio of successfully detected key points.
5. The adaptive learning AI system based on multi-modal fusion driving of claim 1, wherein the voice modal quality vector calculated by the modal quality assessment module includes at least a signal-to-noise ratio index, a voice activity detection confidence index, and a recognition confidence index, wherein the recognition confidence index is derived from a posterior probability, a confidence score, or a candidate sequence score output by automatic speech recognition.
6. The adaptive learning AI system based on multi-modal fusion driving of claim 1, wherein the text modal quality vector calculated by the modal quality assessment module includes at least a text length stability index, a character error rate index, and a readability index, wherein the character error rate index is obtained from character edit distance statistics and the readability index is obtained from a syntactic integrity score or a language model perplexity.
7. The adaptive learning AI system based on multi-modal fusion driving of claim 1, wherein the change features calculated by the state change determination module include at least a mean shift amount, a variance change amount, and a duration, and the state change determination module associates the duration with a learning stage identifier or an interaction round identifier in order to distinguish short-term transitional changes from long-term degradation changes.
8. The adaptive learning AI system based on multi-modal fusion driving of claim 7, wherein the state change determination module generates a change trigger signal when the mean shift amount and the variance change amount reach a preset change threshold, and outputs a learning state change as the change determination result when the duration is less than a preset degradation duration threshold and the corresponding modal quality vector satisfies the preset quality threshold.
9. The adaptive learning AI system based on multi-modal fusion driving of claim 1, wherein the multi-modal fusion module, upon receiving a suppression or rejection instruction, sets the weight of the corresponding modality to zero or excludes that modality from the fusion calculation, and performs weighted-sum fusion or attention-weighted fusion on the retained modalities using the credibility weights to obtain the fusion characterization.
10. The adaptive learning AI system based on multi-modal fusion driving of claim 1, further comprising a parameter update module configured to form feedback evaluation information based on response behavior data of the learning terminal to the learning feedback, and to update the preset quality threshold, the preset change threshold, the preset degradation duration threshold, and the credibility weight calculation parameters according to the feedback evaluation information, for use in credibility determination in subsequent interaction rounds.
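
The decision flow recited in claims 1, 8, and 9 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All names (`ModalityObservation`, `Thresholds`, `judge_change`, `credibility_weight`, `fuse`) and all threshold values are hypothetical. The sketch further assumes that a quality vector is "lower than the threshold" when its weakest index is below it, that a triggered change not classified as a learning state change is treated as acquisition degradation, and that the credibility weight is the mean of the quality indices; the claims do not fix any of these choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NO_CHANGE = "no_change"
LEARNING_STATE_CHANGE = "learning_state_change"
ACQUISITION_DEGRADATION = "acquisition_degradation"


@dataclass
class ModalityObservation:
    features: List[float]   # modal features for the current window
    quality: List[float]    # modal quality vector, indices assumed to lie in [0, 1]
    mean_shift: float       # mean shift relative to the historical distribution
    variance_change: float  # variance change relative to the historical distribution
    duration: int           # consecutive windows for which the change has persisted


@dataclass
class Thresholds:
    quality: float = 0.5           # preset quality threshold (illustrative value)
    change: float = 0.3            # preset change threshold (illustrative value)
    degradation_duration: int = 5  # preset degradation duration threshold


def quality_ok(obs: ModalityObservation, th: Thresholds) -> bool:
    # Assumption: the vector satisfies the threshold only if every index does.
    return min(obs.quality) >= th.quality


def judge_change(obs: ModalityObservation, th: Thresholds) -> str:
    # Claim 8: a change is triggered when both amounts reach the change threshold.
    triggered = obs.mean_shift >= th.change and obs.variance_change >= th.change
    if not triggered:
        return NO_CHANGE
    # Claim 8: short-lived change with acceptable quality is a learning state change.
    if obs.duration < th.degradation_duration and quality_ok(obs, th):
        return LEARNING_STATE_CHANGE
    # Assumption: any other triggered change counts as acquisition degradation.
    return ACQUISITION_DEGRADATION


def credibility_weight(obs: ModalityObservation, th: Thresholds) -> float:
    # Claim 1: suppress only when quality is low AND the change is degradation.
    suppressed = (not quality_ok(obs, th)) and judge_change(obs, th) == ACQUISITION_DEGRADATION
    if suppressed:
        return 0.0  # claim 9: weight set to zero
    # Assumption: the weight is the mean of the quality indices.
    return sum(obs.quality) / len(obs.quality)


def fuse(
    observations: Dict[str, ModalityObservation], th: Thresholds
) -> Tuple[Optional[List[float]], Dict[str, float]]:
    """Weighted-sum fusion over retained modalities; returns (fused, weights)."""
    weights = {name: credibility_weight(obs, th) for name, obs in observations.items()}
    total = sum(weights.values())
    if total == 0.0:
        return None, weights  # every modality was suppressed
    dim = len(next(iter(observations.values())).features)
    fused = [
        sum(weights[name] * obs.features[i] for name, obs in observations.items()) / total
        for i in range(dim)
    ]
    return fused, weights
```

In this sketch a large deviation from the historical distribution does not by itself remove a modality: a short-lived change with acceptable quality keeps its full weight, which is the distinction the claims draw against purely deviation-based confidence. Attention-weighted fusion, the alternative named in claim 9, would replace the normalized weighted sum in `fuse`.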

Description

Self-adaptive learning AI system based on multi-mode fusion driving

Technical Field

The invention relates to the technical field of artificial intelligence (AI), and in particular to an adaptive learning AI system based on multi-modal fusion driving.

Background

In intelligent education and adaptive learning systems, the learning state of a learner is generally analyzed by collecting multi-modal behavior data of the learner, such as images, voice, and text, and teaching feedback is generated according to that learning state. Because the modalities differ in acquisition quality and expression stability, the prior art generally introduces a modal confidence or quality assessment mechanism before multi-modal fusion to determine the weight of each modality in the fusion process, or whether it participates in subsequent decisions at all.

In existing technical schemes, modal confidence is calculated mainly from the degree of difference between the current modal features and the historical mean features of that modality. For example, a residual or a distance is used as the confidence measure, and a modality that deviates substantially from the historical mean is judged unreliable and is down-weighted or rejected. This method is simple to implement but has obvious defects in practical teaching applications. In a real learning process, the key cognitive states of the learner, such as confusion, a breakthrough in understanding, tension, or fatigue, often change in stages, which usually causes significant changes in image expression, voice rhythm, or manner of text expression relative to the historical behavior distribution.

If such a stage-wise offset is directly equated with an unreliable modality, an effective signal reflecting a real learning state change is easily misjudged as abnormal data, so that a key modality is wrongly down-weighted or masked at important teaching nodes, which impairs the accuracy of learning state recognition and the appropriateness of teaching feedback. For this reason, an adaptive learning AI system based on multi-modal fusion driving is proposed.

Disclosure of Invention

In view of this, the present invention provides an adaptive learning AI system based on multi-modal fusion driving to solve or alleviate the technical problems existing in the prior art, or at least to provide a beneficial option.

The technical scheme of the invention is realized as follows: an adaptive learning AI system based on multi-modal fusion driving comprises a modal quality assessment module, a state change determination module, a credibility determination module, a multi-modal fusion module, and a learning feedback generation module; the modal quality assessment module calculates, for each of the input image modal features, voice modal features, and text modal features, a modal quality vector representing acquisition quality; the state change determination module calculates, for each of the image modal features, the voice modal features, and the text modal features, change features relative to the historical feature distribution, and outputs a change determination result that distinguishes a learning state change from acquisition degradation; the credibility determination module receives the modal quality vector and the change determination result, and outputs a suppression or rejection instruction for a modality when the modal quality vector of that modality is lower than a preset quality threshold and the corresponding change determination result is acquisition degradation; the multi-modal fusion module performs weighted fusion on the modal features of the retained modalities under the constraint of the credibility weights and the suppression or rejection instruction to generate a fusion characterization; and the learning feedback generation module generates learning feedback based on the fusion characterization.

Further preferably, the system further comprises a multi-modal acquisition module and a data alignment module; the multi-modal acquisition module collects image data, voice data, and text data of a learner and tags the image data, the voice data, and the text data with a learner identifier and a timestamp; and the data alignment module performs cross-modal time alignment on the three types of data according to the timestamps and then outputs the image modal features, the voice modal features, and the text modal features.

Further preferably, the data alignment module uses a sliding-window alignment mechanism to map the image frame sequence, the voice frame sequence, and the text sequence into windows of the same time span, and writes an interaction round identifier or a learning stage identifier for each window.

Further preferably, the modal quality assessment module includes at least a sharpness index, an occlusion ratio index, and a key point availability index for the image m