CN-122021642-A - Language learning system and method for fusing domestic large model and knowledge graph

Abstract

The invention discloses a language learning system and method fusing a domestic large model and a knowledge graph, relating to the technical field of artificial-intelligence-aided education. The method collects a learner's voice, text, face and mouth-shape images, and knowledge point access and cognitive interaction data in real time through a domestic education software base; performs cleaning, normalization and feature fusion on the data; and realizes semantic alignment and knowledge association mapping using a domestic feature coding engine and a knowledge graph embedding mechanism. An optimization strategy is generated automatically according to the evaluation result, comprising large model fine-tuning, learning path adjustment and knowledge graph node expansion, so as to realize transparent monitoring of learning behavior and dynamic optimization of the knowledge graph. The method can improve the semantic mapping precision of the domestic large model, optimize learning path management and enhance the evolution capability of the knowledge graph.

Inventors

LI JUNJIE
GUO QINGXIA
XU DONGSHENG
XU PENG
WANG ZHANGLONG
CHEN FENCHAO

Assignees

广东讯飞启明科技发展有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-06

Claims (10)

1. A language learning method fusing a domestic large model and a knowledge graph, characterized by comprising the following steps:
Step one: acquiring multi-modal data of a learner in real time through a domestic education software base, the data including voice, text, face and mouth-shape images, and knowledge point access and cognitive interaction information, and respectively establishing a voice expression parameter set, a text expression parameter set, a visual participation parameter set and a knowledge cognition parameter set;
Step two: performing data cleaning, normalization and feature fusion on each parameter set from step one, constructing a multidimensional language feature matrix using a domestic feature coding engine, realizing semantic alignment and knowledge association mapping through the knowledge graph embedding mechanism of the domestic large model, and generating a standardized language input parameter set;
Step three: based on the standardized language input parameter set, calculating a word vector embedding difference degree, a graph node semantic aggregation degree and a domestic chip operation delay normalization value, comprehensively obtaining a semantic alignment coefficient YDQ, comparing YDQ with a domestic semantic alignment threshold Yth, judging the stability of the semantic mapping relation of the domestic large model, giving a corresponding strategy, and generating a standardized language-knowledge graph mapping matrix;
Step four: based on the standardized language-knowledge graph mapping matrix, performing multi-modal semantic dependency analysis and syntactic structure matching, calculating a learning association index GLZ, comparing GLZ with a learning path aggregation threshold Gth, judging whether the matching degree between the learner's answer semantics and the knowledge graph path is qualified, giving a corresponding strategy, and collating and generating an optimized learning data set of learning behaviors and node mappings;
Step five: based on the optimized learning data set of learning behaviors and node mappings, extracting a semantic error cluster density Ferr, a new concept node generation frequency Fnew and a context dependency strength Fctx through multidimensional feature analysis of the learner's answer semantics and the knowledge graph nodes, calculating a knowledge growth driving coefficient ZSQ, comparing ZSQ with a knowledge evolution threshold Zth, judging whether the self-growth requirement of the knowledge graph meets the standard, and giving a corresponding strategy to form a learning transparency report.
2. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step one comprises:
S11: acquiring the learner's voice input signal in real time through voice recognition equipment in the domestic education software base, including acquiring the voice fundamental frequency through a domestic acoustic front-end component, acquiring the average voice energy through a sound intensity analysis module, acquiring the voice pause rate as the ratio of inter-sentence silence duration to the total duration of the voice segments, and acquiring the speech segment length distribution through voice framing and clustering statistics, and establishing the voice expression parameter set;
S12: acquiring the learner's text input signal in real time through a text recognition and semantic analysis component in the domestic education software base, including acquiring the word frequency distribution and lexical richness through domestic lexical analysis equipment, acquiring the sentence pattern complexity and the subject-object completeness rate through syntactic structure recognition equipment, and acquiring the semantic consistency and topic concentration through semantic consistency detection equipment, and establishing the text expression parameter set;
S13: acquiring the learner's face and mouth-shape image data in real time through visual recognition equipment in the domestic education software base, including acquiring the emotion change rate and expression stability through an expression recognition component, acquiring the pronunciation mouth-shape matching rate through a mouth-shape synchronization detection component, and acquiring the gaze stability and concentration index through a gaze tracking device, and establishing the visual participation parameter set;
S14: acquiring the learner's knowledge point access and cognitive interaction data in real time through knowledge association sensing equipment in the domestic education software base, including acquiring the knowledge point call frequency through a knowledge node activation monitoring component, acquiring the logic jump depth between knowledge points through a semantic path identification device, and acquiring the knowledge reproduction rate and the wrong-question reproduction rate through a knowledge memory backtracking device, and establishing the knowledge cognition parameter set.
3. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step two comprises:
S21: based on the voice expression parameter set, the text expression parameter set, the visual participation parameter set and the knowledge cognition parameter set, performing multidimensional data processing and standardized fusion, including eliminating sampling delay and noise interference through a data cleaning and time-sequence synchronization algorithm, unifying the dimensions and value ranges of the feature values through normalization and standard deviation stretching, screening highly correlated feature dimensions through principal component analysis and a correlation aggregation algorithm, and mapping and compressing the multi-modal feature vectors with the domestic feature coding engine to construct the multidimensional language feature matrix;
S22: performing semantic alignment and knowledge association mapping on the multidimensional language feature matrix using the knowledge graph embedding mechanism of the domestic large model, including mapping high-frequency semantic units in the multidimensional language feature matrix to the corresponding entity nodes of the knowledge graph through knowledge node embedding computing equipment, identifying the semantic distances and superordinate-subordinate logical relations among different knowledge nodes through a relation reasoning component, and acquiring the matching degree between the learner's language input and the target semantic space through an adaptive weighting algorithm, and finally generating the standardized language input parameter set.
4. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step three comprises:
S31: based on the standardized language input parameter set, using a semantic vector comparison and embedding difference calculation method to perform vector magnitude matching analysis on the voice fundamental frequency, voice pause rate and speech segment length distribution in the voice expression parameter set and on the word frequency distribution, sentence pattern complexity and semantic consistency in the text expression parameter set, and calculating the difference between the input semantic vector and the graph node vector to obtain the word vector embedding difference degree emb;
S32: based on the standardized language input parameter set, using a node aggregation and edge weight variance calculation method to perform node-level association analysis on the knowledge point call frequency, logic jump depth and knowledge reproduction rate in the knowledge cognition parameter set, and calculating the average variance of the semantic association edges among nodes in the graph to obtain the graph node semantic aggregation degree node;
S33: based on the standardized language input parameter set, using a time normalization and computing power mapping analysis method to perform time-sequence sampling on the running process of the feature coding engine in the domestic education software base, extracting the execution delay of the model in the calculation stage and the time consumed by matrix operations, and calculating the normalized operation delay to obtain the domestic chip operation delay normalization value hw.
5. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step three further comprises:
S34: calculating the semantic alignment coefficient YDQ, after dimensionless processing, from the obtained word vector embedding difference degree emb, graph node semantic aggregation degree node and domestic chip operation delay normalization value hw;
S35: presetting the domestic semantic alignment threshold Yth and comparing the semantic alignment coefficient YDQ with it to obtain a first evaluation result, comprising:
when the semantic alignment coefficient YDQ is greater than or equal to the domestic semantic alignment threshold Yth, the semantic mapping relation of the domestic large model is stable, and the standardized language-knowledge graph mapping matrix is generated;
when the semantic alignment coefficient YDQ is smaller than the domestic semantic alignment threshold Yth, the semantic mapping relation of the domestic large model is unstable, semantic space deviation exists and there is a risk of insufficient semantic alignment; a first early warning instruction is triggered and a first strategy is generated: a fine-tuning control component in the domestic AI training base is called to fine-tune the embedding layer weight matrix and remap the node vectors, the semantic deviation error is gradually reduced through semantic error back-propagation and an inter-layer gradient clipping mechanism, the word vector embedding difference degree emb, graph node semantic aggregation degree node and domestic chip operation delay normalization value hw are obtained again, and the semantic alignment coefficient YDQ is updated until it is greater than or equal to the domestic semantic alignment threshold Yth.
6. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step four comprises:
S41: based on the standardized language-knowledge graph mapping matrix, performing multi-modal semantic dependency analysis on the voice expression parameters, text expression parameters, visual participation parameters and knowledge cognition parameters using semantic dependency path analysis and syntactic structure analysis to obtain a learner answer semantic dependency matrix Ssem; performing matching analysis between the syntactic structure features in the text expression parameters and knowledge cognition parameters and the grammar nodes of the knowledge graph using syntax tree comparison and superordinate-subordinate logical reasoning to obtain a syntactic structure accuracy Ssyn; and performing context consistency assessment on the visual participation parameters and knowledge cognition parameters using context semantic deviation analysis and comparison against large model generation results to obtain a context consistency score Sctx.
7. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step four further comprises:
S42: calculating the learning association index GLZ, after dimensionless processing, from the obtained learner answer semantic dependency matrix Ssem, syntactic structure accuracy Ssyn and context consistency score Sctx;
S43: presetting the learning path aggregation threshold Gth and comparing the learning association index GLZ with it to obtain a second evaluation result, comprising:
when the learning association index GLZ is greater than or equal to the learning path aggregation threshold Gth, the matching degree between the learner's answer semantics and the knowledge graph path is qualified, the learning path is stable, and monitoring continues;
when the learning association index GLZ is smaller than the learning path aggregation threshold Gth, the matching degree between the learner's answer semantics and the knowledge graph path is unqualified and there is a risk of semantic association deviation; a second early warning instruction is triggered and a second strategy is generated, comprising: learning path weight adjustment, in which the weights of the multi-modal features in the path calculation, including the voice expression weight, text expression weight, visual participation weight and knowledge cognition weight, are dynamically adjusted according to the deviation of the learning association index GLZ so as to improve the semantic dependency degree and the syntactic structure accuracy; personalized node sequence generation, in which the learner's answer semantic path and the knowledge graph node activation sequence are recalculated based on the adjusted weights to generate a personalized learning node sequence; knowledge graph update feedback, in which the personalized learning node sequence is compared with the actual learning behavior log and node weights and edge weights are adjusted through a node activity monitoring and semantic path correction mechanism; and large model fine-tuning, in which a fine-tuning control component in the domestic large model training base is called to fine-tune the embedding layer weight matrix through semantic error back-propagation and a gradient mechanism, gradually reducing the deviation error, together with learning node weight correction and learning model correction.
8. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step five comprises:
S51: based on the optimized learning data set of learning behaviors and node mappings, performing multidimensional feature extraction using semantic error cluster analysis, new knowledge node generation recognition and context dependency statistics, including: performing cluster analysis on the learner's answer semantic deviation, wrong-question reproduction rate and semantic path deviation in the optimized learning data set and calculating the concentration of semantic errors to obtain the semantic error cluster density Ferr; performing statistical analysis on the newly added node mappings in the knowledge graph and the new semantic units generated by the large model to obtain the new concept node generation frequency Fnew; and analyzing the semantic co-occurrence probability among knowledge graph nodes, the semantic dependency paths and the matching degree with the learner's answer context to extract the context dependency strength Fctx.
9. The language learning method fusing a domestic large model and a knowledge graph according to claim 1, wherein step five further comprises:
S52: calculating the knowledge growth driving coefficient ZSQ, after dimensionless processing, from the obtained semantic error cluster density Ferr, new concept node generation frequency Fnew and context dependency strength Fctx;
S53: presetting the knowledge evolution threshold Zth and comparing the knowledge growth driving coefficient ZSQ with it to obtain a third evaluation result, comprising:
when the knowledge growth driving coefficient ZSQ is less than or equal to the knowledge evolution threshold Zth, the self-growth requirement of the knowledge graph meets the standard, semantic deviation and new concept generation are within a controllable range, and the learning data and node activity continue to be monitored;
when the knowledge growth driving coefficient ZSQ is greater than the knowledge evolution threshold Zth, the self-growth requirement of the knowledge graph does not meet the standard and there is a risk of increasing semantic deviation or of generating too many new concept nodes; a third early warning instruction is triggered and a third strategy is generated, comprising: automatically expanding the knowledge graph nodes and retraining the semantic weights, including embedding semantic vectors for the newly added nodes and performing relation mapping and edge weight optimization against the existing knowledge graph nodes; adjusting the output of the generation layer so that the generation result of the domestic large model is consistent with the semantic space of the updated knowledge graph nodes; and, based on the learning data set after learning path optimization and knowledge graph updating and on the knowledge graph node activation sequence, generating a learning interpretation matrix and visually outputting the scoring causal chain and the knowledge evolution process to form the learning transparency report.
10. A language learning system fusing a domestic large model and a knowledge graph, applied to the language learning method fusing a domestic large model and a knowledge graph as claimed in any one of claims 1 to 9, characterized by comprising:
a multi-modal feature analysis module, used for acquiring multi-modal data of a learner in real time through a domestic education software base, the data including voice, text, face and mouth-shape images, and knowledge point access and cognitive interaction information, and respectively establishing a voice expression parameter set, a text expression parameter set, a visual participation parameter set and a knowledge cognition parameter set;
a semantic fusion and mapping analysis module, used for performing data cleaning, normalization and feature fusion on each established parameter set, constructing a multidimensional language feature matrix using a domestic feature coding engine, realizing semantic alignment and knowledge association mapping through the knowledge graph embedding mechanism of the domestic large model, and generating a standardized language input parameter set;
a semantic stability evaluation module, used for calculating a word vector embedding difference degree, a graph node semantic aggregation degree and a domestic chip operation delay normalization value based on the standardized language input parameter set, comprehensively obtaining a semantic alignment coefficient YDQ, comparing YDQ with a domestic semantic alignment threshold Yth, judging the stability of the semantic mapping relation of the domestic large model, giving a corresponding strategy, and generating a standardized language-knowledge graph mapping matrix;
a learning path association evaluation module, used for performing multi-modal semantic dependency analysis and syntactic structure matching based on the standardized language-knowledge graph mapping matrix, calculating a learning association index GLZ, comparing GLZ with a learning path aggregation threshold Gth, judging whether the matching degree between the learner's answer semantics and the knowledge graph path is qualified, giving a corresponding strategy, and collating and generating an optimized learning data set of learning behaviors and node mappings;
a knowledge growth monitoring and evolution evaluation module, used for extracting a semantic error cluster density Ferr, a new concept node generation frequency Fnew and a context dependency strength Fctx through multidimensional feature analysis of the learner's answer semantics and the knowledge graph nodes based on the optimized learning data set of learning behaviors and node mappings, calculating a knowledge growth driving coefficient ZSQ, comparing ZSQ with a knowledge evolution threshold Zth, judging whether the self-growth requirement of the knowledge graph meets the standard, and giving a corresponding strategy to form a learning transparency report.
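
As an illustration of the voice pause rate recited in step S11 of claim 2, the following minimal Python sketch computes the ratio of silence duration to the total duration of the voice segments. The `Segment` type, its `is_silence` flag and the function name are hypothetical; the claims do not say how silence is detected, so the labels are assumed to be given.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """One labelled span of a recording (hypothetical structure)."""
    start_s: float
    end_s: float
    is_silence: bool  # assumed to come from an upstream silence detector

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def voice_pause_rate(segments: Iterable[Segment]) -> float:
    """Silence duration divided by the total duration of all segments."""
    segs = list(segments)
    total = sum(s.duration_s for s in segs)
    if total <= 0:
        return 0.0  # assumption: an empty recording contains no pauses
    silence = sum(s.duration_s for s in segs if s.is_silence)
    return silence / total
```

For example, two seconds of speech, one second of silence and two more seconds of speech give a pause rate of one fifth.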
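
Step S32 of claim 4 refers to the average variance of the semantic association edges among graph nodes without giving a formula. One possible reading, shown here purely as an assumption, is to take the variance of the edge weights incident to each node and average those variances over all nodes. The edge tuple layout and the function name are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

# (node_u, node_v, weight) for an undirected semantic association edge
Edge = Tuple[str, str, float]


def average_edge_weight_variance(edges: List[Edge]) -> float:
    """Mean over nodes of the population variance of incident edge weights.

    This is one interpretation of "average variance of semantic association
    edges"; the source does not define the exact computation.
    """
    incident: Dict[str, List[float]] = {}
    for u, v, w in edges:
        incident.setdefault(u, []).append(w)
        incident.setdefault(v, []).append(w)
    if not incident:
        return 0.0  # assumption: an empty graph has zero variance
    # A node with a single edge contributes a variance of 0.
    return mean([pvariance(ws) for ws in incident.values()])
```

Under this reading, a lower value means that the edges around each node carry similar weights.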
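
The three threshold comparisons in claims 5, 7 and 9 can be summarized in a short Python sketch. Only the comparison directions come from the claims: YDQ and GLZ pass when they are at or above their thresholds, and ZSQ passes when it is at or below its threshold. The `Thresholds` container, the function name and the returned labels are hypothetical, and because the claims do not state the formulas that produce YDQ, GLZ and ZSQ, the sketch takes those values as inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    y_th: float  # domestic semantic alignment threshold Yth
    g_th: float  # learning path aggregation threshold Gth
    z_th: float  # knowledge evolution threshold Zth


def triggered_strategies(ydq: float, glz: float, zsq: float,
                         th: Thresholds) -> List[str]:
    """Return the labels of the strategies whose warning would be triggered."""
    strategies: List[str] = []
    if ydq < th.y_th:   # semantic mapping unstable (claim 5)
        strategies.append("first")
    if glz < th.g_th:   # answer/path matching unqualified (claim 7)
        strategies.append("second")
    if zsq > th.z_th:   # self-growth requirement not met (claim 9)
        strategies.append("third")
    return strategies
```

In the claimed method the first strategy is repeated until YDQ reaches Yth before step four proceeds; the sketch shows only the comparison logic, not that iteration.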

Description

Language learning system and method for fusing domestic large model and knowledge graph

Technical Field

The invention relates to the technical field of artificial-intelligence-aided education, in particular to a language learning system and method fusing a domestic large model and a knowledge graph.

Background

At present, intelligent education platforms in language learning scenarios generally depend on external large models and general semantic understanding frameworks to realize semantic recognition, knowledge matching and learning behavior analysis. However, existing language learning systems mainly have the following problems:

First, semantic understanding and knowledge association carry the risk of non-domestic dependence. Most intelligent education systems rely on foreign open-source or commercial large models for language processing and knowledge mapping. Both the underlying algorithms and the training data of the semantic space are opaque, so the requirement that the model be domestically autonomous and controllable in semantic understanding, knowledge reasoning and data security cannot be guaranteed, and deep adaptation to the domestic ecosystem is difficult to achieve.

Second, multi-modal feature fusion is insufficient, and a comprehensive characterization of the learner's semantic expression is difficult to achieve. Existing methods generally rely on text input for semantic analysis and ignore the cooperative characteristics of multi-modal information such as voice intonation, mouth-shape movements, facial expressions and learning behavior data. As a result, the learner's true semantic expression, emotional state and depth of knowledge understanding cannot be captured adequately, and the semantic recognition result is biased.

Third, knowledge graph association is limited and lacks dynamic evolution and self-growth mechanisms. Traditional knowledge graphs are mainly static structures: node expansion and edge weight optimization cannot be carried out according to the learner's real-time semantic behavior, cognitive path changes and error type distribution, the evolution of the language knowledge system is difficult to reflect, and knowledge growth based on semantic feedback cannot be realized.

Fourth, semantic alignment and path matching lack a quantitative decision mechanism. In existing systems, the semantic alignment effect of the model and the matching degree of the knowledge path depend on manual evaluation or heuristic rules. Unified computable indexes and dynamic threshold control are lacking, so automatic evaluation and feedback adjustment of semantic stability and learning path rationality are difficult to realize.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a language learning system and method fusing a domestic large model and a knowledge graph, so as to solve the problems described in the background.
The language learning method fusing a domestic large model and a knowledge graph comprises the following steps:
Step one: acquiring multi-modal data of a learner in real time through a domestic education software base, the data including voice, text, face and mouth-shape images, and knowledge point access and cognitive interaction information, and respectively establishing a voice expression parameter set, a text expression parameter set, a visual participation parameter set and a knowledge cognition parameter set;
Step two: performing data cleaning, normalization and feature fusion on each parameter set from step one, constructing a multidimensional language feature matrix using a domestic feature coding engine, realizing semantic alignment and knowledge association mapping through the knowledge graph embedding mechanism of the domestic large model, and generating a standardized language input parameter set;
Step three: based on the standardized language input parameter set, calculating a word vector embedding difference degree, a graph node semantic aggregation degree and a domestic chip operation delay normalization value, comprehensively obtaining a semantic alignment coefficient YDQ, comparing YDQ with a domestic semantic alignment threshold Yth, judging the stability of the semantic mapping relation of the domestic large model, giving a corresponding strategy, and generating a standardized language-knowledge graph mapping matrix;
Step four: based on the standardized language-knowledge graph mapping matrix, performing multi-modal semantic dependency analysis and syntactic structure matching, calculating a learning association index GLZ, comparing GLZ with a learning path aggregation threshold Gth, judging whether the matching degree between the learner's answer semantics and the knowledge graph path is qualified, giving a corresponding strategy, and collating and generating an optimized learning data set of learning behaviors and node mappings; A