
CN-121168472-B - Educational data report content interaction method and system based on retrieval-augmented generation


Abstract

The invention relates to the technical field of artificial intelligence and discloses a method and a system for interacting with educational data report content based on retrieval-augmented generation. In the method, when a question involves cross-document association analysis, a structured semantic index library is searched to generate a retrieval result set. When the question involves policy association analysis, semantic similarity calculation is combined with entity linking to match a policy knowledge graph, and cross-modal fusion is then performed to obtain the retrieval result set. The retrieval result set is input into a retrieval-augmented generation model, which calls an education-domain language model to generate an analysis report. The method and the system address industry pain points of the traditional interaction mode, such as inaccurate intent recognition, inefficient cross-document analysis, and the inability to dynamically incorporate the latest policies. They significantly improve the efficiency and quality of data report interaction in the education field and optimize the user interaction experience.
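
The branching described in the abstract can be pictured as a dispatch on the question type. The following is a minimal sketch under stated assumptions, not the patented implementation: the label strings, the `StructuredQuery` class, and the `retrieve` function are hypothetical names introduced for illustration, and the two retrievers are passed in as plain callables because the source does not define their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical label values: the source names the two question types
# but does not say how they are encoded.
CROSS_DOCUMENT = "cross_document_association"
POLICY = "policy_association"


@dataclass(frozen=True)
class StructuredQuery:
    """Assumed shape of the structured query instruction."""
    question_type: str
    core_entity: str
    analysis_dimensions: Tuple[str, ...]


# A retriever takes a structured query and returns a retrieval result set.
Retriever = Callable[[StructuredQuery], List[str]]


def retrieve(query: StructuredQuery, retrievers: Dict[str, Retriever]) -> List[str]:
    """Route the query to the retriever registered for its question type."""
    try:
        retriever = retrievers[query.question_type]
    except KeyError:
        raise ValueError(f"unsupported question type: {query.question_type!r}") from None
    return retriever(query)
```

In this sketch the cross-document retriever would wrap the index search, and the policy retriever would wrap the similarity calculation, entity linking, and fusion steps; both are left as placeholders.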

Inventors

  • CHEN JIAYANG
  • CAO CHEN
  • ZHAI HE
  • WANG MENGPING

Assignees

  • 麦可思数据(北京)有限公司

Dates

Publication Date
20260512
Application Date
20250804

Claims (9)

  1. A method for interacting with educational data report content based on retrieval-augmented generation, comprising:
     receiving a natural language question input by a user, and judging, through a large language model, whether the natural language question is an education-domain question, to obtain a judgment result;
     when the judgment result is yes, performing semantic parsing on the natural language question through the large language model to generate a structured query instruction comprising a question type label, a core entity, and an association analysis dimension;
     when the question type label of the structured query instruction is policy association analysis, searching the structured semantic index library based on the core entity and the association analysis dimension, performing semantic similarity calculation on the search results to obtain a semantic similarity calculation result, matching the core entity against nodes in a policy knowledge graph accessed in real time through entity linking to generate an entity linking result, and performing cross-modal fusion on the semantic similarity calculation result and the entity linking result to obtain the retrieval result set; and
     inputting the retrieval result set into a retrieval-augmented generation model, and calling, through the retrieval-augmented generation model, an education-domain-specific language model to fuse the retrieval result set, so as to generate an analysis report containing data tracing reference information and a dynamic visualization chart;
     wherein judging, through the large language model, whether the natural language question is an education-domain question, to obtain the judgment result, comprises:
     extracting a text feature vector of the natural language question based on the large language model, capturing a current session context feature vector associated with the natural language question, parsing a behavior feature vector corresponding to the natural language question from a user operation log, and calculating a domain keyword matching degree of the natural language question using a predefined education-domain-specific dictionary;
     fusing the text feature vector, the context feature vector, and the behavior feature vector to generate a multi-modal judgment feature, and calculating a semantic relevance score based on the multi-modal judgment feature and the domain keyword matching degree; and
     judging that the natural language question is an education-domain question when the semantic relevance score is greater than a first threshold, judging that the natural language question is an irrelevant question when the semantic relevance score is smaller than a second threshold, and guiding the user, through an interactive interface, to confirm whether the natural language question is an education-domain question when the semantic relevance score is between the first threshold and the second threshold.
  2. The method of claim 1, wherein performing semantic parsing on the natural language question through the large language model to generate the structured query instruction comprising the question type label, the core entity, and the association analysis dimension comprises:
     based on the large language model, performing education-domain entity recognition on the natural language question to obtain the core entity, performing association relationship analysis on the natural language question to obtain the association analysis dimension, and performing semantic structure recognition on the natural language question to obtain the question type label; and
     generating the structured query instruction based on the core entity, the association analysis dimension, and the question type label.
  3. The method for interacting with educational data report content based on retrieval-augmented generation according to claim 2, wherein, based on the large language model, performing education-domain entity recognition on the natural language question to obtain the core entity, performing association relationship analysis on the natural language question to obtain the association analysis dimension, and performing semantic structure recognition on the natural language question to obtain the question type label comprises:
     based on the large language model, fusing the text feature vector and the current session context feature vector to obtain a fused feature vector, performing entity matching and disambiguation on the fused feature vector against a pre-constructed education-domain entity library, and outputting the core entity;
     based on the large language model, parsing the semantic dependency structure of the natural language question to generate a preliminary association dimension, analyzing the historical interaction pattern in the behavior feature vector to correct the preliminary association dimension, and outputting the association analysis dimension; and
     based on the large language model, combining the domain keyword matching degree and the multi-modal judgment feature, and outputting the question type label through a preset education question type classifier.
  4. The method for interacting with educational data report content based on retrieval-augmented generation according to claim 2 or 3, wherein, when the question type label of the structured query instruction is cross-document association analysis, searching a pre-constructed structured semantic index library based on the core entity and the association analysis dimension to generate a retrieval result set comprises:
     performing entity-anchored index matching based on the core entity to locate, in the structured semantic index library, a target document set corresponding to the core entity;
     performing multi-dimensional association index expansion based on the association analysis dimension, extracting from the target document set a document subset that falls within a preset time window according to the time dimension in the association analysis dimension, and matching, in the document subset, each semantic segment consistent with a topic label according to the topic dimension in the association analysis dimension;
     ranking all semantic segments by relevance based on a joint weight of the core entity and the association analysis dimension, and selecting the semantic segments whose weight values in the ranking result are greater than a third threshold to generate an initial retrieval set; and
     performing associated entity expansion on the initial retrieval set through a pre-constructed education-domain knowledge graph, and adding to the initial retrieval set the semantic segments corresponding to expanded entities that have a preset association relationship with the core entity, to generate the retrieval result set.
  5. The method for interacting with educational data report content based on retrieval-augmented generation according to claim 4, wherein performing semantic similarity calculation on the search results to obtain the semantic similarity calculation result comprises:
     converting each semantic segment in the search results into a corresponding policy semantic vector through a pre-trained education-policy-specific word vector library;
     fusing the entity vector of the core entity and the dimension vector of the association analysis dimension to generate a policy query vector;
     calculating the cosine similarity between the policy query vector and the policy semantic vector of each semantic segment to generate a base similarity score for each semantic segment;
     detecting, through an education policy synonym conversion rule base, whether each semantic segment contains a synonymous term corresponding to the association analysis dimension, and, whenever a synonymous term is detected in a semantic segment, correcting the base similarity score of that semantic segment to obtain its corrected similarity score, until a corrected similarity score has been obtained for every semantic segment; and
     dynamically weighting the corrected similarity score of each semantic segment based on the historical attention weight contained in the behavior feature vector, to generate the semantic similarity calculation result containing the similarity values of all semantic segments.
  6. The method for interacting with educational data report content based on retrieval-augmented generation according to claim 5, wherein matching the core entity against nodes in the policy knowledge graph accessed in real time through entity linking to generate the entity linking result comprises:
     performing multi-level node matching in the policy knowledge graph based on the entity type and attribute features of the core entity;
     when a plurality of candidate nodes are matched, calculating a node confidence for each candidate node by combining the policy timeliness constraint in the association analysis dimension with the historical attention preference in the behavior feature vector;
     sorting the candidate nodes by node confidence, selecting the candidate nodes whose confidence values in the sorting result are greater than a fourth threshold as target policy nodes, and generating a node relationship topology graph, wherein the node relationship topology graph comprises each target policy node and the policy derivative nodes associated with it within a preset time interval; and
     extracting a policy effectiveness identifier and a policy influence factor for each target policy node based on the structured attribute data stored in the nodes of the policy knowledge graph, and generating a structured entity linking result.
  7. The method for interacting with educational data report content based on retrieval-augmented generation according to claim 6, wherein performing cross-modal fusion on the semantic similarity calculation result and the entity linking result to obtain the retrieval result set comprises:
     performing a timeliness correction on the similarity values in the semantic similarity calculation result based on the policy effectiveness identifier in the entity linking result;
     constructing a policy entity aggregation network according to the node relationship topology graph of the entity linking result, mapping the target policy nodes and policy derivative nodes in the entity linking result to network nodes through the policy entity aggregation network, and mapping the semantic segments in the semantic similarity calculation result to associated text nodes;
     calculating, through a predefined multi-modal alignment matrix, an association strength value between each policy node and each text node in the policy entity aggregation network, wherein the association strength value fuses the weight of the policy influence factor with the corrected similarity value; and
     selecting the text nodes whose association strength values are greater than a fifth threshold to generate a policy-associated text set, and combining the policy-associated text set with the structured attribute data in the entity linking result to generate the retrieval result set.
  8. The method for interacting with educational data report content based on retrieval-augmented generation according to claim 7, wherein the education-domain-specific language model is used for:
     generating the data tracing reference information according to the policy version numbers and document identifiers extracted from the retrieval result set;
     mapping the policy influence factors and time dimension data in the retrieval result set to a predefined education management index coordinate system to generate the dynamic visualization chart;
     adjusting the strength of expression of a suggestion in combination with the historical adoption rate in the behavior feature vector to generate a decision suggestion; and
     integrating the policy context description, the data association analysis, and the decision suggestion to generate the analysis report.
  9. A system for interacting with educational data report content based on retrieval-augmented generation, comprising a question judging module, a semantic parsing module, a result generation module, and a content interaction module, wherein:
     the question judging module is configured to receive a natural language question input by a user, and to judge, through a large language model, whether the natural language question is an education-domain question, to obtain a judgment result;
     the semantic parsing module is configured to, when the judgment result is yes, perform semantic parsing on the natural language question through the large language model to generate a structured query instruction comprising a question type label, a core entity, and an association analysis dimension;
     the result generation module is configured to: when the question type label of the structured query instruction is cross-document association analysis, search a pre-constructed structured semantic index library based on the core entity and the association analysis dimension to generate a retrieval result set; and when the question type label of the structured query instruction is policy association analysis, search the structured semantic index library based on the core entity and the association analysis dimension, perform semantic similarity calculation on the search results to obtain a semantic similarity calculation result, match the core entity against nodes in a policy knowledge graph accessed in real time through entity linking to generate an entity linking result, and perform cross-modal fusion on the semantic similarity calculation result and the entity linking result to obtain the retrieval result set;
     the content interaction module is configured to input the retrieval result set into a retrieval-augmented generation model, and to call, through the retrieval-augmented generation model, an education-domain-specific language model to fuse the retrieval result set, so as to generate an analysis report containing data tracing reference information and a dynamic visualization chart; and
     the question judging module is specifically configured to:
     extract a text feature vector of the natural language question based on the large language model, capture a current session context feature vector associated with the natural language question, parse a behavior feature vector corresponding to the natural language question from a user operation log, and calculate a domain keyword matching degree of the natural language question using a predefined education-domain-specific dictionary;
     fuse the text feature vector, the context feature vector, and the behavior feature vector to generate a multi-modal judgment feature, and calculate a semantic relevance score based on the multi-modal judgment feature and the domain keyword matching degree; and
     judge that the natural language question is an education-domain question when the semantic relevance score is greater than a first threshold, judge that the natural language question is an irrelevant question when the semantic relevance score is smaller than a second threshold, and guide the user, through an interactive interface, to confirm whether the natural language question is an education-domain question when the semantic relevance score is between the first threshold and the second threshold.
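
Two of the claimed steps lend themselves to a short illustration: the two-threshold domain gate of claims 1 and 9, and the similarity scoring of claim 5. The Python below is a sketch under stated assumptions, not the patented implementation. The claims do not specify how vectors are fused, how a base score is corrected, or how history weights are applied, so the element-wise mean, the additive `synonym_bonus` capped at 1.0, and the multiplicative attention weight are all assumptions, as are every function and parameter name. Scores exactly equal to a threshold are treated here as falling "between" the thresholds.

```python
import math
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Verdict(Enum):
    EDUCATION = "education"    # score above the first threshold
    IRRELEVANT = "irrelevant"  # score below the second threshold
    ASK_USER = "ask_user"      # score in between: confirm with the user


def domain_gate(score: float, first_threshold: float, second_threshold: float) -> Verdict:
    """Three-way decision on the semantic relevance score (claims 1 and 9)."""
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed first threshold")
    if score > first_threshold:
        return Verdict.EDUCATION
    if score < second_threshold:
        return Verdict.IRRELEVANT
    return Verdict.ASK_USER


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def score_segments(
    entity_vec: List[float],
    dimension_vec: List[float],
    segments: Dict[str, Tuple[str, List[float]]],  # id -> (text, policy semantic vector)
    synonyms: Iterable[str],
    attention: Dict[str, float],                   # id -> historical attention weight
    synonym_bonus: float = 0.1,
) -> Dict[str, float]:
    """Base cosine score, synonym correction, then history weighting (claim 5)."""
    # Assumption: fuse entity and dimension vectors by element-wise mean.
    query = [(e + d) / 2 for e, d in zip(entity_vec, dimension_vec)]
    terms = list(synonyms)
    result = {}
    for seg_id, (text, vec) in segments.items():
        base = cosine(query, vec)
        # Assumption: a detected synonym adds a fixed bonus, capped at 1.0.
        has_synonym = any(term in text for term in terms)
        corrected = min(1.0, base + synonym_bonus) if has_synonym else base
        # Assumption: the attention weight multiplies the corrected score.
        result[seg_id] = corrected * attention.get(seg_id, 1.0)
    return result
```

With a first threshold of 0.7 and a second threshold of 0.3, for instance, a score of 0.5 would lead the gate to ask the user for confirmation instead of answering or rejecting the question outright.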

Description

Educational data report content interaction method and system based on retrieval-augmented generation

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a method and a system for interacting with educational data report content based on retrieval-augmented generation.

Background

In data report interaction scenarios in the education field, traditional question-answering methods have several limitations. In the prior art, multiple educational data reports are often stored independently in unstructured form and lack semantic indexing, so cross-document knowledge retrieval relies on manual work, which makes knowledge reuse inefficient and historical analysis logic hard to correlate. When the traditional approach faces a complex question, for example a user asking, across several talent reports of a certain college, how the trend in graduates' industry destinations over the past three years relates to regional industrial policy, the user has to read the report texts manually one by one, screen fragmented information by keyword, and then integrate and analyze it, which consumes a great deal of time.

For general questions related to higher education management, existing methods also lack structured storage of a management knowledge system. Answers are based on preset templates or scattered data, so they cannot be updated dynamically to reflect the latest policy documents and industry developments. Meanwhile, the traditional interaction mode identifies user intent only at the level of surface semantics and struggles to distinguish different types of questions, so irrelevant questions consume system resources while key questions suffer delayed responses or insufficiently accurate answers.

In addition, answer generation does not incorporate retrieval-augmented generation and depends on a single data source, so its depth and accuracy are limited when multi-source association analysis is involved. The interaction flow is cumbersome, users have to clarify their needs multiple times, and effective guidance for irrelevant questions is lacking. Accordingly, there is a need for a solution to the above problems.

Disclosure of Invention

To solve the above technical problems, the invention provides a method and a system for interacting with educational data report content based on retrieval-augmented generation.

In a first aspect, the present invention provides a method for interacting with educational data report content based on retrieval-augmented generation, with the following technical solution:

receiving a natural language question input by a user, and judging, through a large language model, whether the natural language question is an education-domain question, to obtain a judgment result;

when the judgment result is yes, performing semantic parsing on the natural language question through the large language model to generate a structured query instruction comprising a question type label, a core entity, and an association analysis dimension;

when the question type label of the structured query instruction is policy association analysis, searching the structured semantic index library based on the core entity and the association analysis dimension, performing semantic similarity calculation on the search results to obtain a semantic similarity calculation result, matching the core entity against nodes in a policy knowledge graph accessed in real time through entity linking to generate an entity linking result, and performing cross-modal fusion on the semantic similarity calculation result and the entity linking result to obtain the retrieval result set;

inputting the retrieval result set into a retrieval-augmented generation model, and calling, through the retrieval-augmented generation model, an education-domain-specific language model to fuse the retrieval result set, so as to generate an analysis report containing data tracing reference information and a dynamic visualization chart.

The method for interacting with educational data report content based on retrieval-augmented generation has the following beneficial effects: it addresses industry pain points of the traditional interaction mode, such as inaccurate intent recognition, inefficient cross-document analysis, and the inability to dynamically incorporate the latest policies; it significantly improves the efficiency and quality of data report interaction in the education field; and it optimizes the user interaction experience.

In a second aspect, the present invention provides a system for interacting with educational data report content based on retrieval-augmented generation, with the following technical solution: the system comprises a question judging module, a semantic parsing module, a result generation module and a content interaction m