CN-122021932-A - Open knowledge question-answering method and system based on generative artificial intelligence

CN122021932A

Abstract

The invention provides an open knowledge question-answering method and system based on generative artificial intelligence, and relates to the technical field of electric digital data processing. The method comprises: S1, loading structured education metadata according to the teaching material version selected by a user; S2, if the teaching material version does not belong to the set of mainstream teaching material versions, using a LoRA router to dynamically synthesize inference weights based on the teaching material catalog vector and injecting them into a large language model, otherwise directly loading the preset LoRA weights corresponding to the mainstream teaching material version and injecting them into the large language model as the inference weights; S3, using the large language model with the injected inference weights to generate an initial answer to the user input based on the retrieved knowledge segments, performing a safety check on the initial answer, and outputting a final answer according to the safety check result. The technology solves the problem of low rationality of generated content caused by teaching-material adaptation misalignment in open knowledge question answering.

Inventors

  • XU JIANQI

Assignees

  • 北京易教蓝天科技发展有限公司 (Beijing Yijiao Lantian Technology Development Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-03-18

Claims (10)

  1. An open knowledge question-answering method based on generative artificial intelligence, characterized by comprising the following steps: S1, acquiring an input original user question, and loading structured education metadata according to the teaching material version selected by the user, wherein the education metadata comprises a list of concepts forbidden to be introduced, a cognitive difficulty mapping table, and a primitive white list; the list of forbidden concepts comprises forbidden concept keywords, an applicable teaching material identifier, and an applicable grade range; and the cognitive difficulty mapping table comprises a predefined correspondence between teaching material version, grade, and cognitive difficulty level; S2, determining a model adaptation strategy according to the teaching material version: if the teaching material version does not belong to the set of mainstream teaching material versions, using a LoRA router to dynamically synthesize an inference weight for the large language model based on the teaching material catalog vector and injecting the inference weight into the large language model; if the teaching material version belongs to the set of mainstream teaching material versions, directly loading the preset LoRA weight corresponding to that mainstream teaching material version and injecting the preset LoRA weight into the large language model as the inference weight; S3, using the large language model with the injected inference weight to generate an initial answer to the user input based on the retrieved knowledge segments, performing a safety check on the initial answer, and outputting a final answer according to the safety check result, wherein the safety check comprises scanning against the list of forbidden concepts with an AC automaton and performing a white-list check on any drawing instructions contained in the initial answer.
  2. The open knowledge question-answering method based on generative artificial intelligence according to claim 1, wherein using the LoRA router to dynamically synthesize the inference weights of the large language model based on the teaching material catalog vector comprises: extracting and encoding features of the catalog text of the teaching material version that does not belong to the set of mainstream teaching material versions, to obtain a catalog description vector; inputting the catalog description vector into a pre-trained LoRA router, and computing the similarity between the catalog description vector of the user-selected teaching material version and each corresponding target description vector in the preset set of mainstream teaching material versions; weighting and summing the weights in the LoRA weight sets corresponding to the mainstream teaching material versions according to the obtained similarities, to generate a synthesized weight; and injecting the synthesized weight into the attention layers of the large language model as the inference weight for adjusting the output distribution of the large language model.
  3. The open knowledge question-answering method based on generative artificial intelligence according to claim 1, wherein generating the initial answer to the user input based on the retrieved knowledge segments further comprises: splicing the original user question with a course standard identifier to generate a retrieval query vector; performing semantic retrieval in a teaching material knowledge base and an external knowledge base to obtain candidate knowledge segments; analyzing the candidate knowledge segments, deciding whether to discard each candidate knowledge segment based on the features obtained by the analysis, and counting the number of valid knowledge segments; and if the number of valid knowledge segments is smaller than a preset lower limit on the number of valid knowledge segments, judging that retrieval is deadlocked and triggering a retrieval retry flow, otherwise continuing to generate the final answer.
  4. The open knowledge question-answering method based on generative artificial intelligence according to claim 3, wherein triggering the retrieval retry flow specifically comprises: calling the large language model to rewrite the original user question, to obtain a user question matched to the current difficulty level; re-performing semantic retrieval based on the rewritten user question, and deciding whether to discard each candidate knowledge segment based on the features obtained by the analysis; and if the retrieval deadlock state is not resolved, entering a dependency-deletion degradation mode, otherwise continuing to generate the final answer; wherein the dependency-deletion degradation mode specifically comprises calling the large language model to generate the final answer based on its pre-trained general knowledge, and generating prompt information indicating that the final answer is derived from general knowledge.
  5. The open knowledge question-answering method based on generative artificial intelligence according to claim 3, wherein deciding whether to discard each candidate knowledge segment based on the features obtained by the analysis specifically comprises: predicting a difficulty level index for each candidate knowledge segment using a pre-trained difficulty prediction layer in the large language model; discarding a candidate knowledge segment if its difficulty level index is greater than a preset upper limit on the difficulty level index, and retaining it otherwise; computing the cosine similarity between the original user question and each retained candidate knowledge segment; discarding a candidate knowledge segment if its cosine similarity is smaller than a preset similarity threshold, and retaining it otherwise; and marking the finally retained candidate knowledge segments as valid knowledge segments.
  6. The open knowledge question-answering method based on generative artificial intelligence according to claim 5, wherein the difficulty level index of each candidate knowledge segment is obtained as follows: outputting, through the difficulty prediction layer, a probability distribution of the candidate knowledge segment over the preset difficulty levels; and taking, as the difficulty level index of the candidate knowledge segment, the sum of the products of each difficulty level's serial number and its corresponding probability value; wherein the difficulty level is determined by acquiring user attribute information and looking up, in the preset cognitive difficulty mapping table, the difficulty level matched to that user attribute information.
  7. The open knowledge question-answering method based on generative artificial intelligence according to claim 1, wherein the white-list check of the drawing instructions contained in the initial answer specifically comprises: extracting the drawing instructions in the initial answer using regular expressions, wherein each drawing instruction comprises a drawing type and parameters; judging whether the drawing type and the parameters are contained in the primitive white list; if so, calling a front-end drawing engine to render an image according to the drawing instruction; and if not, converting the drawing instruction into a corresponding text description and outputting it.
  8. The open knowledge question-answering method based on generative artificial intelligence according to claim 1, wherein scanning against the list of forbidden concepts with an AC automaton specifically comprises: if the initial answer is detected to contain keywords from the list of forbidden concepts, determining that the initial answer is illegal and triggering a regeneration mechanism, otherwise determining that the initial answer passes the check; wherein the regeneration mechanism comprises controlling the large language model to regenerate the answer until the check passes or a maximum number of retries is reached; and if, upon reaching the maximum number of retries, the answer is still detected to contain keywords from the list of forbidden concepts, directly outputting a preset safety prompt and skipping the white-list check of the drawing instructions contained in the initial answer.
  9. The open knowledge question-answering method based on generative artificial intelligence according to claim 1, wherein outputting the final answer according to the safety check result further comprises: monitoring abnormal events in the question-answering process, and classifying and attributing the abnormal events; if the same keyword in the list of forbidden concepts is marked as illegal in a preset number of different dialogues, updating the forbidden-concept semantic cluster center vector, otherwise recording the event in a log without updating the vector, wherein the forbidden-concept semantic cluster center vector is used to expand the scope of semantic interception so as to intercept variant vocabulary with similar semantics; and if a retrieval deadlock occurs, classifying it as a teaching material logic defect and generating a manual review work order to be pushed to the teacher end.
  10. An open knowledge question-answering system based on generative artificial intelligence, the system comprising: an input processing module for acquiring an input original user question and loading structured education metadata according to the teaching material version selected by the user, wherein the education metadata comprises a list of concepts forbidden to be introduced, a cognitive difficulty mapping table, and a primitive white list; the list of forbidden concepts comprises forbidden concept keywords, an applicable teaching material identifier, and an applicable grade range; and the cognitive difficulty mapping table comprises a predefined correspondence between teaching material version, grade, and cognitive difficulty level; a teaching material dynamic adaptation module for determining a model adaptation strategy according to the teaching material version: if the teaching material version does not belong to the set of mainstream teaching material versions, using a LoRA router to dynamically synthesize an inference weight for the large language model based on the teaching material catalog vector and injecting the inference weight into the large language model; if the teaching material version belongs to the set of mainstream teaching material versions, directly loading the preset LoRA weight corresponding to that mainstream teaching material version and injecting the preset LoRA weight into the large language model as the inference weight; and an answer generation and review module for generating an initial answer to the user input based on the retrieved knowledge segments using the large language model with the injected inference weight, performing a safety check on the initial answer, and outputting a final answer according to the safety check result, wherein the safety check comprises scanning against the list of forbidden concepts with an AC automaton and performing a white-list check on the drawing instructions contained in the initial answer.
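The similarity-weighted blending of preset LoRA weights described in claim 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function and variable names are invented, cosine similarity and a softmax over similarities are assumed as the weighting scheme, and each "LoRA weight" is reduced to a single matrix for clarity.

```python
import numpy as np

def synthesize_lora_weights(catalog_vec, mainstream_vecs, lora_sets):
    """Blend preset LoRA weight matrices by catalog-vector similarity.

    catalog_vec:     description vector of the non-mainstream teaching
                     material catalog (claim 2).
    mainstream_vecs: target description vectors of the mainstream versions.
    lora_sets:       preset LoRA weight matrices, one per mainstream version.
    """
    # Cosine similarity between the input catalog and each mainstream catalog.
    sims = np.array([
        np.dot(catalog_vec, v) / (np.linalg.norm(catalog_vec) * np.linalg.norm(v))
        for v in mainstream_vecs
    ])
    # Turn similarities into mixing coefficients (softmax is an assumption;
    # the claim only says "weighting and summing according to similarity").
    coef = np.exp(sims) / np.exp(sims).sum()
    # Weighted sum over the preset LoRA weights gives the synthesized weight,
    # which would then be injected into the model's attention layers.
    return sum(c * w for c, w in zip(coef, lora_sets))
```

In a real system each element of `lora_sets` would be a full set of per-layer A/B matrices rather than one array, but the per-matrix blending is the same elementwise weighted sum.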
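The retrieval-deadlock handling of claims 3 and 4 is a small control loop: retrieve, count valid segments, rewrite the question and retry if too few, and finally degrade to general-knowledge answering. A hedged sketch, with `retrieve`, `rewrite`, and `generate` as caller-supplied callables standing in for the semantic retriever and the large language model:

```python
def answer_with_retry(question, retrieve, rewrite, generate,
                      min_valid=3, max_retries=2):
    """Sketch of the retry flow in claims 3-4 (all names illustrative).

    Returns (answer, provenance), where provenance marks whether the answer
    was grounded in retrieved segments or produced by the degradation mode.
    """
    query = question
    for _ in range(max_retries + 1):
        segments = retrieve(query)
        # Enough valid segments: no deadlock, generate a grounded answer.
        if len(segments) >= min_valid:
            return generate(question, segments), "grounded"
        # Deadlock: rewrite the question to match the current difficulty
        # level and retry the semantic retrieval.
        query = rewrite(query)
    # Dependency-deletion degradation mode: answer from the model's
    # pre-trained general knowledge and flag the provenance.
    return generate(question, []), "general-knowledge fallback"
```

The provenance string corresponds to the claim's "prompt information for prompting that the final answer is derived from the universal knowledge".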
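Claims 5 and 6 together define the segment filter: a difficulty level index computed as the expectation of the predicted difficulty distribution (serial number times probability), a difficulty cap, and a cosine-similarity floor. A minimal sketch, assuming difficulty serial numbers start at 1 and representing each candidate segment as a dict with precomputed fields (both assumptions, not stated in the patent):

```python
def difficulty_index(probs):
    """Claim 6: sum over difficulty levels of (serial number * probability)."""
    return sum(i * p for i, p in enumerate(probs, start=1))

def filter_segments(segments, max_difficulty, min_similarity):
    """Claim 5: drop segments that are too hard or too dissimilar.

    Each segment dict carries "difficulty_probs" (output of the difficulty
    prediction layer) and "cosine_sim" (similarity to the user question).
    """
    valid = []
    for seg in segments:
        # Difficulty check: discard if above the preset upper limit.
        if difficulty_index(seg["difficulty_probs"]) > max_difficulty:
            continue
        # Similarity check: discard if below the preset threshold.
        if seg["cosine_sim"] < min_similarity:
            continue
        valid.append(seg)   # finally retained -> valid knowledge segment
    return valid
```

For example, a distribution of (0.2, 0.5, 0.3) over levels 1-3 yields an index of 2.1, so such a segment survives only if the cap is at least 2.1.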
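The primitive white-list check of claim 7 extracts drawing instructions by regular expression and either forwards them to a rendering engine or verbalizes them. The patent gives no concrete instruction syntax, so the `draw:type(params)` grammar and the white-list contents below are purely illustrative:

```python
import re

# Illustrative primitive white list; the patent's list would hold the
# permitted drawing types and parameter ranges per teaching material.
PRIMITIVE_WHITELIST = {"circle", "rect", "line"}

# Hypothetical instruction grammar: draw:<type>(<params>)
INSTR_RE = re.compile(r"draw:(\w+)\(([^)]*)\)")

def check_drawing_instructions(answer):
    """Claim 7 sketch: render white-listed primitives, verbalize the rest."""
    actions = []
    for kind, params in INSTR_RE.findall(answer):
        if kind in PRIMITIVE_WHITELIST:
            # Would be handed to the front-end drawing engine for rendering.
            actions.append(("render", kind, params))
        else:
            # Not white-listed: fall back to a textual description.
            actions.append(("text", f"[figure: {kind} with {params}]", params))
    return actions
```

This fail-closed design (anything outside the white list degrades to text) matches the claim's intent of never executing an unvetted drawing instruction.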
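Claim 8's forbidden-concept scan uses an Aho-Corasick (AC) automaton, which matches a whole keyword set in one pass over the answer. Below is a minimal, self-contained AC implementation plus the regenerate-until-clean loop; the keyword examples, the `regenerate` callable, and the fallback prompt string are all illustrative, not from the patent:

```python
from collections import deque

class ACAutomaton:
    """Minimal Aho-Corasick automaton for the claim 8 keyword scan."""

    def __init__(self, keywords):
        self.goto = [{}]      # state -> {char: next state}
        self.fail = [0]       # failure links
        self.out = [set()]    # keywords recognized at each state
        for kw in keywords:
            self._insert(kw)
        self._build_failure_links()

    def _insert(self, word):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].add(word)

    def _build_failure_links(self):
        queue = deque(self.goto[0].values())   # depth-1 states fail to root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                # Follow failure links until a state with a ch-transition.
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit outputs so overlapping keywords are all reported.
                self.out[nxt] |= self.out[self.fail[nxt]]

    def scan(self, text):
        state, hits = 0, set()
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            hits |= self.out[state]
        return hits

def safety_check(answer, automaton, regenerate, max_retries=3):
    """Claim 8 control flow: regenerate until clean or retries exhausted."""
    for _ in range(max_retries):
        if not automaton.scan(answer):
            return answer                      # passes the check
        answer = regenerate()                  # ask the LLM for a new answer
    # Illustrative stand-in for the patent's "preset safety prompt".
    return "This topic is beyond the current syllabus."
```

Scanning is linear in the answer length regardless of how many forbidden keywords are loaded, which is why AC automata are the standard choice for this kind of multi-pattern filter.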

Description

Open knowledge question-answering method and system based on generative artificial intelligence

Technical Field

The invention relates to the technical field of electric digital data processing, and in particular to an open knowledge question-answering method and system based on generative artificial intelligence.

Background

Large language models (LLMs) exhibit great capability in natural language understanding and generation tasks, and LLM-based knowledge question-answering systems in educational scenarios typically incorporate retrieval-augmented generation (RAG) techniques, which help models generate more accurate, traceable answers by retrieving relevant information from structured or unstructured knowledge bases. For example, the question-answering method and system for structured long documents disclosed in Chinese patent publication No. CN119848223B comprises the steps of: S1, parsing documents in different formats and creating structured metadata for each document; S2, dividing the document into multiple text segments, vectorizing each text segment, and storing it in a dedicated vector database; S3, creating multiple text-content acquisition tools for extracting the text content of different parts of the document, and designing and implementing a vector-based search tool for retrieving, from the vector database, the text segments relevant to the user's query; S4, creating an Agent containing the multiple text-content acquisition tools and search tools, which intelligently selects a text-content acquisition tool or a search tool for the user's question to obtain the relevant text content needed by the LLM to answer the question; S5, after obtaining the relevant text content, analyzing it with the LLM to generate the final answer.
As another example, the knowledge base question-answering method, system, and medium based on intention recognition disclosed in Chinese patent application publication No. CN120045688A comprises: receiving a user question; inputting the user question into a large language model trained for intention recognition to obtain a preliminary classification; determining the corresponding retrieval strategy according to the classification result; if the strategy is knowledge-base retrieval question answering, selecting the knowledge base matching the strategy for knowledge retrieval, and inputting the retrieved reference knowledge and a prompt-word template into the large language model to answer the user question; and if the strategy is direct large-model question answering, selecting a prompt-word template, inputting it into the large language model, and having the large language model answer the user question. On this basis, in order to improve answer accuracy, existing methods generally adopt RAG technology to meet the special requirements of the education field: relevant knowledge segments are retrieved from structured data such as a teaching material knowledge base and course standards, and an answer is generated in combination with a large language model, so that the answer content conforms to discipline teaching logic. Meanwhile, to address the differences in how a general model expresses knowledge across different teaching material versions, the prior art has also attempted parameter-efficient fine-tuning techniques, such as Low-Rank Adaptation (LoRA), to load dedicated model weights for specific teaching material versions.
In addition, to improve the suitability of generated answers, some knowledge question-answering systems have begun to introduce developmental psychology theory, such as Piaget's stages of cognitive development, mapping student age or grade to an abstract-thinking ability level and adjusting the complexity and expression of the answer language accordingly. Knowledge graphs and structured course-standard data (e.g., chapters, knowledge points, prerequisite dependencies) are also used to construct reply-logic constraints that ensure replies conform to the teaching progress. The above technology is found to have at least the following technical problems: existing educational question-answering systems only support mainstream teaching material versions (such as the People's Education Press edition) and can hardly cover the dozens of locally approved teaching materials (such as the Xiangjiao and Lujiao editions) widely used nationwide, so that the answers obtained by users of locally approved teaching materials are inconsistent with their teaching materials in knowledge sequence, terminology, or teaching style, affecting learning consistency and accuracy. Moreover, when a general large language model answers basic questions, it easily introduces knowledge points beyond the national course standards, or uses professional terms that have not yet appeared in the current teaching progress, causing cognitive confusion among students and leading to the problem that generated content in open knowledge question answering has low rationality.