CN-121072782-B - Semantic enhancement processing system based on large model agent RAG database


Abstract

The invention belongs to the technical field of artificial intelligence and discloses a semantic enhancement processing system based on a large model agent RAG database, which aims to balance creative thinking against fact-based retrieval in counterfactual reasoning. A semantic query analysis module obtains the user's counterfactual assumption condition, a knowledge retrieval and conflict identification module precisely quantifies the semantic conflict strength, and a dynamic trade-off factor is constructed to control the balance between creativity and facts. A multi-layer semantic alignment network semantically reconstructs the fact knowledge, generating an enhanced knowledge fragment set compatible with the counterfactual assumption and constructing a reliable counterfactual reasoning path. A feedback optimization mechanism is also provided, so that the system learns continuously from user interaction and adapts its parameters. The invention improves the reasoning quality of the large model and generates high-quality content that satisfies the counterfactual premise while remaining fundamentally reasonable.

Inventors

CHE JIAN
ZHU WEIJUN
DANG PENG

Assignees

陕西航天蓝西科技开发有限公司

Dates

Publication Date: 20260512
Application Date: 20251020

Claims (8)

1. A semantic enhancement processing system based on a large model agent RAG database, characterized by comprising: a semantic query analysis module, configured to acquire a semantic query request input by a user, wherein the semantic query request comprises a target reasoning task description and a counterfactual assumption condition; a knowledge retrieval and conflict identification module, configured to retrieve, based on the semantic query request, a fact knowledge fragment set related to the target reasoning task description from the RAG database, and to extract from the fact knowledge fragment set a constraint fact subset that conflicts with the counterfactual assumption condition; a semantic reconstruction module, configured to construct a dynamic trade-off factor according to the semantic conflict strength of the constraint fact subset, including: calculating the angle between the semantic embedding vector of each fact knowledge fragment in the constraint fact subset and that of the counterfactual assumption condition as a local semantic conflict strength; introducing a conflict distribution entropy based on the local semantic conflict strengths, and calculating the semantic conflict uncertainty of the entire constraint fact subset; constructing a conflict strength normalization function according to the semantic conflict uncertainty, mapping the semantic conflict uncertainty to the interval [0,1], and generating a conflict strength normalization coefficient; and calculating the dynamic trade-off factor from the conflict strength normalization coefficient in combination with a preset creative thinking priority parameter; a multi-layer semantic alignment network, configured to perform semantic reconstruction on the fact knowledge fragment set using the dynamic trade-off factor to generate an enhanced knowledge fragment set compatible with the counterfactual assumption condition, including: extracting a semantic dependency graph of each fact knowledge fragment in the fact knowledge fragment set, and identifying key entity nodes and relationship edges in the graph; retrieving, according to the semantic replacement probability and in combination with the semantic embedding vector of the counterfactual assumption condition, semantically compatible replacement entities and replacement relations from the RAG database, and generating candidate reconstruction fragments; performing a semantic consistency check on the candidate reconstruction fragments, and removing fragments whose semantic relevance to the target reasoning task description is lower than a preset third threshold; performing semantic fusion on the candidate reconstruction fragments that pass the semantic consistency check to generate the enhanced knowledge fragment set compatible with the counterfactual assumption condition; and, based on the enhanced knowledge fragment set, performing semantic expansion on the target reasoning task description through the multi-layer semantic alignment network to generate an enhanced semantic representation containing a counterfactual reasoning path; and a reasoning result generation module, configured to output a semantic enhancement reasoning result according to the enhanced semantic representation and under the constraint of the dynamic trade-off factor.
2. The semantic enhancement processing system based on a large model agent RAG database according to claim 1, wherein retrieving from the RAG database the fact knowledge fragment set related to the target reasoning task description and extracting the constraint fact subset that conflicts with the counterfactual assumption condition comprises: performing dependency parsing on the target reasoning task description, extracting core predicate-argument pairs, and generating a semantic query vector; performing multi-scale semantic similarity matching in the RAG database based on the semantic query vector, and acquiring a fact knowledge fragment set whose semantic relevance to the core predicate-argument pairs exceeds a preset first threshold; logically decomposing the counterfactual assumption condition to generate a counterfactual condition predicate set, and calculating the predicate conflict degree between the counterfactual condition predicate set and each fact knowledge fragment in the fact knowledge fragment set; identifying, in the fact knowledge fragment set, the fact knowledge fragments whose predicate conflict degree exceeds a preset second threshold, and marking them as the constraint fact subset; and performing semantic clustering on the fact knowledge fragments in the constraint fact subset, removing duplicate fact fragments whose semantic redundancy is higher than a preset redundancy threshold, and generating a simplified constraint fact subset.
3. The semantic enhancement processing system based on a large model agent RAG database according to claim 1, wherein performing semantic expansion on the target reasoning task description through the multi-layer semantic alignment network to generate the enhanced semantic representation containing a counterfactual reasoning path comprises: constructing the multi-layer semantic alignment network, which comprises a semantic embedding layer, a reasoning path generation layer and a semantic alignment layer; in the semantic embedding layer, mapping the target reasoning task description and the enhanced knowledge fragment set into a unified semantic space to generate a task semantic embedding vector and a knowledge semantic embedding vector, respectively; in the reasoning path generation layer, generating a plurality of candidate counterfactual reasoning paths using a bidirectional attention mechanism based on the task semantic embedding vector and the knowledge semantic embedding vector; scoring the path credibility of the candidate counterfactual reasoning paths, wherein the path credibility score is based on the semantic consistency and the factual support degree of each reasoning step in the path; and in the semantic alignment layer, performing weighted fusion of the candidate counterfactual reasoning paths in combination with the dynamic trade-off factor to generate the enhanced semantic representation containing the counterfactual reasoning path.
4. The semantic enhancement processing system based on a large model agent RAG database according to claim 2, wherein calculating the predicate conflict degree between the counterfactual condition predicate set and each fact knowledge fragment in the fact knowledge fragment set comprises: extracting the logical negation form of each predicate in the counterfactual condition predicate set to generate a negated predicate subset; parsing the predicate-argument structure of each fact knowledge fragment in the fact knowledge fragment set to generate a fact predicate subset; calculating the semantic overlap between the negated predicate subset and the fact predicate subset as an initial value of the predicate conflict degree; introducing a predicate-argument alignment weight, and applying a weighted correction to the initial value of the predicate conflict degree according to the argument semantic similarity between the counterfactual condition predicate set and the fact knowledge fragment; and normalizing the weighted and corrected predicate conflict degree to generate the final predicate conflict degree.
5. The semantic enhancement processing system based on a large model agent RAG database according to claim 1, wherein performing the semantic consistency check on the candidate reconstruction fragments comprises: calculating the cosine similarity between the semantic embedding vectors of the candidate reconstruction fragment and the target reasoning task description as a semantic relevance score; extracting the causal relation chain of the candidate reconstruction fragment, and calculating the logical jump probability of each step in the causal relation chain as a logical consistency score; performing weighted fusion of the semantic relevance score and the logical consistency score to generate a composite consistency score; and rejecting the candidate reconstruction fragments whose composite consistency score is lower than the preset third threshold, and retaining the candidate reconstruction fragments that pass the check.
6. The semantic enhancement processing system based on a large model agent RAG database according to claim 3, wherein scoring the path credibility of the candidate counterfactual reasoning paths comprises: extracting the semantic dependency subgraph of each reasoning step in the candidate counterfactual reasoning path, and calculating the semantic support degree of the entity relationship edges in the subgraph; retrieving, based on the enhanced knowledge fragment set, supporting fact fragments related to the reasoning step, and calculating the coverage rate of the supporting fact fragments as the factual support degree; introducing a path length penalty factor, and applying a normalized penalty to the number of reasoning steps of the candidate counterfactual reasoning path to generate a path complexity correction coefficient; and performing a weighted combination of the semantic support degree, the factual support degree and the path complexity correction coefficient to generate the path credibility score of the candidate counterfactual reasoning path.
7. The semantic enhancement processing system based on a large model agent RAG database according to claim 1, wherein the system further comprises: a feedback module, configured to construct a dynamic feedback database of the semantic enhancement reasoning results, the dynamic feedback database storing the semantic enhancement reasoning results and the corresponding user feedback labels; calculating a creativity score and a factual accuracy score of the semantic enhancement reasoning results based on the user feedback labels; adjusting the generation parameters of the dynamic trade-off factor according to the creativity score and the factual accuracy score; performing cluster analysis on the semantic enhancement reasoning results in the dynamic feedback database, and identifying high-frequency reasoning patterns to generate reasoning pattern templates; and feeding the reasoning pattern templates back to the multi-layer semantic alignment network to improve the efficiency and accuracy of subsequent semantic expansion.
8. The semantic enhancement processing system based on a large model agent RAG database according to claim 7, wherein adjusting the generation parameters of the dynamic trade-off factor according to the creativity score and the factual accuracy score comprises: constructing a trade-off factor optimization model that takes the creativity score and the factual accuracy score as objective functions; performing Pareto front analysis on the creativity scores and the factual accuracy scores, and identifying the optimal solution set of score combinations; calculating, based on the optimal solution set, a gradient update direction for the generation parameters of the dynamic trade-off factor, wherein the generation parameters comprise the conflict strength normalization coefficient and the creative thinking priority parameter; introducing a parameter update damping factor to control the update magnitude of the generation parameters; and verifying the stability of the adjusted dynamic trade-off factor, the adjustment being confirmed as complete when the score fluctuation of a plurality of consecutive semantic enhancement reasoning results is lower than a preset fluctuation threshold.
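
For illustration only, the four steps of claim 1 that lead to the dynamic trade-off factor can be sketched in Python. The claims fix the inputs (embedding-vector angles, a conflict distribution entropy, a mapping to [0,1], a preset creative thinking priority parameter) but not the formulas, so the specific choices below are assumptions: the angles are normalized into a probability distribution, the entropy is divided by its maximum log(n), and the final factor is the product of the priority parameter and the normalization coefficient. All function names are hypothetical.

```python
import math
from typing import Sequence


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two embedding vectors (local conflict strength)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embedding vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def conflict_entropy(strengths: Sequence[float]) -> float:
    """Shannon entropy of the local conflict strengths, treated as a distribution.

    Assumption: the strengths are normalized by their sum to obtain probabilities.
    """
    total = sum(strengths)
    if total == 0.0:
        return 0.0
    probs = [s / total for s in strengths]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalization_coefficient(entropy: float, n: int) -> float:
    """Map the entropy to [0, 1] by dividing by its maximum, log(n) (assumed form)."""
    if n <= 1:
        return 0.0
    return min(1.0, max(0.0, entropy / math.log(n)))


def trade_off_factor(
    fact_vecs: Sequence[Sequence[float]],
    counterfactual_vec: Sequence[float],
    creative_priority: float,
) -> float:
    """Dynamic trade-off factor; the product form is an illustrative assumption."""
    strengths = [angle_between(f, counterfactual_vec) for f in fact_vecs]
    coeff = normalization_coefficient(conflict_entropy(strengths), len(strengths))
    return min(1.0, max(0.0, creative_priority * coeff))
```

With two constraint facts that conflict equally with the counterfactual condition, the entropy reaches its maximum, the coefficient is 1, and the factor reduces to the priority parameter itself. Other monotone mappings, such as a sigmoid, would fit the claim wording equally well.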
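
The semantic consistency check of claim 5 can be sketched in the same spirit. The claim names a cosine-similarity relevance score, a logical consistency score derived from per-step logical jump probabilities, a weighted fusion and a threshold, but it does not say how the per-step probabilities are aggregated or what weights are used. The sketch below assumes a geometric mean for the aggregation and a single hypothetical parameter `relevance_weight` for the fusion; the causal-chain probabilities are taken as given inputs rather than computed.

```python
import math
from typing import List, Sequence, Tuple

# (fragment text, fragment embedding, logical jump probability of each causal step)
Candidate = Tuple[str, Sequence[float], Sequence[float]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def logical_consistency(step_probs: Sequence[float]) -> float:
    """Geometric mean of the per-step probabilities (an assumed aggregation)."""
    if not step_probs or any(p <= 0.0 for p in step_probs):
        return 0.0
    return math.exp(sum(math.log(p) for p in step_probs) / len(step_probs))


def composite_score(relevance: float, logic: float, relevance_weight: float) -> float:
    """Weighted fusion of the relevance score and the logical consistency score."""
    return relevance_weight * relevance + (1.0 - relevance_weight) * logic


def filter_candidates(
    candidates: Sequence[Candidate],
    task_vec: Sequence[float],
    threshold: float,
    relevance_weight: float = 0.5,
) -> List[str]:
    """Keep the fragments whose composite score reaches the threshold."""
    kept = []
    for text, vec, step_probs in candidates:
        score = composite_score(
            cosine_similarity(vec, task_vec),
            logical_consistency(step_probs),
            relevance_weight,
        )
        if score >= threshold:
            kept.append(text)
    return kept
```

A geometric mean is used here so that a single implausible step lowers the score of the whole chain; a plain product or a minimum would be equally defensible readings of the claim.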
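
Three of the steps in claim 8 lend themselves to a short sketch: the Pareto front analysis, the damped parameter update and the stability check. The code below is a minimal illustration under stated assumptions, not the patented procedure: both scores are treated as quantities to maximize, the gradient update direction is taken as a given input because the claim does not say how it is derived from the optimal solution set, parameters are clipped to [0,1], and fluctuation is measured as the range (maximum minus minimum) of recent scores. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float]  # (creativity score, factual accuracy score)


def pareto_front(scores: Sequence[Score]) -> List[Score]:
    """Return the non-dominated score pairs, with both objectives maximized."""

    def dominates(a: Score, b: Score) -> bool:
        # a dominates b if it is at least as good on both axes and better on one.
        return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

    return [s for s in scores if not any(dominates(o, s) for o in scores)]


def damped_update(
    param: float,
    direction: float,
    damping: float,
    lower: float = 0.0,
    upper: float = 1.0,
) -> float:
    """Move a generation parameter along the update direction, scaled by damping.

    The clipping range [lower, upper] is an assumption of this sketch.
    """
    return min(upper, max(lower, param + damping * direction))


def is_stable(recent_scores: Sequence[float], fluctuation_threshold: float) -> bool:
    """True when the spread of consecutive scores is below the threshold."""
    if not recent_scores:
        return False
    return max(recent_scores) - min(recent_scores) < fluctuation_threshold
```

For example, among the score pairs (0.9, 0.2), (0.5, 0.5), (0.4, 0.4) and (0.2, 0.9), only (0.4, 0.4) is dominated, so the other three form the front. A small damping value keeps each update step modest, which is the stated purpose of the damping factor in the claim.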

Description

Semantic enhancement processing system based on large model agent RAG database

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a semantic enhancement processing system based on a large model agent RAG database.

Background

With the rapid development of large language model (LLM) technology, intelligent systems based on retrieval-augmented generation (RAG) have become an important research direction in the field of artificial intelligence. By combining retrieval from an external knowledge base with a generative model, RAG effectively compensates for the shortcomings of large models in knowledge timeliness and in domain-specific accuracy. In current practice, a RAG system mainly stores knowledge fragments in a vector database, retrieves information related to the user query by semantic similarity, and has the large model integrate it into an answer. Such systems are widely used in scenarios such as intelligent question answering, decision support and content creation. However, when a RAG-based intelligent system encounters a counterfactual assumption condition that clearly conflicts with known facts, it often falls into a dilemma. If the system relies too heavily on the retrieved fact knowledge, it rigidly refuses to accept the counterfactual premise, so that reasoning cannot proceed in depth. For example, for the query "suppose Newton had never discovered universal gravitation", the system keeps returning to the historical facts and cannot carry out any meaningful hypothetical exploration. If the fact constraints are ignored entirely, the system may produce unreasonable inferences that depart from basic scientific principles. For example, under the assumption "suppose humans could photosynthesize", it may generate conclusions that violate basic principles of biology.
Such a simplistic processing strategy leaves the system unable to adjust the trade-off dynamically according to the specific strength and type of the semantic conflict, and, particularly in fields such as education, creative writing and scientific exploration, unable to provide answers that are both creative and factually reasonable. Existing systems lack an accurate quantification of the degree of conflict between a counterfactual assumption and the fact knowledge, and cannot distinguish a slight deviation from a fundamental overturning of known facts. As a result, the same processing strategy is applied at different conflict strengths: when handling two assumptions that differ significantly, such as "suppose the internet had appeared ten years later" and "suppose electricity were not conductive", the system exhibits the same confusion. In addition, the prior art generally relies on coarse-grained keyword matching or simple vector similarity calculation for semantic conflict identification, which cannot capture complex semantic dependencies and implicit conflicts. Consequently, in fields with high requirements for factual accuracy, such as medical diagnosis assistance, legal consultation and policy analysis, the system cannot guarantee the basic reliability of the generated content. In view of the above, the present invention proposes a semantic enhancement processing system based on a large model agent RAG database to solve the above problems.
Disclosure of Invention

To overcome the defects of the prior art and achieve the above purpose, the invention provides the following technical solution: a semantic enhancement processing system based on a large model agent RAG database, comprising: a semantic query analysis module, configured to acquire a semantic query request input by a user, wherein the semantic query request comprises a target reasoning task description and a counterfactual assumption condition; a knowledge retrieval and conflict identification module, configured to retrieve, based on the semantic query request, a fact knowledge fragment set related to the target reasoning task description from the RAG database, and to extract from the fact knowledge fragment set a constraint fact subset that conflicts with the counterfactual assumption condition; a semantic reconstruction module, configured to construct a dynamic trade-off factor according to the semantic conflict strength of the constraint fact subset, wherein the dynamic trade-off factor represents the relative weights of creative thinking generation and fact-based retrieval; a multi-layer semantic alignment network, configured to perform semantic reconstruction on the fact knowledge fragment set using the dynamic trade-off factor to generate an enhanced knowledge fragment set compatible with the counterfactual assumption condition; based on the enhanced knowledge fragment set, carrying out semantic expansion on the target re