CN-122021903-A - Domain knowledge-oriented large language model accurate output control system and method


Abstract

The invention discloses a domain-knowledge-oriented system and method for accurate output control of large language models, belonging to the technical field of artificial intelligence and natural language processing. The system comprises an encoding module, a cognitive regulation module, and a decoding output module. The encoding module parses the user query to generate a semantically rich intermediate representation that contains an initial cognitive state vector, a structured data object carrying core semantics, logical constraints, and risk attributes. The cognitive regulation module parses the vector to quantify professional complexity and risk, and generates an output strategy adapted to the user's cognitive level. The decoding output module generates and converts content according to the strategy, quantitatively evaluates indicators such as semantic fidelity and risk retention by comparing the cognitive state vectors before and after conversion, and automatically embeds risk prompts when fidelity is insufficient. The invention achieves controllable output across the whole flow from professional understanding to adapted expression, and effectively improves the safety, accuracy, and interpretability of applications in demanding fields such as medicine and law.

Inventors

CHEN ZHENYAN

Assignees

厦门尘信科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260130

Claims (10)

1. A domain-knowledge-oriented large language model output control system, characterized by comprising an encoding module, a cognitive regulation module, and a decoding output module, wherein the encoding module is configured to generate, based on a user query, a semantically rich intermediate representation comprising an initial cognitive state vector, the initial cognitive state vector being a structured data object whose fields carry at least a core semantic unit, a logic constraint relation, and a risk assessment label parsed from the user query; the cognitive regulation module is configured to parse the initial cognitive state vector and generate an output regulation strategy based on the parsing result; and the decoding output module is configured to controllably convert the semantically rich intermediate representation according to the output regulation strategy to generate output content, and to output a result after performing a fidelity assessment of the conversion process based on the initial cognitive state vector.
2. The system of claim 1, wherein the encoding module comprises a domain classification unit for determining a main domain of the user query; an inference construction unit for generating an inference logic chain based on the main domain; a knowledge linking unit for verifying or completing entities and assertions in the inference logic chain by means of the entity linking and relationship inference functions of a pre-constructed domain knowledge graph; and a representation generation unit for integrating the inference logic chain processed by the knowledge linking unit to generate the semantically rich intermediate representation and the initial cognitive state vector.
3. The system of claim 1, wherein the cognitive regulation module comprises a vector parsing unit configured to quantify at least one of a professional complexity, a logic certainty, and a risk density characterized by the initial cognitive state vector, and a strategy generation unit configured to combine the quantified result with a user cognitive level to generate the output regulation strategy.
4. The system according to claim 1, wherein the decoding output module comprises a content generation unit for generating the output content by performing term replacement, logic simplification, or addition of explanatory content on the semantically rich intermediate representation according to the output regulation strategy; an evaluation unit for comparing the initial cognitive state vector with a cognitive state vector derived in the conversion process to obtain a fidelity assessment result; and a result output unit for selectively integrating a cognitive transparency prompt with the output content, based on the fidelity assessment result, and outputting the integrated result.
5. The system according to claim 4, wherein the evaluation unit obtains the fidelity assessment result by calculating at least one of: a core semantic fidelity, determined by calculating the cosine similarity, in a semantic embedding space, between the core term set of the initial cognitive state vector and the core term set of the derived cognitive state vector; a risk attribute retention, determined by counting the explicit mention frequency or semantic coverage ratio, in the output content, of the risk assessment labels marked in the initial cognitive state vector; and a logical integrity index, determined by analyzing the consistency between the logical implication relationships among key statements in the output content and an initial logic chain restored from the initial cognitive state vector.
6. A domain-knowledge-oriented large language model output control method, characterized by comprising the steps of: generating, in response to a user query, a semantically rich intermediate representation containing an initial cognitive state vector, the initial cognitive state vector being a structured data object whose fields carry at least a core semantic unit, a logic constraint relation, and a risk assessment label parsed from the user query; parsing the initial cognitive state vector and generating an output regulation strategy based on the parsing result; and controllably converting the semantically rich intermediate representation according to the output regulation strategy to generate output content, and outputting the result after performing a fidelity assessment of the conversion process based on the initial cognitive state vector.
7. The method of claim 6, wherein the generating step includes performing domain classification on the user query, constructing an inference logic chain based on the classified domain, performing link verification and information completion on entities and assertions in the inference logic chain using a pre-constructed domain knowledge graph, and generating the semantically rich intermediate representation and the initial cognitive state vector based on the processed inference logic chain.
8. The method of claim 6, wherein the regulating step comprises quantitatively extracting at least one indicator of professional complexity, logic certainty, and risk density from the initial cognitive state vector, and combining the indicator with a user cognitive level to determine the output regulation strategy, which comprises a term conversion strength, a risk retention requirement, and a degree of logic simplification.
9. The method of claim 6, wherein the fidelity assessment in the outputting step includes obtaining a derived cognitive state vector during the conversion, and assessing the retention of core semantics, risk attributes, or logical structure by comparing the initial cognitive state vector with the derived cognitive state vector.
10. The method according to claim 6 or 9, wherein the outputting step further comprises generating a cognitive transparency prompt when the fidelity assessment result is below a preset threshold, and outputting the cognitive transparency prompt together with the output content.
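
For illustration only, the fidelity assessment recited in claims 5, 9, and 10 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the use of averaged term embeddings, the substring test for explicit mentions, and the 0.8 threshold are all hypothetical choices that do not appear in the source.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(terms: list[str], embed: dict[str, list[float]]) -> list[float]:
    """Average the embeddings of the terms found in a (hypothetical) lookup table."""
    vectors = [embed[t] for t in terms if t in embed]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def risk_retention(risk_labels: list[str], output_text: str) -> float:
    """Share of risk labels explicitly mentioned in the output (assumed metric)."""
    if not risk_labels:
        return 1.0
    text = output_text.lower()
    kept = sum(1 for label in risk_labels if label.lower() in text)
    return kept / len(risk_labels)


def attach_prompt(output_text: str, scores: dict[str, float], threshold: float = 0.8) -> str:
    """Append a transparency prompt naming every score below the assumed threshold."""
    low = [name for name, score in scores.items() if score < threshold]
    if not low:
        return output_text
    return output_text + "\n[Notice] Simplification may have reduced: " + ", ".join(low)
```

In this sketch, core semantic fidelity would be `cosine_similarity(mean_embedding(initial_terms, embed), mean_embedding(derived_terms, embed))`, and the resulting scores are passed to `attach_prompt` so that a prompt is added only when at least one score falls below the threshold.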

Description

Domain knowledge-oriented large language model accurate output control system and method

Technical Field

The invention relates to the technical field of artificial intelligence and natural language processing, and in particular to a system and method for enhancing the accuracy, controllability, and interpretability of large language model output in vertical-domain applications (such as law, medicine, finance, and programming).

Background

Large language models perform well on general corpora, but in professional fields such as law and medicine their inherent generation mode raises three key challenges. First, domain hallucination: the model generates content that lacks factual anchoring or violates common knowledge of the domain. Second, cognitive opacity: when complex internal representations are converted into output the user can understand, key risk information and logic chains may be silently simplified or lost, so the user cannot assess the credibility of the information. Third, output rigidity: the level of detail, term density, and warning level of the output are difficult to adjust dynamically to the user's professional background and risk tolerance. Existing techniques such as retrieval-augmented generation mainly mitigate hallucination by injecting information from external knowledge sources, but they provide no mechanism for systematically regulating and auditing the model's conversion "from professional understanding to adapted expression". How to achieve a traceable, quantifiable, and controllable output generation flow has therefore become a core technical bottleneck for improving the reliability of large models in critical fields.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a domain-knowledge-oriented system and method for accurate output control of large language models. By introducing a cognitive state vector that runs through the entire processing flow, the system and method construct a traceable and quantifiable bidirectional processing funnel framework, achieving full-flow accurate regulation and transparency evaluation of output content from professional internal representation to externally adapted expression.

To achieve this purpose, the technical solution is a domain-knowledge-oriented large language model output control system comprising the following modules.

The encoding module is used for generating, based on the user query, a semantically rich intermediate representation containing an initial cognitive state vector. The module specifically comprises a domain classification unit for determining a main domain of the user query; an inference construction unit for generating an inference logic chain based on the main domain; a knowledge linking unit for verifying and completing the entities or assertions in the inference logic chain against a domain knowledge graph; and a representation generation unit for integrating the linked inference logic chain to generate the semantically rich intermediate representation and the initial cognitive state vector, wherein the initial cognitive state vector is a structured data object used to carry semantic, logic, and risk metadata in a standardized manner.

The cognitive regulation module is used for analyzing the initial cognitive state vector and generating an output regulation strategy based on the analysis result.
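
The structured data object and the two modules described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the set-membership check standing in for knowledge-graph linking, the risk density formula, and the numeric strategy values are hypothetical and are not specified in the source.

```python
from dataclasses import dataclass, field


@dataclass
class CognitiveStateVector:
    """Hypothetical field layout for the structured data object."""
    domain: str
    core_semantic_units: list[str] = field(default_factory=list)
    # Each constraint is a (premise, relation, conclusion) triple.
    logic_constraints: list[tuple[str, str, str]] = field(default_factory=list)
    risk_labels: list[str] = field(default_factory=list)


def link_entities(entities: list[str], graph: set[str]) -> tuple[list[str], list[str]]:
    """Split chain entities into graph-verified and unverified lists."""
    verified = [e for e in entities if e in graph]
    unverified = [e for e in entities if e not in graph]
    return verified, unverified


def risk_density(vector: CognitiveStateVector) -> float:
    """Assumed metric: risk labels per core semantic unit, capped at 1.0."""
    units = max(len(vector.core_semantic_units), 1)
    return min(len(vector.risk_labels) / units, 1.0)


def generate_strategy(vector: CognitiveStateVector, user_level: str) -> dict[str, float]:
    """Assumed rule: non-expert users get stronger term conversion and more
    simplification; the risk retention requirement is the risk density
    with an assumed floor of 0.5."""
    is_expert = user_level == "expert"
    return {
        "term_conversion_strength": 0.2 if is_expert else 0.8,
        "logic_simplification": 0.1 if is_expert else 0.6,
        "risk_retention_requirement": max(risk_density(vector), 0.5),
    }


# Toy medical example; the graph contents are invented for illustration.
toy_graph = {"warfarin", "aspirin", "bleeding risk"}
verified, unverified = link_entities(["warfarin", "aspirin", "vitamin Z"], toy_graph)
initial_vector = CognitiveStateVector(
    domain="medical",
    core_semantic_units=verified,
    logic_constraints=[("warfarin + aspirin", "increases", "bleeding risk")],
    risk_labels=["drug_interaction"],
)
strategy = generate_strategy(initial_vector, user_level="layperson")
```

Keeping the vector as a plain typed object, rather than an opaque embedding, is what would let a later stage compare the initial and derived vectors field by field.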
The cognitive regulation module specifically comprises a vector analysis unit and a strategy generation unit, wherein the vector analysis unit is used for quantifying at least one of the professional complexity, logic certainty, and risk density represented by the initial cognitive state vector, and the strategy generation unit is used for combining the quantification result with the user cognitive level to generate the output regulation strategy.

The decoding output module is used for controllably converting the semantically rich intermediate representation according to the output regulation strategy to generate output content, and for outputting a result after performing a fidelity assessment of the conversion process based on the initial cognitive state vector. The module specifically comprises a content generation unit, an evaluation unit, and a result output unit, wherein the content generation unit is used for generating the output content by performing term replacement, logic simplification, or addition of explanatory content on the semantically rich intermediate representation according to the output regulation strategy, and the evaluation unit is used for comparing the initial cognitive state vector with the cognitive state vector derived in the conversion process to obtain