CN-121980043-A - Mixed expert routing method and system based on trusted RAG

Abstract

The invention relates to the technical field of artificial intelligence and discloses a hybrid expert (mixture-of-experts) routing method and system based on trusted RAG. The method comprises: acquiring metadata of the documents associated with an input text to generate a document-level trusted RAG vector; obtaining a hidden representation of each token from the input text and the text content of the associated documents; obtaining a trusted vector for each token based on the document-level trusted RAG vector; concatenating each token's hidden representation with its trusted vector to generate an enhanced input vector for that token; and feeding the enhanced input vector to a gating network to calculate expert scores, selecting at least one expert network for weighted fusion, and generating the final output. The method integrates the trust-level information of external knowledge into the entire routing decision process of the hybrid expert model, moving from 'semantics-driven' routing to 'semantics and trust co-driven' routing and thereby markedly improving routing accuracy.
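
The routing flow summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy weights and experts, the use of softmax as the normalization, the top-K selection with K = 2, and the choice to feed each expert the hidden representation (and to keep expert outputs the same size as the hidden vector) are all assumptions made for the example.

```python
import math


def concat(hidden, trust):
    """Concatenate a token's hidden representation with its trusted vector."""
    return list(hidden) + list(trust)


def gate_scores(x, weights, bias):
    """Linear gate: one score per expert. `weights` is indexed [feature][expert]."""
    return [
        sum(x[d] * weights[d][e] for d in range(len(x))) + bias[e]
        for e in range(len(bias))
    ]


def top_k_weights(scores, k):
    """Keep the k highest-scoring experts and softmax-normalize their scores
    (softmax is an assumed choice of normalization)."""
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
    peak = max(scores[e] for e in top)
    exps = {e: math.exp(scores[e] - peak) for e in top}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


def route_token(hidden, trust, weights, bias, experts, k):
    """Score experts on the enhanced input, then fuse the selected outputs."""
    x = concat(hidden, trust)
    mix = top_k_weights(gate_scores(x, weights, bias), k)
    # Assumption: experts consume the hidden representation only.
    outputs = {e: experts[e](hidden) for e in mix}
    fused = [sum(mix[e] * outputs[e][d] for e in mix) for d in range(len(hidden))]
    return fused, mix


# Toy experts and gate parameters (hypothetical values).
experts = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + 1.0 for v in h],
    lambda h: [-v for v in h],
]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
bias = [0.0, 0.0, 0.0]

fused, mix = route_token([1.0, 0.0], [1.0], weights, bias, experts, k=2)
print(sorted(mix))  # [0, 2]
```

Here the last row of `weights` acts only on the trust feature, so the trusted vector directly shifts which experts win the top-K selection.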

Inventors

  • SHU RENWEI
  • XIAO JIANG
  • TAO HUAJUN
  • FAN JIASHUAI
  • WANG BO
  • NI HAO
  • SUN BINQI
  • CHEN YONGHAO

Assignees

  • 上海海纳金赋水数字科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-25

Claims (10)

  1. A hybrid expert routing method based on trusted RAG, comprising the steps of: S1, acquiring metadata of a document associated with an input text to generate a document-level trusted RAG vector; S2, obtaining a hidden representation of each token according to the input text and the text content of the associated document, obtaining a trusted vector corresponding to each token based on the document-level trusted RAG vector, and concatenating the hidden representation of each token with the trusted vector corresponding to that token to generate an enhanced input vector for each token; and S3, taking the enhanced input vector as the input of a gating network, calculating expert scores, and selecting at least one expert network for weighted fusion to generate a final output.
  2. The method according to claim 1, wherein in step S1, generating the document-level trusted RAG vector specifically comprises: receiving a user query and retrieving candidate documents from multiple sources, and recording metadata for each document, including source category, timestamp, and credibility-related raw signals; and acquiring the metadata of all documents associated with the input text, embedding-encoding the categorical metadata, linearly mapping the numerical metadata and the credibility score, and concatenating the encoding results into a fixed-length document-level trusted RAG vector $t_{\mathrm{doc}} \in \mathbb{R}^{d_t}$, wherein $d_t$ represents the feature dimension of the trusted vector.
  3. The trusted RAG based hybrid expert routing method according to claim 2, wherein in step S2, generating the enhanced input vector for each token specifically comprises: encoding the text content of the input text and its associated document to obtain a hidden representation $h_i$ of each token; assigning a document identifier $d_i$ to each token, the identifier indicating the document to which the token belongs, and assigning the corresponding document-level trusted RAG vector $t_{\mathrm{doc}}$ to each token according to the identifier to form a token-level trusted vector $t_i$; and concatenating $h_i$ and $t_i$ along the feature dimension to form the enhanced input vector $x_i = [h_i; t_i]$.
  4. The trusted RAG based hybrid expert routing method of claim 3, wherein in step S2, generating the enhanced input vector for each token further comprises: stacking the hidden representations of all tokens as a matrix $H \in \mathbb{R}^{L \times d_h}$ and stacking the trusted vectors of all tokens as a matrix $T \in \mathbb{R}^{L \times d_t}$, wherein $L$ is the sequence length, $d_h$ is the feature dimension of the hidden representation, and $d_t$ is the feature dimension of the trusted vector; and concatenating the matrices $H$ and $T$ along the feature dimension to obtain an enhanced representation matrix $X = [H, T] \in \mathbb{R}^{L \times (d_h + d_t)}$.
  5. The trusted RAG based hybrid expert routing method of claim 1, wherein in step S3, the expert scores are calculated by the following formula: $s_i = W_g^{\top} x_i + b_g$, wherein $W_g$ is the weight matrix of the extended gating network, the superscript $\top$ denotes the matrix transpose, $b_g$ is the bias term of the extended gating network, $x_i$ is the enhanced input vector of the $i$-th token, and $s_i$ represents the scores corresponding to the $i$-th token.
  6. The method according to claim 1, wherein in step S3, selecting at least one expert network specifically comprises: sorting the scores of all expert networks for each token, and selecting the K expert networks with the highest scores as the experts activated for the current token, wherein K is a preset positive integer.
  7. The method according to claim 6, wherein in step S3, the weighted fusion specifically comprises: normalizing the scores of the K expert networks to obtain a weight distribution, and computing a weighted sum of the outputs of the K expert networks according to the weight distribution to obtain the final output representation of the current token.
  8. A hybrid expert routing system based on trusted RAG, comprising: a trusted RAG hierarchical encoding module, configured to acquire metadata of a document associated with an input text to generate a document-level trusted RAG vector; an enhanced input construction module, configured to obtain a hidden representation of each token according to the input text and the text content of the associated document, obtain a trusted vector corresponding to each token based on the document-level trusted RAG vector, and concatenate the hidden representation of each token with the corresponding trusted vector to generate an enhanced input vector for each token; and an extended gating routing and fusion module, configured to take the enhanced input vector as the input of a gating network, calculate expert scores, and select at least one expert network for weighted fusion to generate a final output.
  9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the trusted RAG based hybrid expert routing method of any of claims 1-7.
  10. An electronic device comprising one or more processors and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the trusted RAG based hybrid expert routing method of any of claims 1-7.
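
Claims 2 and 3 describe how document metadata becomes a fixed-length vector and how that vector reaches each token. The following is a minimal sketch of that idea, assuming a tiny hand-written embedding table in place of a learned one, a "freshness" feature derived from document age, and scalar linear maps. Every name and number here (`SOURCE_EMBEDDINGS`, `encode_document`, `token_trust_vectors`, the weights and biases) is hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Hypothetical embedding table for the categorical "source" field.
# In a trained system these rows would be learned parameters.
SOURCE_EMBEDDINGS = {
    "official": [1.0, 0.0],
    "news": [0.0, 1.0],
    "forum": [-1.0, -1.0],
}
UNKNOWN_SOURCE = [0.0, 0.0]


def linear_map(value, weight, bias):
    """Scalar linear mapping for a numerical metadata field."""
    return weight * value + bias


def encode_document(meta, now):
    """Build a fixed-length document-level trusted vector:
    [source embedding (2 values), freshness, credibility]."""
    source_part = SOURCE_EMBEDDINGS.get(meta["source"], UNKNOWN_SOURCE)
    age_years = (now - meta["timestamp"]).days / 365.0
    freshness = linear_map(age_years, weight=-1.0, bias=1.0)
    credibility = linear_map(meta["credibility"], weight=1.0, bias=0.0)
    return list(source_part) + [freshness, credibility]


def token_trust_vectors(doc_ids, doc_vectors):
    """Give every token the trusted vector of the document it belongs to."""
    return [doc_vectors[d] for d in doc_ids]


now = datetime(2025, 12, 25, tzinfo=timezone.utc)
docs = {
    0: {"source": "official",
        "timestamp": datetime(2025, 12, 25, tzinfo=timezone.utc),
        "credibility": 0.9},
    1: {"source": "forum",
        "timestamp": datetime(2024, 12, 25, tzinfo=timezone.utc),
        "credibility": 0.3},
}
doc_vectors = {doc_id: encode_document(meta, now) for doc_id, meta in docs.items()}

# Document identifier of each token in the concatenated sequence.
doc_ids = [0, 0, 1, 1, 1]
trust = token_trust_vectors(doc_ids, doc_vectors)
```

Each entry of `trust` has the same length regardless of which document it came from, which is what allows it to be concatenated with the hidden representation along the feature dimension.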

Description

Mixed expert routing method and system based on trusted RAG

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a hybrid expert routing method and system based on trusted RAG.

Background

The mixture-of-experts (hybrid expert) model uses a sparse activation mechanism to significantly improve computational efficiency while maintaining model performance: a gating network dynamically selects a small number of experts for computation, based solely on the semantic information of the input text. However, with the spread of retrieval-augmented generation (RAG), models need to fuse external knowledge that is multi-source, heterogeneous, and of varying quality. In this realistic application scenario, an inherent drawback of the traditional routing mechanism is exposed: its decisions rely entirely on the hidden representation of the text, and it has no awareness of the key metadata carried by the retrieved documents, such as source authority, timeliness, and credibility scores.

This "information dead zone" causes a series of problems. First, when documents of different trust levels are superficially similar in semantics, the model cannot distinguish their quality at routing time, and experts may be invoked indiscriminately or erroneously, which affects the accuracy and reliability of the final output. Second, because timeliness information cannot be perceived, the model may activate experts trained on outdated knowledge when handling time-sensitive questions, generating stale or even wrong answers. More importantly, the whole expert selection process becomes an opaque 'black box': developers and users cannot easily understand the basis of a specific routing decision, and cannot guide or intervene in the knowledge fusion process according to actual requirements (for example, by preferentially trusting a certain type of information source).

Therefore, the key challenge facing the prior art is to design a new routing method that, while preserving the sparse and efficient architecture of the hybrid expert model, jointly uses the deep semantics of the input text and the trust attributes of external knowledge, so as to achieve more accurate, more robust, and more interpretable expert selection.

Disclosure of Invention

The present invention has been made to address the above drawbacks of the prior art. Its object is to provide a hybrid expert routing method and system based on trusted RAG that encodes document source, timeliness, and credibility metadata into a trusted vector, aligns and concatenates that vector with the semantic representation of the text, and uses the result to drive a gating network for expert selection and fusion, thereby markedly improving routing accuracy and interpretability in multi-source knowledge fusion scenarios while preserving the sparsity and efficiency of the model.
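
The "information dead zone" described in the Background can be illustrated with a toy gate. In this sketch, two tokens share the same hidden representation but come from sources of different credibility. A gate that sees only the hidden representation scores them identically, while a gate extended with one extra weight row for a trust feature can route them to different experts. All values, the single trust feature scaled to [-1, 1], and the helper names are assumptions made for the example.

```python
def linear_gate(x, weights, bias):
    """Expert scores from a linear gate; `weights` is indexed [feature][expert]."""
    return [
        sum(x[d] * weights[d][e] for d in range(len(x))) + bias[e]
        for e in range(len(bias))
    ]


def argmax(values):
    """Index of the largest value (top-1 expert)."""
    return max(range(len(values)), key=lambda i: values[i])


hidden = [0.1, 0.2]    # identical surface semantics for both tokens
trust_high = [1.0]     # hypothetical trust feature, reliable source
trust_low = [-1.0]     # same feature, unreliable source

semantic_w = [[1.0, -1.0], [0.5, 0.5]]      # 2 hidden features x 2 experts
extended_w = semantic_w + [[1.0, -1.0]]     # extra row for the trust feature
bias = [0.0, 0.0]

# Semantics-only gate: the trust information never reaches the gate.
semantic_high = linear_gate(hidden, semantic_w, bias)
semantic_low = linear_gate(hidden, semantic_w, bias)

# Extended gate: hidden representation concatenated with the trust feature.
extended_high = linear_gate(hidden + trust_high, extended_w, bias)
extended_low = linear_gate(hidden + trust_low, extended_w, bias)

print(argmax(semantic_high), argmax(semantic_low))   # 0 0
print(argmax(extended_high), argmax(extended_low))   # 0 1
```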
In one aspect, the present invention provides a hybrid expert routing method based on trusted RAG, comprising the steps of: S1, acquiring metadata of a document associated with an input text to generate a document-level trusted RAG vector; S2, obtaining a hidden representation of each token according to the input text and the text content of the associated document, obtaining a trusted vector corresponding to each token based on the document-level trusted RAG vector, and concatenating the hidden representation of each token with the trusted vector corresponding to that token to generate an enhanced input vector for each token; and S3, taking the enhanced input vector as the input of a gating network, calculating expert scores, and selecting at least one expert network for weighted fusion to generate a final output.

Further, in step S1, generating the document-level trusted RAG vector specifically includes: receiving a user query and retrieving candidate documents from multiple sources, and recording metadata for each document, including source category, timestamp, and credibility-related raw signals; and acquiring the metadata of all documents associated with the input text, embedding-encoding the categorical metadata, linearly mapping the numerical metadata and the credibility score, and concatenating the encoding results into a fixed-length document-level trusted RAG vector $t_{\mathrm{doc}} \in \mathbb{R}^{d_t}$, wherein $d_t$ represents the feature dimension of the trusted vector.

Further, in step S2, generating the enhanced input vector for each token specifically includes: encoding the text content of the input text and its associated document to obtain a hidden representation $h_i$ of each token; assigning a document identifier $d_i$ to each token, the identifier indicating the document to which the token belongs, and the corresponding document-level trusted RAG vector is generated accordi