CN-122020720-A - Information security detection method and device based on hybrid expert model


Abstract

The application provides an information security detection method and device based on a hybrid expert model. The method receives an original output text generated by an upstream generation model together with the corresponding dialogue context information, splices the original output text and the dialogue context information to generate an input sequence, inputs the input sequence into a pre-trained hybrid expert model to determine a hidden state output sequence corresponding to the original output text, and inputs the hidden state output sequence into a pre-trained multi-task output model to determine an information security detection result corresponding to the original output text. The method and device maintain high detection precision while reducing the consumption of computing resources, realize integrated closed-loop processing from recognition to response by outputting the rewritten text, risk labels and risk fragments in parallel, and significantly improve the security, usability and deployment flexibility of the artificial intelligence system.

Inventors

WANG SHAOJIE
JIN SHIDONG
WANG YIXUAN
HU YALONG
LI JINGHAO
WANG WEI
LIU SHUMENG
LIU JUNYI
WANG ZI

Assignees

The Sixth Research Institute of China Electronics Corporation (中国电子信息产业集团有限公司第六研究所)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-12

Claims (10)

1. An information security detection method based on a hybrid expert model, characterized by comprising the following steps: receiving an original output text generated by an upstream generation model and dialogue context information corresponding to the original output text; splicing the original output text and the dialogue context information to generate an input sequence, and inputting the input sequence into a pre-trained hybrid expert model to determine a hidden state output sequence corresponding to the original output text, wherein the hidden state output sequence is obtained according to the output of at least one target expert model in the hybrid expert model; and inputting the hidden state output sequence into a pre-trained multi-task output model to determine an information security detection result corresponding to the original output text, wherein the information security detection result comprises a rewritten reply text that corresponds to the original output text and complies with a security policy, a risk category label, and a risk text fragment in the original output text.
2. The information security detection method according to claim 1, wherein the step of inputting the input sequence into a pre-trained hybrid expert model to determine a hidden state output sequence corresponding to the original output text comprises: for each hidden state vector in the input sequence, determining at least one target expert model corresponding to the hidden state vector; inputting the hidden state vector into each determined target expert model, each target expert model performing feature transformation on the hidden state vector to obtain its output result; performing weighted fusion on the output results of the target expert models according to the weight coefficient of each target expert model to generate a fused output representation corresponding to the hidden state vector; and generating the hidden state output sequence based on the fused output representations of all hidden state vectors.
3. The information security detection method according to claim 2, wherein determining at least one target expert model corresponding to the hidden state vector comprises: calculating a matching degree score between the hidden state vector and each candidate expert model in the hybrid expert model; and selecting the K candidate expert models with the highest matching degree scores as target expert models, wherein K is an integer greater than or equal to 1 and less than the total number of experts.
4. The information security detection method according to claim 1, wherein the multi-task output model comprises a generating unit, a classification unit and a positioning unit, and the step of inputting the hidden state output sequence into a pre-trained multi-task output model to determine an information security detection result corresponding to the original output text comprises: inputting the hidden state output sequence into the generating unit to obtain the rewritten reply text; inputting the hidden state output sequence into the classification unit to obtain the risk category label; and inputting the hidden state output sequence into the positioning unit to obtain the risk text fragment.
5. The information security detection method according to claim 1, wherein the hybrid expert model and the multi-task output model are trained by: obtaining a training sample, wherein the training sample comprises an output sample, a dialogue context sample corresponding to the output sample, a security task type corresponding to the output sample, and an information security detection sample label corresponding to the output sample; combining the output sample and the dialogue context sample into an input sequence sample, inputting the input sequence sample into an original hybrid expert model, determining the expert models to be trained corresponding to the output sample from a plurality of original expert models in the original hybrid expert model, and generating an output sequence sample corresponding to the output sample by using the expert models to be trained, wherein the expert models to be trained comprise at least one original expert model corresponding to the security task type and the general expert model with the highest weight among a plurality of general expert models; inputting the output sequence sample into an original multi-task output model, and determining an information security detection prediction label corresponding to the output sequence sample; and comparing the information security detection sample label with the information security detection prediction label, calculating an overall training loss function, and adjusting model parameters of the original hybrid expert model and the original multi-task output model by using the overall training loss function until a preset training completion condition is met, so as to obtain the trained hybrid expert model and the trained multi-task output model.
6. The information security detection method according to claim 1, wherein after the information security detection result is obtained, the information security detection method further comprises: determining a risk level corresponding to the original output text according to the risk category label; if the original output text is judged to be risk-free based on the risk level, outputting the original output text after polishing; if the original output text is judged to be at risk based on the risk level, rewriting the content of the original output text according to a preset security policy and outputting the rewritten secure text; and if the original output text is judged to be high-risk based on the risk level, refusing to output the content and generating a rejection response message.
7. An information security detection device based on a hybrid expert model, the information security detection device comprising: a text acquisition module, configured to receive an original output text generated by an upstream generation model and dialogue context information corresponding to the original output text; an output sequence generation module, configured to splice the original output text and the dialogue context information to generate an input sequence, and to input the input sequence into a pre-trained hybrid expert model to determine a hidden state output sequence corresponding to the original output text, wherein the hidden state output sequence is obtained according to the output of at least one target expert model in the hybrid expert model; and a detection result determining module, configured to input the hidden state output sequence into a pre-trained multi-task output model to determine an information security detection result corresponding to the original output text, wherein the information security detection result comprises a rewritten reply text that corresponds to the original output text and complies with a security policy, a risk category label, and a risk text fragment in the original output text.
8. The information security detection device according to claim 7, wherein, when inputting the input sequence into the pre-trained hybrid expert model to determine the hidden state output sequence corresponding to the original output text, the output sequence generation module is configured to: for each hidden state vector in the input sequence, determine at least one target expert model corresponding to the hidden state vector; input the hidden state vector into each determined target expert model, each target expert model performing feature transformation on the hidden state vector to obtain its output result; perform weighted fusion on the output results of the target expert models according to the weight coefficient of each target expert model to generate a fused output representation corresponding to the hidden state vector; and generate the hidden state output sequence based on the fused output representations of all hidden state vectors.
9. An electronic device comprising a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is in operation, the machine-readable instructions being executable by the processor to perform the steps of the hybrid expert model based information security detection method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the hybrid expert model-based information security detection method according to any of claims 1 to 6.
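
The per-vector routing recited in claims 2 and 3 can be sketched roughly as follows. This is an illustrative sketch only, not the applicant's implementation: the claims do not say how the matching degree score or the weight coefficients are computed, so the dot-product score against a per-expert gate vector, the softmax normalization over the selected experts, and the names `route_top_k`, `moe_layer` and `gate_vectors` are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]  # an expert performs a feature transformation


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(hidden: Vector, gate_vectors: Sequence[Vector],
                experts: Sequence[Expert], k: int) -> Tuple[Vector, List[int]]:
    """Fuse the outputs of the k best-matching experts for one hidden state vector."""
    if not 1 <= k < len(experts):
        raise ValueError("k must satisfy 1 <= k < number of experts")
    # Matching degree score (assumed): dot product with each expert's gate vector.
    scores = [sum(h * g for h, g in zip(hidden, gate)) for gate in gate_vectors]
    # Select the K candidate experts with the highest scores.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Weight coefficients (assumed): softmax over the selected scores only.
    weights = softmax([scores[i] for i in chosen])
    fused = [0.0] * len(hidden)
    for weight, index in zip(weights, chosen):
        output = experts[index](hidden)
        fused = [f + weight * o for f, o in zip(fused, output)]
    return fused, chosen


def moe_layer(sequence: Sequence[Vector], gate_vectors: Sequence[Vector],
              experts: Sequence[Expert], k: int) -> List[Vector]:
    """Build the hidden state output sequence from the fused representation of every vector."""
    return [route_top_k(h, gate_vectors, experts, k)[0] for h in sequence]
```

Because only K of the experts run for each vector, the cost per vector grows with K rather than with the total number of experts, which is consistent with the reduced computing-resource consumption the application describes.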

Description

Information security detection method and device based on hybrid expert model

Technical Field

The application relates to the technical field of artificial intelligence information security, and in particular to an information security detection method and device based on a hybrid expert model.

Background

With the development of artificial intelligence technology, large language models are widely applied in dialogue systems and content generation scenarios. However, these models may leak sensitive information in their output or generate high-risk content, creating information security and compliance hazards, so dedicated security protection mechanisms need to be deployed.

Conventional security protection methods fall mainly into two types. The first type relies on rule bases, regular-expression matching and single-task classifiers: structured sensitive information such as identity card numbers and telephone numbers is identified through preset patterns, and an independent detection model is built for each risk type. This approach depends on manually maintained rules, struggles with complex scenarios such as implicit semantics, multi-turn reasoning and information extraction, scales poorly, and is costly to maintain.

The second type adopts a unified large-scale neural network as the security model and integrates all security tasks into one model for end-to-end training. Although modeling is unified, all tasks share parameters, so interference between tasks, insufficient learning of small-sample tasks, and forgetting of existing capabilities when new tasks are added are likely to occur. In addition, such models often over-intercept in practical applications: content that could be output normally is misjudged as high risk and the response is refused, which harms user experience and service availability.

Disclosure of Invention

In view of the above, the application aims to provide an information security detection method and device based on a hybrid expert model, which dynamically adapt to risk types with different semantic features, reduce the consumption of computing resources while maintaining high detection precision, realize integrated closed-loop processing from recognition to response by outputting the rewritten text, risk labels and risk fragments in parallel, and significantly improve the security, availability and deployment flexibility of the artificial intelligence system.

In a first aspect, an embodiment of the present application provides an information security detection method based on a hybrid expert model, where the information security detection method includes: receiving an original output text generated by an upstream generation model and dialogue context information corresponding to the original output text; splicing the original output text and the dialogue context information to generate an input sequence, and inputting the input sequence into a pre-trained hybrid expert model to determine a hidden state output sequence corresponding to the original output text, wherein the hidden state output sequence is obtained according to the output of at least one target expert model in the hybrid expert model; and inputting the hidden state output sequence into a pre-trained multi-task output model to determine an information security detection result corresponding to the original output text, wherein the information security detection result comprises a rewritten reply text that corresponds to the original output text and complies with a security policy, a risk category label, and a risk text fragment in the original output text.

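
The data flow of the first aspect can be sketched as below. This is a hedged illustration of the structure only: the `[SEP]` separator, the context-first splicing order, the `DetectionResult` container and the function names are assumptions, and the four callables are placeholders standing in for the pre-trained hybrid expert model and the three outputs of the multi-task output model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

SEP = " [SEP] "  # hypothetical separator; the source does not name one


@dataclass
class DetectionResult:
    rewritten_reply: str  # reply text rewritten to comply with the security policy
    risk_label: str       # risk category label
    risk_fragment: str    # risk text fragment found in the original output text


def build_input_sequence(context_turns: Sequence[str], output_text: str) -> str:
    """Splice dialogue context and original output text (context-first order is assumed)."""
    return SEP.join([*context_turns, output_text])


def detect(context_turns: Sequence[str], output_text: str,
           encoder: Callable[[str], Any],
           generate: Callable[[Any], str],
           classify: Callable[[Any], str],
           locate: Callable[[Any], str]) -> DetectionResult:
    """Run the splice, encode and multi-task output steps in order."""
    sequence = build_input_sequence(context_turns, output_text)
    hidden = encoder(sequence)  # stand-in for the pre-trained hybrid expert model
    # All three outputs are derived from the same hidden state output sequence.
    return DetectionResult(
        rewritten_reply=generate(hidden),
        risk_label=classify(hidden),
        risk_fragment=locate(hidden),
    )
```

Passing the model components in as callables keeps the sketch independent of any particular framework; a real system would supply trained networks in their place.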
Further, the step of inputting the input sequence into a pre-trained hybrid expert model to determine a hidden state output sequence corresponding to the original output text includes: for each hidden state vector in the input sequence, determining at least one target expert model corresponding to the hidden state vector; inputting the hidden state vector into each determined target expert model, each target expert model performing feature transformation on the hidden state vector to obtain its output result; performing weighted fusion on the output results of the target expert models according to the weight coefficient of each target expert model to generate a fused output representation corresponding to the hidden state vector; and generating the hidden state output sequence based on the fused output representations of all hidden state vectors.

Further, the determining at least one target expert model corresponding to the hidden state vector includes: calculating a matching degree score between the hidden state vector and each candidate expert model in the hybrid expert model; and selecting the K candidate expert models with the highest matching degree scores as target expert models, wherein