
CN-122019699-A - Large language model hallucination detection method, system, terminal and medium

CN122019699A

Abstract

The invention belongs to the technical field of large language models, and particularly discloses a large language model hallucination detection method, system, terminal and medium. The method comprises: obtaining the intermediate hidden state of each token while a large language model generates an answer, and organizing the hidden states in generation order into a hidden state sequence; performing temporal preprocessing on the hidden state sequence; inputting the preprocessed hidden state sequence, together with dynamic features composed of differential vectors, trend drift degrees or local fluctuation intensities, into a dynamic sequence modeling network, so that the network extracts time-dependent features reflecting internal state change patterns from the evolution of the hidden states over the generation process; generating a hallucination risk score from the comprehensive temporal features output by the dynamic sequence modeling network; and determining that the answer contains hallucinated content when the score exceeds a preset threshold. The invention can improve the reliability of the content output by large language models.
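As a rough illustration of the pipeline the abstract describes, the sketch below scores a hidden-state sequence using the same dynamic quantities (differential vectors, trend drift, local fluctuation). The feature weights, the logistic scorer, and the 0.5 threshold are illustrative assumptions standing in for the trained dynamic sequence modeling network; in practice the per-token hidden states would be captured from the model itself (e.g. via `output_hidden_states=True` during generation with Hugging Face Transformers).

```python
import numpy as np

def hallucination_risk(hidden_states, threshold=0.5):
    """Score a sequence of per-token hidden states for hallucination risk.

    hidden_states: (T, d) array, one row per generated token. The scoring
    weights below are illustrative stand-ins for a trained network.
    """
    H = np.asarray(hidden_states, dtype=float)
    # Temporal preprocessing: z-score normalize each dimension over time.
    H = (H - H.mean(axis=0)) / (H.std(axis=0) + 1e-8)
    # Dynamic features: differential vectors between adjacent time steps;
    # their norms form a change-rate sequence.
    diffs = np.diff(H, axis=0)                 # (T-1, d)
    rate = np.linalg.norm(diffs, axis=1)       # (T-1,)
    # Trend drift: how far the state has wandered from its starting point.
    drift = np.linalg.norm(H[-1] - H[0])
    # Local fluctuation intensity: variability of the change-rate sequence.
    fluctuation = rate.std() if len(rate) else 0.0
    # Placeholder scorer (a trained sequence model would replace this):
    # squash a weighted sum of the dynamics through a logistic function.
    z = 0.5 * rate.mean() + 0.3 * drift + 0.2 * fluctuation - 1.0
    score = 1.0 / (1.0 + np.exp(-z))
    return score, score > threshold
```

A perfectly steady sequence yields a low score, while an erratically evolving one yields a higher score and trips the threshold, mirroring the claimed decision rule.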

Inventors

  • ZHANG SONGHAO
  • LI XUE
  • ZHANG LEI

Assignees

  • Shandong Inspur Scientific Research Institute Co., Ltd. (山东浪潮科学研究院有限公司)

Dates

Publication Date
2026-05-12
Application Date
2025-12-11

Claims (10)

  1. A large language model hallucination detection method, comprising the steps of: during generation of an answer by a large language model, acquiring the intermediate hidden state corresponding to each token in token generation order, and organizing the hidden states into a hidden state sequence in temporal order; performing temporal preprocessing on the hidden state sequence; inputting the preprocessed hidden state sequence into a dynamic sequence modeling network, and extracting, within the dynamic sequence modeling network, temporal features characterizing internal state changes of the model from the evolution of the hidden states over the generation process; and performing hallucination judgment on the answer generated by the large language model according to the temporal features, wherein the hallucination judgment determines whether the answer contains hallucinated content based on the overall dynamic change pattern of the hidden state sequence.
  2. The large language model hallucination detection method of claim 1, wherein the temporal preprocessing comprises: dividing the hidden state sequence into sliding windows of a preset window length, and performing truncation, padding and normalization on the hidden state sequence within each window along the length dimension; dynamically stabilizing the hidden state sequence based on the change rate between adjacent hidden states within a window; and obtaining the preprocessed hidden state sequence.
  3. The large language model hallucination detection method of claim 1, further comprising, before inputting the preprocessed hidden state sequence into the dynamic sequence modeling network: calculating a change rate sequence of the hidden state sequence based on the differential vectors of the hidden states at adjacent time steps; calculating the trend drift degree or local fluctuation intensity of the hidden state sequence based on a preset local window; and inputting the change rate sequence and the trend drift degree or local fluctuation intensity, as additional dynamic features, together with the preprocessed hidden state sequence into the dynamic sequence modeling network.
  4. The large language model hallucination detection method according to any one of claims 1-3, characterized in that the dynamic sequence modeling network comprises a temporal neural network structure for characterizing the evolution of the hidden state sequence over the generation process; the temporal neural network structure is at least one of a multi-layer long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, or an attention-based temporal Transformer network; the temporal neural network structure jointly models the change rate, trend drift degree or local fluctuation characteristics of the hidden state sequence through recursive propagation across time steps or dynamic allocation of attention weights over time, and extracts time-dependent features that reflect the internal state change patterns of the large language model before and after a hallucination is generated.
  5. The large language model hallucination detection method of claim 4, wherein the output of the dynamic sequence modeling network comprises a first time-dependent feature reflecting long-term dependencies of the hidden state sequence and a second time-dependent feature reflecting local change patterns; the first time-dependent feature is obtained from the final hidden state of the last layer of the dynamic sequence modeling network or from a sequence compression of the hidden states of each time step; the second time-dependent feature is obtained by a convolution operation or local attention distribution over a local window of the differences between adjacent time-step hidden states; and the first time-dependent feature is fused with the second time-dependent feature to form a comprehensive temporal feature for hallucination judgment.
  6. The large language model hallucination detection method of claim 5, wherein performing hallucination judgment according to the temporal features comprises: inputting the comprehensive temporal features into a pre-trained dynamic sequence modeling network to obtain a hallucination risk score indicating whether the answer is hallucinated; and comparing the hallucination risk score with a preset risk threshold, and determining that the answer generated by the large language model contains hallucinated content when the hallucination risk score exceeds the risk threshold.
  7. The large language model hallucination detection method according to claim 6, wherein the pre-trained dynamic sequence modeling network learns the correspondence between the temporal evolution pattern of the hidden states during generation and the occurrence of hallucination through supervised training on hidden state sequence samples with hallucination labels; each hidden state sequence sample carries a manually annotated label of "hallucination present" or "hallucination absent"; and during training, the dynamic sequence modeling network models the dynamic change relations of the hidden state sequence with a recurrent neural network or an attention-based temporal network structure.
  8. A large language model hallucination detection system for implementing the large language model hallucination detection method according to claim 1, characterized in that the system comprises: a hidden state acquisition unit configured to acquire the intermediate hidden state corresponding to each token in token generation order during generation of an answer by the large language model, and to organize the hidden states into a hidden state sequence in temporal order; a temporal preprocessing unit configured to perform temporal preprocessing on the hidden state sequence, including dividing the hidden state sequence into sliding windows of a preset window length, performing truncation, padding and normalization on the hidden state sequence within each window along the length dimension, and dynamically stabilizing the hidden state sequence based on the change rate between adjacent hidden states within a window, to obtain a preprocessed hidden state sequence; a dynamic feature construction unit configured to calculate a change rate sequence based on the differential vectors of the hidden states at adjacent time steps, to calculate the trend drift degree or local fluctuation intensity based on a preset local window, and to output the change rate sequence and the trend drift degree or local fluctuation intensity, as additional dynamic features, together with the preprocessed hidden state sequence; a dynamic sequence modeling unit configured to jointly model the change rate, trend drift degree or local fluctuation characteristics of the hidden state sequence with a multi-layer long short-term memory network, a gated recurrent unit network, or an attention-based temporal Transformer network, and to extract time-dependent features that reflect the internal state change patterns of the large language model before and after a hallucination is generated; a comprehensive feature fusion unit configured to obtain, from the output of the dynamic sequence modeling unit, a first time-dependent feature reflecting long-term dependencies of the hidden state sequence and a second time-dependent feature reflecting local change patterns, and to fuse the first and second time-dependent features into a comprehensive temporal feature for hallucination judgment; and a hallucination judgment unit configured to input the comprehensive temporal features into a pre-trained dynamic sequence modeling network to obtain a hallucination risk score, to compare the hallucination risk score with a preset risk threshold, and to determine that the answer contains hallucinated content when the hallucination risk score exceeds the threshold.
  9. A terminal, comprising: a memory for storing a large language model hallucination detection program; and a processor for implementing the steps of the large language model hallucination detection method according to any one of claims 1-7 when executing the large language model hallucination detection program.
  10. A computer-readable storage medium storing computer instructions which, when read by a computer, perform the large language model hallucination detection method according to any one of claims 1-7.
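Claims 4-6 describe a temporal network (LSTM, GRU, or Transformer) whose long-term feature is fused with a local-change feature and whose risk score is compared against a threshold. A minimal NumPy sketch of that shape is below; the GRU weights and the scoring head are random, untrained stand-ins, and the fusion rule (final hidden state plus statistics of a smoothed change-rate signal) is an illustrative assumption, not the patented design.

```python
import numpy as np

rng = np.random.default_rng(42)

def gru_layer(X, h_dim=16):
    """Run a minimal GRU over X of shape (T, d); return all hidden states.
    Random weights stand in for a trained dynamic sequence modeling network."""
    T, d = X.shape
    # One weight matrix per gate, acting on the concatenation [h, x].
    Wz, Wr, Wh = (rng.normal(0, 0.1, (h_dim, h_dim + d)) for _ in range(3))
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    h, hs = np.zeros(h_dim), []
    for x in X:
        hx = np.concatenate([h, x])
        z = sig(Wz @ hx)                                  # update gate
        r = sig(Wr @ hx)                                  # reset gate
        h_cand = np.tanh(Wh @ np.concatenate([r * h, x])) # candidate state
        h = (1 - z) * h + z * h_cand
        hs.append(h)
    return np.stack(hs)                                   # (T, h_dim)

def risk_score(H, threshold=0.5):
    """Fuse a long-term and a local feature into a hallucination risk score."""
    A = np.asarray(H, dtype=float)
    hs = gru_layer(A)
    long_term = hs[-1]                      # first feature: final hidden state
    diffs = np.diff(A, axis=0)              # adjacent-step differences
    local = np.convolve(np.linalg.norm(diffs, axis=1),
                        np.ones(3) / 3, mode="same")  # second: local smoothing
    fused = np.concatenate([long_term, [local.mean(), local.max()]])
    w = rng.normal(0, 0.1, fused.shape)     # untrained scoring head
    score = 1.0 / (1.0 + np.exp(-(w @ fused)))
    return score, score > threshold
```

With trained weights, the same structure realizes the claimed decision rule: score the fused temporal features, then flag the answer as hallucinated when the score exceeds the preset risk threshold.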

Description

Large language model hallucination detection method, system, terminal and medium

Technical Field

The invention belongs to the technical field of large language models, and particularly relates to a large language model hallucination detection method, system, terminal and medium.

Background

With the wide deployment of large language models in question answering, content generation, decision support and other scenarios, the credibility of their generated results is attracting increasing attention. Because of their large parameter scale and complex internal modeling mechanisms, large language models tend to generate content that is inconsistent with facts, so-called "hallucinations", when dealing with knowledge-intensive problems. Existing hallucination detection schemes can be broadly divided into two categories. The first is comparison and verification against an external knowledge base: the model's output is checked for consistency by retrieving a knowledge base or external documents. However, this approach relies on external knowledge sources, incurs high deployment cost, and suffers from high latency and insufficient coverage in real-time interactive scenarios. The second is post-processing of the model's output content, such as analyzing the logical structure, consistency or confidence of the generated text. Although this approach does not depend on an external knowledge base, it can only identify overt textual anomalies, lacks sensitivity to latent errors arising from the internal generation mechanism of a large language model, and struggles to cope with the diverse manifestations of hallucination. In recent years, some research has attempted to exploit the mid-layer activation features of large language models for hallucination recognition.
For example, some schemes perform static classification only on the hidden state corresponding to the final token to determine whether the generated content is hallucinated. However, hallucinations in large language models often arise from gradual drift or abnormal evolution of the internal state during generation, and static analysis of the hidden state at a single moment can hardly reflect the model's internal dynamics comprehensively. Most existing detection methods rely on static features or the final generated result, and their hallucination recognition accuracy is low.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a large language model hallucination detection method, system, terminal and medium, to address the facts that hallucinations of a large language model often stem from gradual deviation or abnormal evolution of the internal state during generation, that static analysis of the hidden state at a single moment can hardly reflect the model's internal dynamic change patterns comprehensively, and that detection based on static features or final generated results yields low hallucination recognition accuracy.
The technical scheme adopted by the invention is as follows. In a first aspect, the present application provides a large language model hallucination detection method, comprising the steps of: during generation of an answer by a large language model, acquiring the intermediate hidden state corresponding to each token in token generation order, and organizing the hidden states into a hidden state sequence in temporal order; performing temporal preprocessing on the hidden state sequence; inputting the preprocessed hidden state sequence into a dynamic sequence modeling network, and extracting, within the dynamic sequence modeling network, temporal features characterizing internal state changes of the model from the evolution of the hidden states over the generation process; and performing hallucination judgment on the answer generated by the large language model according to the temporal features, wherein the hallucination judgment determines whether the answer contains hallucinated content based on the overall dynamic change pattern of the hidden state sequence. Further, the temporal preprocessing comprises: dividing the hidden state sequence into sliding windows of a preset window length, and performing truncation, padding and normalization on the hidden state sequence within each window along the length dimension; dynamically stabilizing the hidden state sequence based on the change rate between adjacent hidden states within a window; and obtaining the preprocessed hidden state sequence. Further, before inputting the preprocessed hidden state sequence into the dynamic sequence modeling network, the method further comprises: calculating a change rate sequence
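The sliding-window preprocessing and dynamic feature construction described above can be sketched in NumPy as follows; the window length, the zero-padding scheme, and the specific drift and fluctuation definitions are illustrative assumptions, since the patent leaves them unspecified.

```python
import numpy as np

def preprocess_windows(H, win_len=8):
    """Sliding-window division of a (T, d) hidden-state sequence, with
    truncation/zero-padding to win_len and per-window normalization."""
    H = np.asarray(H, dtype=float)
    T, d = H.shape
    windows = []
    for start in range(0, T, win_len):
        w = H[start:start + win_len]            # truncate to the window
        if len(w) < win_len:                    # pad a short final window
            w = np.vstack([w, np.zeros((win_len - len(w), d))])
        mu, sigma = w.mean(axis=0), w.std(axis=0) + 1e-8
        windows.append((w - mu) / sigma)        # normalize within the window
    return np.stack(windows)                    # (n_windows, win_len, d)

def dynamic_features(H, local_win=4):
    """Change-rate sequence, trend drift, and local fluctuation intensity."""
    H = np.asarray(H, dtype=float)
    diffs = np.diff(H, axis=0)                  # differential vectors
    rate = np.linalg.norm(diffs, axis=1)        # change-rate sequence
    # Trend drift: displacement of a local-window mean from the global mean.
    local_mean = np.array([H[max(0, t - local_win):t + 1].mean(axis=0)
                           for t in range(len(H))])
    drift = np.linalg.norm(local_mean - H.mean(axis=0), axis=1)
    # Local fluctuation intensity: rolling std of the change-rate sequence.
    fluct = np.array([rate[max(0, t - local_win):t + 1].std()
                      for t in range(len(rate))])
    return rate, drift, fluct
```

These outputs play the role of the "additional dynamic features" that the method feeds, alongside the preprocessed hidden state sequence, into the dynamic sequence modeling network.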