
CN-121980440-A - Malicious diversion text recognition method based on large language model and knowledge distillation

CN121980440A

Abstract

The invention discloses a malicious diversion text recognition method based on a large language model and knowledge distillation. The method first computes text perplexity using the masked language model mechanism of a pre-trained BERT model and screens out a set of high-perplexity suspicious malicious diversion text samples. This sample set is then fed to the large language model, which performs zero-shot inference with a structured prompt comprising a task definition, decision rules and exemplars; a response-parsing function generates binary annotation labels from the model's replies, forming a high-quality annotated data set. Finally, a lightweight BERT sequence classification model is fine-tuned under supervision on this data set, transferring the recognition knowledge of the large language model to the lightweight model through knowledge distillation and yielding a deployable malicious diversion text recognition classifier. The method addresses the high manual-annotation cost of traditional approaches and the poor efficiency of deploying a large language model directly, achieving automatic, efficient and accurate identification of malicious diversion text.

Inventors

  • LI SHUDONG
  • YAO MINGJUN
  • WU XIAOBO
  • QU JUN
  • HUANG HAICHENG

Assignees

  • Guangzhou University (广州大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-18

Claims (10)

  1. A malicious diversion text recognition method based on a large language model and knowledge distillation, characterized by comprising the following steps: acquiring an original unlabeled text corpus, processing the corpus with a pre-trained BERT language model and its matched tokenizer, computing the perplexity of each text through the BERT masked language model mechanism, and screening out a high-perplexity suspicious sample set whose perplexity exceeds a preset threshold, the suspicious sample set being a set of suspected malicious diversion texts; inputting the obtained high-perplexity suspicious sample set into a preset large language model, the large language model performing zero-shot reasoning and in-context learning in combination with a structured prompt, the structured prompt comprising an instruction set containing a task definition, decision rules and output-format constraints, and an exemplar set containing typical normal texts and typical malicious diversion texts; generating binary annotation labels from the model's responses through a response-parsing function to form a high-quality annotated data set; and performing supervised fine-tuning of a pre-trained lightweight BERT sequence classification model on this data set, taking the cross-entropy loss function as the optimization objective and minimizing the difference between the predictions of the lightweight BERT sequence classification model and the binary annotation labels, thereby distilling the knowledge of the large language model into the lightweight model and obtaining a deployable malicious diversion text recognition classifier, the classifier being used to judge whether a text is a malicious diversion text.
  2. The malicious diversion text recognition method based on a large language model and knowledge distillation according to claim 1, wherein the perplexity calculation specifically comprises the following steps: performing tokenization on an input text sentence with the matched tokenizer to obtain a token sequence, and converting the token sequence into the corresponding input IDs; traversing each target token ID in the input IDs other than the [CLS] and [SEP] special markers, copying the input IDs to obtain a temporary input ID sequence, and replacing the target token ID in the temporary sequence with the ID of the [MASK] marker to obtain a masked input ID sequence; feeding the masked input ID sequence into the pre-trained BERT language model, taking the model's output scores, extracting those corresponding to the target token position, and obtaining the log-probability of the target token through a log-softmax operation; and computing the sum of the negative log-likelihoods of all target tokens, counting the total number of target tokens, dividing the sum by that total to obtain the average loss, and exponentiating the average loss to obtain the perplexity of the text sentence.
  3. The malicious diversion text recognition method based on a large language model and knowledge distillation according to claim 2, wherein extracting the log-probability corresponding to the target token position specifically comprises obtaining the log-probability matrix output by the BERT language model, the dimensions of the matrix being [1, total number of tokens, vocabulary size], and extracting the value at index [0, target token position, original ID of the target token] in the matrix as the log-probability of the target token.
  4. The malicious diversion text recognition method based on a large language model and knowledge distillation according to claim 2, wherein the preset perplexity threshold is set to 30; when the perplexity of a text exceeds 30, the text is judged to be a high-perplexity suspicious sample and is added to the high-perplexity suspicious sample set.
  5. The malicious diversion text recognition method based on a large language model and knowledge distillation according to claim 1, wherein the binary annotation label is determined as follows: y = Parse(r), r ~ P_θ(r | I ⊕ E ⊕ x); wherein: y is the final output binary label, taking the value 0 or 1, where 0 denotes normal and 1 denotes abnormal; x denotes the input text under test; I denotes the instruction set containing the task definition, decision rules and output-format constraints; E denotes the exemplar set containing typical normal and abnormal texts; θ denotes the internal parameters of the large language model; P_θ(r | I ⊕ E ⊕ x) denotes the conditional probability that the large language model generates the response text r from the concatenated complete context; ⊕ denotes the concatenation operation; and Parse(·) denotes the parsing function that extracts keywords from the response text r and maps them to the label y.
  6. The malicious diversion text recognition method based on a large language model and knowledge distillation according to claim 1, wherein the supervised fine-tuning of the pre-trained lightweight BERT sequence classification model is specifically: θ* = argmin_θ Σ_{(x_i, y_i) ∈ D} L_CE(f(x_i; θ), y_i); wherein: θ* denotes the optimized parameters of the BERT model; D denotes the high-quality fine-tuning data set produced by the second stage, obtained by combining the normal samples and the abnormal samples; (x_i, y_i) denotes one sample in the data set, where x_i is the text and y_i is the corresponding LLM label; f(x_i; θ) denotes the lightweight BERT sequence classification model, which takes the text x_i as input and outputs a predicted probability distribution; and L_CE denotes the cross-entropy loss function, which measures the difference between the model prediction and the label y_i.
  7. The malicious diversion text recognition method based on a large language model and knowledge distillation according to claim 1, wherein the high-quality annotated data set is formed by combining a subset of normal text samples labeled 0 with a subset of malicious diversion text samples labeled 1.
  8. A malicious diversion text recognition system based on a large language model and knowledge distillation, characterized in that it applies the malicious diversion text recognition method based on a large language model and knowledge distillation of any one of claims 1-7, and comprises an unsupervised anomaly screening module, a large language model expert annotation module, and a knowledge distillation and training module. The unsupervised anomaly screening module is used for acquiring an original unlabeled text corpus, processing the corpus with a pre-trained BERT language model and its matched tokenizer, computing the perplexity of each text through the BERT masked language model mechanism, and screening out a high-perplexity suspicious sample set whose perplexity exceeds a preset threshold, the suspicious sample set being a set of suspected malicious diversion texts. The large language model expert annotation module is used for inputting the obtained high-perplexity suspicious sample set into a preset large language model, the large language model performing zero-shot reasoning and in-context learning in combination with a structured prompt, the structured prompt comprising an instruction set containing a task definition, decision rules and output-format constraints, and an exemplar set containing typical normal texts and typical malicious diversion texts; binary annotation labels are generated from the model's responses through a response-parsing function to form a high-quality annotated data set. The knowledge distillation and training module is used for performing supervised fine-tuning of a pre-trained lightweight BERT sequence classification model with the obtained high-quality annotated data set as training data, taking the cross-entropy loss function as the optimization objective and minimizing the difference between the predictions of the lightweight BERT sequence classification model and the binary annotation labels, thereby distilling the knowledge of the large language model into the lightweight model and obtaining a deployable malicious diversion text recognition classifier, the classifier being used to judge whether a text is a malicious diversion text.
  9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform the malicious diversion text recognition method based on a large language model and knowledge distillation of any one of claims 1-7.
  10. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the malicious diversion text recognition method based on a large language model and knowledge distillation of any one of claims 1-7.
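The masked-LM perplexity screening described in claims 2-4 can be sketched as follows. This is an illustrative sketch, not code from the patent: `log_prob_fn` is a hypothetical stand-in for a real BERT masked language model (any callable returning per-position log-softmax scores over the vocabulary), and the threshold of 30 is the value stated in claim 4.

```python
import math

def sentence_perplexity(token_ids, log_prob_fn, mask_id, special_ids):
    """Pseudo-perplexity via the masked-LM trick (claims 2-4).

    For each token other than the special markers, replace it with [MASK],
    ask the model for the log-probability of the original token at that
    position, then exponentiate the average negative log-likelihood.
    """
    nll_sum, count = 0.0, 0
    for pos, tok in enumerate(token_ids):
        if tok in special_ids:           # skip [CLS] / [SEP]
            continue
        masked = list(token_ids)         # temporary copy of the input IDs
        masked[pos] = mask_id            # mask only the target token
        log_probs = log_prob_fn(masked)  # [seq_len][vocab] log-softmax scores
        nll_sum -= log_probs[pos][tok]   # NLL of the original token
        count += 1
    return math.exp(nll_sum / count)

def is_suspicious(ppl, threshold=30.0):
    # Claim 4: perplexity above the preset threshold (30) flags the text
    return ppl > threshold
```

A model that assigns each original token probability 0.5 yields a perplexity of exactly 2, which is one quick sanity check on the formula.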
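The expert-annotation step of claims 1 and 5 (structured prompt, zero-shot inference, response parsing) might look like the sketch below. All names here are hypothetical: `llm` stands in for any prompt-to-text large language model call, and the keyword rules in `parse_response` are illustrative, not the patent's actual parsing function.

```python
def build_prompt(instructions, exemplars, text):
    """Concatenate the instruction set I, exemplar set E and the text x
    (the splicing operation of claim 5) into one structured prompt."""
    shots = "\n".join(f"Text: {t}\nLabel: {y}" for t, y in exemplars)
    return f"{instructions}\n\nExamples:\n{shots}\n\nText: {text}\nLabel:"

def parse_response(response):
    """Response-parsing function: map the LLM's free-text reply to a
    binary label (0 = normal, 1 = malicious diversion), None if unclear."""
    r = response.strip().lower()
    if r.startswith("1") or "malicious" in r or "abnormal" in r:
        return 1
    if r.startswith("0") or "normal" in r:
        return 0
    return None

def label_with_llm(llm, instructions, exemplars, texts):
    """Build the annotated data set of claim 1 from unlabeled texts."""
    dataset = []
    for x in texts:
        y = parse_response(llm(build_prompt(instructions, exemplars, x)))
        if y is not None:                # keep only cleanly parsed labels
            dataset.append((x, y))
    return dataset
```

Discarding unparseable responses (the `None` branch) is one simple way to keep the resulting data set "high quality" in the sense the claims require.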
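The distillation objective of claim 6 is ordinary supervised fine-tuning with cross-entropy on the LLM-produced labels. As a minimal, self-contained illustration, the sketch below uses a two-class linear model in place of the lightweight BERT classifier; the gradient (p_c - 1[c = y]) * x is the standard softmax cross-entropy gradient.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, y):
    # L_CE(f(x), y) = -log p_y, the per-sample objective of claim 6
    return -math.log(probs[y])

def train_step(W, x, y, lr=0.5):
    """One SGD step on a 2-class linear student standing in for the
    lightweight BERT classifier, minimising cross-entropy against the
    LLM-produced label y. W is a [2][d] weight matrix, x a d-vector."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    probs = softmax(logits)
    loss = cross_entropy(probs, y)
    for c in range(len(W)):              # gradient: (p_c - 1[c==y]) * x
        g = probs[c] - (1.0 if c == y else 0.0)
        W[c] = [w_j - lr * g * x_j for w_j, x_j in zip(W[c], x)]
    return W, loss
```

Running a few epochs over labeled samples drives the loss toward zero, which is exactly the minimization the claim states; in the patent's setting the linear model would be replaced by a BERT sequence classification head.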

Description

Malicious diversion text recognition method based on large language model and knowledge distillation

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a malicious diversion text recognition method based on a large language model and knowledge distillation.

Background

In today's highly interconnected network environment, text has become the core medium for information dissemination and user communication. Mixed into this flood of information, however, is a large amount of carefully constructed adversarial text, which uses anomalous, evasive language patterns such as variant words, special symbols and incoherent phrasing to bypass the detection mechanisms of mainstream content-moderation platforms and to carry out malicious diversion, fraud, the spread of harmful information and similar activities. Such text seriously pollutes the network environment, degrades the user experience, and poses a direct threat to the security ecology and business interests of platforms. How to efficiently and accurately identify and filter such adversarial text has therefore become a highly challenging core task in network content governance. To address this challenge, industry has developed a variety of technical approaches, but current solutions still face significant bottlenecks and limitations in practical application.
Traditional supervised learning is the mainstream technology for text classification. Although its model structures are relatively simple and inference is fast, its performance depends heavily on large-scale, high-quality manually annotated data sets, and in adversarial text detection scenarios such data are extremely difficult to obtain. On the one hand, black- and gray-market attack techniques change rapidly and new text patterns emerge in endless succession, so annotation rules must be updated frequently and labor costs are extremely high; on the other hand, many variant texts are ambiguous by definition, placing severe demands on the expertise of annotators, so the annotation cycle is long and quality is hard to guarantee. In recent years, pre-trained models represented by large language models (LLMs) have made breakthrough progress on natural language understanding tasks; their powerful zero-shot and few-shot reasoning ability can, in principle, accurately identify adversarial text without relying on large amounts of labeled data. Behind this excellent performance, however, lie enormous model parameters and high computational cost: in a high-concurrency, low-latency online real-time moderation system, deploying a large language model directly for inference is unrealistic, as neither its inference speed nor its resource consumption can meet the stringent requirements of a production environment. This creates a disconnect between capability and application: the model that best understands semantics cannot be used directly for front-line detection because of efficiency constraints.
To escape the dependence on manual annotation, some research has attempted unsupervised or semi-supervised methods, such as clustering and anomaly detection algorithms, for an initial screening of massive text collections. Such methods, however, capture only shallow statistical features or surface-level anomaly patterns, and their ability to recognize adversarial text at the semantic level is limited, resulting in unsatisfactory precision and recall; they are therefore difficult to use directly as a reliable classifier. In summary, current adversarial text detection faces a dilemma: traditional methods that rely on manual annotation are costly and slow to respond, while large language models with high-precision semantic understanding are difficult to deploy directly. The field therefore needs an innovative framework that can automatically generate high-quality training data at low cost and efficiently migrate the expert knowledge of a large model into a lightweight, deployment-ready specialized model, bridging the gap in the prior art.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a malicious diversion text recognition method based on a large language model and knowledge distillation, which takes a large amount of unlabeled text as input and, through a three-stage automated processing pipeline, outputs a lightweight, efficient adversarial text classifier that can be deployed directly in a production environment. To achieve the above purpose, the invention adopts the following technical scheme. In a first aspect, the invention provides a malicious diversion text recognition method based on a large language model and knowledge distillation,