
CN-121743502-B - Text classification method and system

CN 121743502 B

Abstract

The invention provides a text classification method and system in the field of text classification processing. The method preprocesses an input text and applies a perturbation to the preprocessed text to generate a clean text and a perturbed text; constructs a dual-view text classification model comprising an online network and a target network; and trains the model on the clean and perturbed texts. A dynamic low-rank attention mechanism and a dual-view consistency loss function are introduced into the model, the clean and perturbed texts serve as the model's dual-view inputs, and during training any input text is processed in parallel by a dual-branch architecture. The invention addresses the instability of text classification methods in high-noise environments.

Inventors

  • ZHOU ZESHENG
  • LI PING
  • HU DONG

Assignees

  • Southwest Petroleum University (西南石油大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-25

Claims (6)

  1. A text classification method, comprising the steps of: S1, preprocessing an input text, and applying a perturbation to the preprocessed text to generate a clean text and a perturbed text; S2, constructing a text classification model comprising an online network and a target network, and training the model on the clean text and the perturbed text, wherein a dynamic low-rank attention mechanism and a dual-view consistency loss function are introduced into the model; the dual views are encoded with the dynamic low-rank attention mechanism as follows: the dual views are embedded, with semantic information density reflecting the quantity and variability of the semantic features activated by the dual-view input; based on the embedded representation, for each attention head, the query matrix Q, key matrix K, and value matrix V input to the current layer are projected to low rank using the effective rank value determined by the dynamic low-rank attention mechanism; the outputs of the attention heads are concatenated and linearly transformed; a feed-forward neural network then applies a position-wise nonlinear transformation; and the results are aggregated to obtain the encodings $h$ and $h'$ (a code sketch of this mechanism follows the claims). The dual-view consistency loss function is $\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda\,\mathcal{L}_{\mathrm{cons}}$; $\mathcal{L}_{\mathrm{cls}} = -\frac{1}{B}\sum_{i=1}^{B} y_i \log p_i$; $\mathcal{L}_{\mathrm{cons}} = \frac{1}{B}\sum_{i=1}^{B}\bigl(1 - \cos(q_i, z'_i)\bigr)$; where $\mathcal{L}$ denotes the dual-view consistency loss function, $\mathcal{L}_{\mathrm{cls}}$ the classification loss, $\lambda$ the balance parameter, $\mathcal{L}_{\mathrm{cons}}$ the consistency loss, $B$ the batch size, $y_i$ the true label, $p_i$ the online network's predicted label for sample $i$, $q_i$ the prediction vector produced by the online-network predictor, $\cos(\cdot,\cdot)$ the cosine similarity, and $z'_i$ the projection of the target-network encoder's view representation (a loss sketch also follows the claims). S3, during training of the text classification model, any input text is processed in parallel by the dual-branch architecture, and a text classification result is output.
  2. The text classification method of claim 1, wherein the online network and the target network each comprise an encoder and a projector, and the online network further comprises a predictor and a classifier: the encoder encodes the dual views using the dynamic low-rank attention mechanism to obtain the encodings $h$ and $h'$; the dynamic low-rank attention mechanism is embedded in the self-attention module of each encoder layer, semantic information density is introduced into the mechanism, and the effective rank value is determined adaptively from the semantic-information-density measurement; the projectors map the encodings $h$ and $h'$ to the projections $z$ and $z'$, respectively; the predictor maps the projection $z$ to the prediction vector $q$; and the classifier maps the encoding $h$ to the predicted class probability distribution $p$, completing the classification of the text.
  3. The text classification method of claim 2, wherein the rank value is given by $r = r_{\min} + \lfloor (r_{\max} - r_{\min})\,\tilde{\sigma}^2 \rfloor$; $\tilde{\sigma}^2 = \frac{\sigma^2 - \sigma^2_{\min}}{\sigma^2_{\max} - \sigma^2_{\min} + \epsilon}$; $\sigma^2 = \frac{1}{BLd}\sum_{b=1}^{B}\sum_{i=1}^{L}\sum_{j=1}^{d}\left(H_{b,i,j} - \mu\right)^2$; where $r$ denotes the rank value, $r_{\min}$ the preset minimum rank, $r_{\max}$ the preset maximum rank, $\tilde{\sigma}^2$ the normalized variance, $\sigma^2$ the variance of all the hidden vectors, $\sigma^2_{\min}$ and $\sigma^2_{\max}$ the minimum and maximum variance respectively, $\epsilon$ a smoothing constant, $B$ the batch size, $L$ the sequence length, $d$ the hidden-layer dimension, $H_{b,i,j}$ the $j$-th dimension of the hidden representation of the $i$-th token of the $b$-th sample, $\mu$ the global mean, and $H$ the hidden-layer tensor.
  4. The text classification method of claim 2, wherein the attention is computed as $A = \mathrm{softmax}\!\left(\frac{Q\,(E K)^{\top}}{\sqrt{d}}\right)$; $O = A\,(F V)$; where $A$ denotes the final attention weight matrix, $\mathrm{softmax}(\cdot)$ the activation function, $Q$ the query matrix, $E$ a low-rank projection matrix, $\top$ the matrix transposition operation, $\sqrt{d}$ the scale factor, $K$ the key matrix, $V$ the value matrix, $F$ a low-rank projection matrix, and $O$ the attention output (see the attention sketch after the claims).
  5. The text classification method of claim 1, wherein training the text classification model comprises: constructing a clean view and a perturbed view from the clean text and the perturbed text, the clean view being input to the online network and the perturbed view to the target network; encoding the clean view with the online-network encoder to obtain the encoding $h$, and encoding the perturbed view with the target-network encoder to obtain the encoding $h'$; mapping $h$ and $h'$ through the corresponding projectors to obtain the projections $z$ and $z'$; feeding the projection $z$ to the predictor to obtain the prediction vector $q$; feeding the encoding $h$ to the classifier to obtain the predicted class probability distribution $p$; defining, from the projection $z'$, the predicted class probability distribution $p$, and the prediction vector $q$, a dual-view consistency loss function comprising a classification loss and a consistency loss; and updating the online-network parameters by gradient descent to minimize the dual-view consistency loss, while updating the target-network parameters by exponential moving average, to complete the training (a training-step sketch appears at the end of the description), wherein the online-network parameters comprise those of the encoder, projector, predictor, and classifier, and only the encoder and classifier of the online network are retained after training.
  6. A text classification system for performing the text classification method of any of claims 1-5, comprising: a first processing module for preprocessing an input text and applying a perturbation to the preprocessed text to generate a clean text and a perturbed text; a second processing module for constructing a text classification model comprising an online network and a target network and training the model on the clean text and the perturbed text, wherein a dynamic low-rank attention mechanism and a dual-view consistency loss function are introduced into the model, the dual views are encoded with the dynamic low-rank attention mechanism to obtain the encodings $h$ and $h'$, and the dual-view consistency loss function $\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda\,\mathcal{L}_{\mathrm{cons}}$ is defined exactly as in claim 1; and a third processing module for processing any input text in parallel with the dual-branch architecture during training and outputting a text classification result.
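For concreteness, below is a minimal PyTorch sketch of one dynamic low-rank attention head in the spirit of claims 3 and 4 (referenced from claim 4 above). It assumes Linformer-style learnable projections E and F applied along the sequence axis and the variance-driven rank rule reconstructed in claim 3; the names and preset bounds (r_min, r_max, var_min, var_max) are illustrative assumptions, not values taken from the patent.

```python
import math
import torch

def dynamic_rank(H, r_min=8, r_max=32, var_min=0.01, var_max=1.0, eps=1e-6):
    """Effective rank from the variance of the hidden tensor H (B, L, d), per claim 3."""
    mu = H.mean()                                 # global mean over all components
    var = ((H - mu) ** 2).mean()                  # sigma^2 = (1/BLd) * sum (H - mu)^2
    var_n = ((var - var_min) / (var_max - var_min + eps)).clamp(0.0, 1.0)
    return r_min + int((r_max - r_min) * var_n.item())  # interpolate between preset ranks

def low_rank_attention(Q, K, V, E, Fp, r):
    """Linformer-style head: keys and values projected to length r along the sequence axis.
    Q, K, V: (B, L, d); E, Fp: (r_max, L) learnable low-rank projection matrices."""
    d = Q.size(-1)
    K_r = torch.matmul(E[:r], K)                  # (B, r, d): EK, sliced to rank r
    V_r = torch.matmul(Fp[:r], V)                 # (B, r, d): FV, sliced to rank r
    A = torch.softmax(Q @ K_r.transpose(-2, -1) / math.sqrt(d), dim=-1)  # (B, L, r)
    return A @ V_r                                # (B, L, d): attention output O

# Toy usage: the rank adapts to the hidden-state variance of the input.
B, L, d, r_max = 2, 128, 64, 32
H = torch.randn(B, L, d)
E, Fp = torch.randn(r_max, L), torch.randn(r_max, L)
r = dynamic_rank(H, r_max=r_max)
out = low_rank_attention(H, H, H, E, Fp, r)       # self-attention on H
print(r, out.shape)                               # e.g. 32 torch.Size([2, 128, 64])
```

Slicing shared projection matrices to the chosen rank keeps a single parameter set while letting the per-input rank vary, which is one plausible reading of "adaptively determining the effective rank value".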

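The dual-view loss of claims 1 and 6 can likewise be sketched in a few lines, under the reconstruction above: cross-entropy for the classification term and a BYOL-style 1 − cosine term for consistency. The balance parameter lam and all tensor names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dual_view_loss(logits, y, q, z_t, lam=1.0):
    """L = L_cls + lam * L_cons, per claims 1 and 6.

    logits: (B, C) classifier output of the online network
    y:      (B,)  true labels y_i
    q:      (B, D) prediction vectors q_i from the online-network predictor
    z_t:    (B, D) target-network projections z'_i (treated as constants)
    """
    l_cls = F.cross_entropy(logits, y)                  # -1/B * sum_i y_i log p_i
    cos = F.cosine_similarity(q, z_t.detach(), dim=-1)  # cos(q_i, z'_i)
    l_cons = (1.0 - cos).mean()                         # 1/B * sum_i (1 - cos)
    return l_cls + lam * l_cons
```

Detaching z_t mirrors the claim-5 arrangement in which only the online network receives gradients while the target network is updated by moving average.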
Description

Text classification method and system

Technical Field

The invention belongs to the field of text classification processing, and particularly relates to a text classification method and system.

Background

Text classification is one of the basic tasks in natural language processing and is widely applied in public-opinion analysis, spam detection, medical-report screening, news-topic identification, content auditing, and other fields. Its aim is to automatically assign a category to input text, and it plays an important role in downstream information retrieval, content recommendation, and knowledge mining. However, text data in real environments often contains noise such as spelling errors, OCR recognition errors, homophone substitutions, missing punctuation, or sentence truncation. Such noise makes the input distributions of the training and inference stages inconsistent, which significantly degrades the performance of a text classification model. How to improve robustness in noisy environments while maintaining computational efficiency has therefore become a key problem for current text classification systems.

Transformer-based pre-trained language models have achieved remarkable success in text classification in recent years; their self-attention mechanism captures long-distance dependencies and effectively improves semantic representation. However, the self-attention of a standard Transformer has computational complexity $O(n^2)$ (where $n$ is the text length), so the overhead is large when processing long texts. To reduce this complexity, researchers have proposed a number of efficient attention variants, such as Linformer, Performer, BigBird, and Nyströmformer, which balance performance and efficiency to some extent through low-rank approximation or kernel mapping. However, these efficient Transformer methods mostly adopt a fixed low-rank structure and a fixed efficiency-performance trade-off, making it difficult to accommodate the semantic complexity of different input samples. A fixed attention rank limits the model's expressive capacity on complex texts; in particular, performance degrades sharply and robustness is insufficient when input information is missing or noise is amplified. In addition, research on improving model robustness has received attention in natural language processing, for example through adversarial training, data augmentation, or consistency regularization. However, existing robust-training strategies are generally independent of improvements to the model structure, and their combination with efficient attention mechanisms remains inadequate. The performance degradation of efficient Transformer models in noisy environments is unresolved, and a text classification method that accounts for both computational efficiency and noise robustness is urgently needed.

Disclosure of Invention

Aiming at the above deficiencies in the prior art, the text classification method and system provided by the invention solve the problem that text classification methods are unstable in high-noise environments.
In order to achieve the above purpose, the technical scheme adopted by the invention is a text classification method comprising the following steps: S1, preprocessing an input text, and applying a perturbation to the preprocessed text to generate a clean text and a perturbed text; S2, constructing a text classification model comprising an online network and a target network, and training the model on the clean text and the perturbed text, wherein a dynamic low-rank attention mechanism and a dual-view consistency loss function are introduced into the model; S3, during training of the model, any input text is processed in parallel by the dual-branch architecture, and a text classification result is output. Further, the online network and the target network each comprise an encoder and a projector, and the online network further comprises a predictor and a classifier: the encoder encodes the dual views using the dynamic low-rank attention mechanism to obtain the encodings $h$ and $h'$; the dynamic low-rank attention mechanism is embedded in the self-attention module of each encoder layer, semantic information density is introduced into the mechanism, and the effective rank value is determined adaptively from the semantic-information-density measurement; the projectors map the encodings $h$ and $h'$ to the projections $z$ and $z'$, respectively; the predictor maps the projection $z$ to the prediction vector $q$
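Claim 5's procedure pairs a gradient step on the online network with an exponential-moving-average update of the target network. Below is a hedged sketch of one training step, reusing dual_view_loss from the sketch following the claims; the online/target attribute names (encoder, projector, predictor, classifier) and the decay value are assumptions for illustration, not the patent's implementation.

```python
import torch

@torch.no_grad()
def ema_update(target, online, decay=0.99):
    """theta_target <- decay * theta_target + (1 - decay) * theta_online (claim 5)."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(decay).add_(p_o, alpha=1.0 - decay)

def train_step(online, target, optimizer, clean, perturbed, y, lam=1.0):
    # Online branch: clean view -> encoder -> {classifier, projector -> predictor}.
    h = online.encoder(clean)
    logits = online.classifier(h)
    q = online.predictor(online.projector(h))
    # Target branch: perturbed view -> encoder -> projector, no gradients.
    with torch.no_grad():
        z_t = target.projector(target.encoder(perturbed))
    loss = dual_view_loss(logits, y, q, z_t, lam)  # sketch following the claims
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                               # gradient descent on online params
    ema_update(target, online)                     # EMA update of target params
    return loss.item()                             # per claim 5, only the online
                                                   # encoder and classifier are kept
```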