
CN-121980430-A - Multi-label classification method and device with collaborative attention and prototype alignment

CN 121980430 A

Abstract

The invention discloses a multi-label classification method and device with collaborative attention and prototype alignment, belonging to the technical field of natural language processing. The method obtains token-level contextual representations with a pre-trained semantic encoder, computes each token's maximum relevance score over the label space from label embeddings to generate a filtering mask, and produces a sparse token sequence that suppresses redundant noise. A label-attention branch and a sentence-level hierarchical self-attention branch are then constructed in parallel; bidirectional collaborative attention achieves fine-grained cross-branch semantic alignment, and adaptive weighting completes the feature fusion. On the fused representation, a label-prototype gated fusion and momentum online-update mechanism is introduced: learnable label prototypes serve as semantic centers that continuously guide document representations toward their related labels, alleviating long-tail distribution and semantic drift. The method and device improve classification accuracy and ranking quality, enhance the robustness of low-frequency label prediction, and reduce unnecessary attention computation cost.
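
The following is a minimal sketch of the label-guided filtering step summarized above, assuming a PyTorch implementation and hypothetical shapes (T tokens, L labels, hidden size d); the encoder outputs, label embedding matrix and retention budget k are illustrative placeholders, not the claimed implementation.

    import torch

    def label_guided_filter(token_states, label_emb, valid_mask, k):
        """Keep the k valid tokens with the highest maximum token-label relevance
        and build a binary filtering mask (illustrative sketch; assumes k does not
        exceed the number of valid tokens).

        token_states: (T, d) contextual token representations from the encoder
        label_emb:    (L, d) label embedding matrix used as semantic probes
        valid_mask:   (T,)   1 for real tokens, 0 for padding
        """
        scores = token_states @ label_emb.t()                      # token-label relevance, (T, L)
        importance, _ = scores.max(dim=-1)                         # best score over all labels, (T,)
        importance = importance.masked_fill(valid_mask == 0, float("-inf"))
        keep = importance.topk(k).indices                          # indices of retained tokens
        filter_mask = torch.zeros_like(valid_mask)
        filter_mask[keep] = 1                                      # binary filtering mask
        sparse_states = token_states * filter_mask.unsqueeze(-1)   # non-retained tokens zeroed out
        return sparse_states, filter_mask

The filtering mask is then passed to the two attention branches so that subsequent attention is computed only over the retained tokens.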

Inventors

  • Lu Wentao
  • Zhang Huaxiong
  • Hu Jie
  • Jin Yao

Assignees

  • Zhejiang Sci-Tech University

Dates

Publication Date
2026-05-05
Application Date
2026-04-07

Claims (10)

  1. A multi-label classification method with collaborative attention and prototype alignment, comprising the steps of: (1) text encoding and label-guided screening: encode the text to obtain token-level contextual semantic representations, use label embeddings as semantic probes to compute token-label relevance, screen the tokens and generate a filtering mask, thereby obtaining a sparse token representation and restricting subsequent attention to the retained tokens; (2) two-branch attention modeling: on the sparse token representation, compute a label-attention branch and a sentence-level hierarchical self-attention branch in parallel to obtain, respectively, a label-guided label-related feature representation and a sentence-level structural feature representation based on hierarchical self-attention; (3) perform bidirectional collaborative attention between the two branches to achieve cross-view information flow and semantic alignment, and adaptively fuse the collaborative features of the two branches through a learnable fusion weight; (4) maintain a set of label prototypes, query the prototypes with a document summary to obtain a label semantic summary, and fuse the document summary and the label semantic summary through a gating network to form the final document representation; (5) classification output: map the final document representation to the label space, obtain a prediction score for each label and output the multi-label classification result.
  2. The method of claim 1, wherein in step (1) the label embedding matrix is used as a set of semantic probes to compute a semantic relevance score between each token and each label, and the maximum relevance of a token over all labels is taken as its token importance score.
  3. The method of claim 1, wherein step (1) adaptively determines the number of retained tokens according to the valid-token indication, selects the K highest-scoring tokens among the valid tokens to form a retained index set, and constructs therefrom a binary filtering mask and a sparse token representation, the filtering mask restricting subsequent attention computation to the retained tokens.
  4. The method of claim 1, wherein the label-attention branch in step (2) computes label-to-token attention distributions from the sparse token representation and the label embedding matrix, generates label-specific document representations therefrom, and injects the label semantics back into the token layer according to the attention distributions to form a label-aware token-level representation.
  5. The method of claim 1, wherein the sentence-level hierarchical self-attention branch in step (2) adopts a hierarchical structure of intra-sentence encoding followed by inter-sentence aggregation: sentence boundaries are determined for the document, self-attention is applied to the token sequence within each sentence to capture local dependencies, and self-attention is applied to the sentence sequence to model global dependencies, yielding a sentence-level hierarchical representation (an illustrative sketch follows the claims).
  6. The multi-label classification method with collaborative attention and prototype alignment according to claim 1, wherein step (3) performs bidirectional collaborative attention between the two branches, cross-view information flow and semantic alignment being achieved by exchanging keys and values between the label-attention branch and the sentence-level hierarchical self-attention branch to obtain the collaborative features of each branch, and a fused representation is obtained by adaptively weighting and fusing the collaborative features of the two branches with a learnable fusion weight (see the sketch after the claims).
  7. The multi-label classification method with collaborative attention and prototype alignment according to claim 1, wherein in step (4) the fused representations of the retained tokens are aggregated into a document semantic summary, a set of label prototypes is maintained whose initial state is given by normalized label embeddings, the prototype set is queried with the document summary to obtain a label semantic summary, and the final document representation is formed by gated fusion, the gating vector being adaptively output by a gating network from the document summary and the label semantic summary.
  8. The method of claim 1, wherein the label prototypes in step (4) adopt a momentum online-update mechanism constrained by the true labels: in each training batch, a label-specific representation is constructed for each label as a temporary statistic obtained by attention-weighted aggregation, under the filtering-mask constraint, of the retained tokens with respect to the label prototype, and a momentum write-back update of the corresponding prototype is performed only when that label appears in the ground-truth label indication matrix (an illustrative sketch follows the claims).
  9. A multi-label classification apparatus with collaborative attention and prototype alignment, comprising a memory and one or more processors, the memory storing executable code, wherein the processors, when executing the executable code, implement the multi-label classification method with collaborative attention and prototype alignment according to any one of claims 1 to 8.
  10. A computer-readable storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the multi-label classification method with collaborative attention and prototype alignment according to any one of claims 1 to 8.
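
The sketch below illustrates, under assumed shapes and with torch.nn.MultiheadAttention standing in for the attention operators, the intra-sentence/inter-sentence structure referenced in claim 5; sentence splitting, positional information and residual connections are omitted, and the mean pooling of sentence vectors is an assumption made for illustration rather than part of the claims.

    import torch
    import torch.nn as nn

    class HierarchicalSelfAttention(nn.Module):
        """Intra-sentence self-attention followed by inter-sentence self-attention
        (illustrative sketch; sentence boundaries are assumed to be given)."""

        def __init__(self, d_model, n_heads=4):
            super().__init__()
            self.intra = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.inter = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, sentences):
            # sentences: list of (T_i, d) tensors, one per sentence of the document
            sent_vecs = []
            for s in sentences:
                s = s.unsqueeze(0)                       # (1, T_i, d)
                h, _ = self.intra(s, s, s)               # local dependencies within the sentence
                sent_vecs.append(h.mean(dim=1))          # pool tokens into one sentence vector
            sent_seq = torch.stack(sent_vecs, dim=1)     # (1, S, d) sentence sequence
            out, _ = self.inter(sent_seq, sent_seq, sent_seq)  # global dependencies across sentences
            return out                                   # sentence-level hierarchical representation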
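
For the bidirectional collaborative attention and adaptive fusion of claim 6, the following sketch assumes that both branches output token-aligned sequences of the same length over the retained tokens (a label-aware token representation and a sentence-structure-aware token representation, both of shape (B, T, d)); the single learnable scalar is one possible realization of the learnable fusion weight, not the only one covered by the claim.

    import torch
    import torch.nn as nn

    class CoAttentionFusion(nn.Module):
        """Bidirectional collaborative attention between the two branch outputs,
        followed by adaptive weighted fusion (illustrative sketch)."""

        def __init__(self, d_model, n_heads=4):
            super().__init__()
            self.attn_a2b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.attn_b2a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.alpha = nn.Parameter(torch.zeros(1))    # learnable fusion weight

        def forward(self, label_feats, sent_feats):
            # label branch queries the sentence branch (keys/values taken from the other view)
            a, _ = self.attn_a2b(label_feats, sent_feats, sent_feats)
            # sentence branch queries the label branch
            b, _ = self.attn_b2a(sent_feats, label_feats, label_feats)
            # adaptive weighted fusion of the two collaborative features
            w = torch.sigmoid(self.alpha)
            return w * a + (1 - w) * b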
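
Finally, a sketch of the label-prototype gated fusion and the momentum online update constrained by the true labels (claims 7 and 8); the dot-product prototype query, the concatenation-based gating network, the per-label mean over the batch and the momentum coefficient are assumptions made for illustration, not the claimed implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PrototypeGate(nn.Module):
        """Label-prototype gated fusion with a momentum online update restricted
        to labels that appear in the ground truth (illustrative sketch)."""

        def __init__(self, d_model, label_emb, momentum=0.99):
            super().__init__()
            # prototypes are initialized from normalized label embeddings
            self.register_buffer("prototypes", F.normalize(label_emb.detach(), dim=-1))
            self.gate = nn.Linear(2 * d_model, d_model)
            self.momentum = momentum

        def forward(self, doc_summary):
            # the document summary queries the prototype set for a label semantic summary
            attn = torch.softmax(doc_summary @ self.prototypes.t(), dim=-1)    # (B, L)
            label_summary = attn @ self.prototypes                             # (B, d)
            # the gating network adaptively mixes the two summaries
            g = torch.sigmoid(self.gate(torch.cat([doc_summary, label_summary], dim=-1)))
            return g * doc_summary + (1 - g) * label_summary                   # final document representation

        @torch.no_grad()
        def momentum_update(self, token_states, filter_mask, targets):
            # targets: (B, L) binary ground-truth matrix; token_states: (B, T, d); filter_mask: (B, T)
            scores = torch.einsum("ld,btd->blt", self.prototypes, token_states)
            scores = scores.masked_fill(filter_mask.unsqueeze(1) == 0, float("-inf"))
            weights = torch.softmax(scores, dim=-1)                            # prototype-to-token attention
            stats = torch.einsum("blt,btd->bld", weights, token_states)        # temporary per-label statistics
            present = targets.bool()
            for l in range(self.prototypes.size(0)):
                if present[:, l].any():                                        # update only labels seen in the batch
                    new = F.normalize(stats[present[:, l], l].mean(dim=0), dim=-1)
                    self.prototypes[l] = self.momentum * self.prototypes[l] + (1 - self.momentum) * new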

Description

Multi-label classification method and device with collaborative attention and prototype alignment

Technical Field

The invention belongs to the technical field of natural language processing and intelligent text classification, and particularly relates to a multi-label classification method and device with collaborative attention and prototype alignment.

Background

Multi-Label Text Classification (MLTC) is one of the core tasks in the field of Natural Language Processing (NLP). In MLTC, each document can be associated with multiple labels at the same time, which makes it more complex than single-label text classification while better matching practical application requirements. MLTC is widely applied to news classification, journal classification, social media analysis and other fields, and helps to realize automatic information management, knowledge mining and content understanding. Compared with the traditional single-label classification task, MLTC must handle the high-dimensional nature of text while also modeling dependencies among labels, which greatly increases the complexity of the problem. Despite the significant progress made by various approaches on the MLTC task in recent years, several key challenges remain.

First, the problem of feature dilution is particularly pronounced. A large number of redundant tokens irrelevant to the classification task dilute the expression of effective features and increase the difficulty of model learning. To cope with this problem, the literature [Sun G, Cheng Y, Dong F, et al. Multi-label text classification model integrating label attention and historical attention[J]. Knowledge-Based Systems, 2024, 296: 111878.] uses an attention mechanism to enhance the extraction of label-related text information and reduce the influence of redundant information. However, such methods rely on soft weighted aggregation: redundant tokens are not explicitly removed from the computation path and still participate in attention normalization and feature competition, so attention capacity is shared with noise and key evidence remains diluted in long-text or heavy-noise scenarios. On the other hand, the literature [Hannachi S, Najar F, Ennajari H, et al. Online short text clustering using infinite extensions of discrete mixture models[J]. Computational Intelligence, 2023, 39(5): 759-782.] proposes a collapsed Gibbs sampling algorithm for a generalized Dirichlet multinomial mixture model, which effectively alleviates the sparsity problem of short texts by introducing flexible Bayesian priors, but it does not directly perform token-level suppression of redundancy in long texts.

Second, coarse-grained semantic alignment issues are prevalent. The document [Zhu Z, Zhou P, Li Z, et al. Multi-label text classification with label attention aware and correlation aware contrastive learning[C]//Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. 2025: 8420-8428.] proposes the UCLAF framework, which combines label attention mechanisms with contrastive learning to effectively capture dependencies and partial overlap between labels, thereby improving the alignment of labels and text. However, the method still relies on macroscopic document-level representations, and it is difficult to accurately capture the fine-grained information that drives the prediction of specific labels.

Finally, the long-tail label problem remains a major challenge in MLTC research. Because of the class imbalance of labels, existing models tend to favor high-frequency labels and neglect predictions for low-frequency labels. To alleviate this problem, the lightweight LightXML model proposed in the literature [Jiang T, Wang D, Sun L, et al. Lightxml: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(9): 7987-7994.] combines a Transformer architecture with a dynamic negative sampling strategy, effectively improving the prediction of long-tail labels. The document [Dahiya K, Ortego D, Jiménez-Cabello D. Prototypical extreme multi-label classification with a dynamic margin loss[C]//Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025: 10709-10727.] proposes the PRIME method, which enhances semantic alignment between labels through a label prototype network and a dynamic margin loss, and can effectively reduce semantic inconsistency between labels when processing long-tail labels. While these studies help address the long-tail problem, most approaches focus on the extraction and static optimization of label features and lack continued attention to dynamic evolution. From this, it can be seen that although a great