
CN-121999319-A - Knowledge perception prompt and multi-prototype clustering-based few-sample tuning method

CN 121999319 A

Abstract

The application relates to the technical field of machine learning and provides a few-sample tuning method based on knowledge perception prompts and multi-prototype clustering. The method comprises: obtaining image data comprising a training set and a test set, and extracting samples from the training set to construct a few-sample support set and a query set; loading a pre-trained visual language model and constructing a knowledge perception prompt template for the few-sample task; constructing a cross-modal knowledge attention module to generate knowledge-enhanced feature representations; inputting the support-set samples into the pre-trained visual language model and generating multiple prototype representations through clustering; and inputting the query-set samples into the pre-trained visual language model to obtain representations, computing the similarity between these representations and the prototype representations, and performing prediction and model tuning based on the similarity. By integrating external knowledge into the prompt template and combining it with multi-prototype clustering, the method significantly improves the accuracy and generalization ability of the model on few-sample classification tasks.
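As a rough illustration of the knowledge perception prompt template summarized above (step S2 in the claims below), the Python sketch here retrieves label-related descriptive text from a small stand-in knowledge base and fuses it into a basic prompt. The knowledge entries, template wording, and function name are illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative only: fuse retrieved label knowledge into a basic prompt
# template, as in step S2. The knowledge base and template text are
# placeholders, not the patent's actual data or wording.
KNOWLEDGE_BASE = {
    "sparrow": "a small, plump, brown-grey bird with a short tail",
    "airliner": "a large fixed-wing aircraft used for passenger transport",
}

BASE_TEMPLATE = "a photo of a {label}, which is {knowledge}."

def build_knowledge_prompt(label: str) -> str:
    """Retrieve descriptive knowledge for a label and fuse it into the prompt."""
    knowledge = KNOWLEDGE_BASE.get(label, "an object of this category")
    return BASE_TEMPLATE.format(label=label, knowledge=knowledge)

print(build_knowledge_prompt("sparrow"))
# a photo of a sparrow, which is a small, plump, brown-grey bird with a short tail.
```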

Inventors

  • XIE YANZHAO
  • HUANG HAO
  • WANG YANGTAO
  • HUANG JIAWEI
  • FANG MEIE
  • TANG MAOBIN

Assignees

  • Guangzhou University (广州大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-04

Claims (6)

  1. A few-sample tuning method based on knowledge perception prompts and multi-prototype clustering, characterized by comprising the following steps: S1, acquiring image data comprising a training set and a test set, and extracting samples from the training set to construct a few-sample support set and a query set; S2, constructing a knowledge perception prompt template for the few-sample task; S3, constructing a cross-modal knowledge attention module that uses text semantic embeddings to modulate and enhance the visual features extracted from the support-set samples by the pre-trained model, generating knowledge-enhanced feature representations; S4, inputting the support-set samples into the pre-trained visual language model using the knowledge perception prompt template and generating multiple prototype representations through clustering; and S5, inputting the query-set samples into the pre-trained visual language model to obtain representations, computing the similarity between the representations and the prototype representations, and performing prediction and model tuning based on the similarity.
  2. The few-sample tuning method based on knowledge perception prompts and multi-prototype clustering according to claim 1, wherein constructing the knowledge perception prompt template in S2 comprises: retrieving descriptive text or structured knowledge related to the task labels from an external knowledge base, and fusing the retrieved knowledge into a basic prompt template to form an enhanced prompt containing contextual knowledge and task instructions.
  3. The few-sample tuning method based on knowledge perception prompts and multi-prototype clustering according to claim 1, wherein S3 comprises: performing cluster analysis on the model-encoded representation vectors of the support-set samples and generating K cluster centers for each category, where K ≥ 1 and each cluster center serves as a prototype representation.
  4. The few-sample tuning method based on knowledge perception prompts and multi-prototype clustering according to claim 1, wherein S4 comprises: computing the similarity between the query sample representation and all prototype representations under a category, and taking the maximum or a weighted sum as the final similarity score between the query sample and the category (a minimal code sketch of this prototype construction and scoring follows the claims list).
  5. The few-sample tuning method based on knowledge perception prompts and multi-prototype clustering according to claim 1, wherein the model tuning in S5 adopts a prompt-tuning paradigm based on cross-entropy loss, and the loss function is computed from the prediction distribution of the query samples and their true labels.
  6. The method according to claim 5, wherein the loss function additionally introduces a support-set-based prototype contrastive loss for pulling together same-class prototypes and pushing apart different-class prototypes.
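The sketch below (referenced from claims 3 and 4) shows one way the multi-prototype construction and query scoring could be realised: per-class K-means over support-set features and a per-class maximum over cosine similarities. The function names, the choice of scikit-learn's KMeans, and the use of cosine similarity are assumptions for illustration only and are not mandated by the patent text.

```python
# Hedged sketch of claims 3-4: cluster each class's support features into
# K prototypes, then score a query against a class by the maximum cosine
# similarity over that class's prototypes.
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(support_feats, support_labels, k=2):
    """Return (prototypes, prototype_labels) with up to k prototypes per class."""
    protos, proto_labels = [], []
    for c in np.unique(support_labels):
        feats_c = support_feats[support_labels == c]
        n_clusters = min(k, len(feats_c))
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats_c)
        protos.append(km.cluster_centers_)
        proto_labels.extend([c] * n_clusters)
    return np.vstack(protos), np.array(proto_labels)

def class_scores(query_feats, protos, proto_labels):
    """Per-class score = maximum cosine similarity over that class's prototypes."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = q @ p.T                                    # (n_query, n_protos)
    classes = np.unique(proto_labels)
    scores = np.stack(
        [sims[:, proto_labels == c].max(axis=1) for c in classes], axis=1)
    return classes, scores                            # scores: (n_query, n_classes)

# Usage with random features standing in for visual-language model embeddings:
rng = np.random.default_rng(0)
support = rng.normal(size=(10, 512)); labels = np.repeat([0, 1], 5)
query = rng.normal(size=(3, 512))
protos, plabels = build_prototypes(support, labels, k=2)
classes, scores = class_scores(query, protos, plabels)
print(classes[scores.argmax(axis=1)])                 # predicted class per query
```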

Description

Knowledge perception prompt and multi-prototype clustering-based few-sample tuning method

Technical Field

The invention belongs to the field of machine learning, and particularly relates to a few-sample tuning method based on knowledge perception prompts and multi-prototype clustering.

Background

In recent years, visual language models based on the Transformer architecture have made remarkable progress in computer vision, showing excellent performance on tasks such as image classification and object detection. Such models typically rely on two key techniques, large-scale pre-training and downstream task tuning, but they still face many challenges in few-sample tuning scenarios. On the one hand, pre-trained visual encoders suffer from a loss of local information: they focus excessively on the global features of an image and neglect local fine-grained features, so the extracted features contain a large amount of redundant background information that interferes with recognizing key foreground features. On the other hand, in the few-sample fine-tuning stage the model struggles to exploit the overall distribution of the downstream dataset, and the traditional single-prototype clustering method treats the classification boundary as linear and cannot adapt to the complex nonlinear classification boundaries of real scenes. Meanwhile, parameter-efficient fine-tuning techniques such as prompt learning and adapter fine-tuning reduce computational cost but still have limitations: their learnable parameters are data-driven and depend strongly on global data information, they pay insufficient attention to fine-grained information in images, and during fine-tuning the base model is prone to catastrophic forgetting, losing the general knowledge learned in the pre-training stage and harming generalization. In addition, existing methods adapt poorly to out-of-distribution data when handling cross-domain data and can hardly meet the diversified task requirements of practical applications. There is therefore a need for a few-sample tuning method that simultaneously addresses local feature capture, nonlinear boundary fitting, and catastrophic forgetting mitigation.
Disclosure of Invention

In view of the above drawbacks of the prior art, the present invention provides a few-sample tuning method based on knowledge perception prompts and multi-prototype clustering. The technical scheme of the invention comprises the following steps: S1, acquiring image data comprising a training set and a test set, and extracting samples from the training set to construct a few-sample support set and a query set; S2, constructing a knowledge perception prompt template for the few-sample task; S3, constructing a cross-modal knowledge attention module that uses text semantic embeddings to modulate and enhance the visual features extracted from the support-set samples by the pre-trained model, generating knowledge-enhanced feature representations; S4, inputting the support-set samples into the pre-trained visual language model using the knowledge perception prompt template and generating multiple prototype representations through clustering; and S5, inputting the query-set samples into the pre-trained visual language model to obtain representations, computing the similarity between the representations and the prototype representations, and performing prediction and model tuning based on the similarity.

Preferably, constructing the knowledge perception prompt template in S2 includes: retrieving descriptive text or structured knowledge related to the task labels from an external knowledge base, and fusing the retrieved knowledge into a basic prompt template to form an enhanced prompt containing contextual knowledge and task instructions.

Preferably, S3 includes: performing cluster analysis on the model-encoded representation vectors of the support-set samples and generating K cluster centers for each category, where K ≥ 1 and each cluster center serves as a prototype representation.

Preferably, S4 includes: computing the similarity between the query sample representation and all prototype representations under a category, and taking the maximum or a weighted sum as the final similarity score between the query sample and the category.

Preferably, the model tuning in S5 adopts a prompt-tuning paradigm based on cross-entropy loss, and the loss function is computed from the prediction distribution of the query samples and their true labels.

Preferably, the loss function additionally introduces a support-set-based prototype contrastive loss for pulling together same-class prototypes and pushing apart different-class prototypes.

The beneficial effects are as follows: 1. The fine-grained feature is
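A minimal PyTorch sketch of the two losses named in the preferred embodiments above: a cross-entropy loss over a similarity-derived prediction distribution for query samples, plus a contrastive term over support-set prototypes that pulls same-class prototypes together and pushes different-class prototypes apart. The temperature values, the normalization scheme, and the supervised-contrastive-style formulation are assumptions; the patent text only names the loss types.

```python
# Hedged sketch (assumed formulation) of the losses described above.
import torch
import torch.nn.functional as F

def query_cross_entropy(query_feats, prototypes, proto_labels, query_labels,
                        temperature=0.07):
    """Cross-entropy over class scores; per class, take the max prototype similarity.

    Assumes class labels are contiguous integers 0..C-1, so that the column
    order of `logits` (sorted unique labels) matches `query_labels`.
    """
    q = F.normalize(query_feats, dim=1)
    p = F.normalize(prototypes, dim=1)
    sims = q @ p.t() / temperature                      # (n_query, n_protos)
    classes = proto_labels.unique()
    logits = torch.stack([sims[:, proto_labels == c].max(dim=1).values
                          for c in classes], dim=1)     # (n_query, n_classes)
    return F.cross_entropy(logits, query_labels)

def prototype_contrastive(prototypes, proto_labels, temperature=0.1):
    """Pull same-class prototypes together, push different-class prototypes apart."""
    p = F.normalize(prototypes, dim=1)
    sims = p @ p.t() / temperature
    n = sims.size(0)
    mask_self = torch.eye(n, dtype=torch.bool)
    same = (proto_labels.unsqueeze(0) == proto_labels.unsqueeze(1)) & ~mask_self
    loss = 0.0
    for i in range(n):
        if same[i].any():                               # skip classes with a single prototype
            denom = torch.logsumexp(sims[i][~mask_self[i]], dim=0)
            loss = loss - (sims[i][same[i]] - denom).mean()
    return loss / n
```

In this sketch the total tuning objective would be the query cross-entropy plus a weighted prototype contrastive term; the weighting is another assumption not specified in the text above.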