CN-122025039-A - Image auxiliary labeling method and device based on single-sample context prompt learning
Abstract
The invention provides an image auxiliary labeling method and device based on single-sample context prompt learning, comprising: acquiring input image data, and preprocessing the data and its initialized label to obtain algorithm input data; inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features; and inputting the target encoding features into a pre-constructed decoder to obtain a prediction label for the input image data. Given prompt image data and its label, the invention passes the input image data through the encoder and decoder to output the corresponding prediction label. Because only one image needs to be annotated, the cost of manual annotation is greatly reduced, the difficulty of acquiring large amounts of training data in the prior art is overcome, large-scale manual annotation is no longer required, and high-accuracy labels can be provided.
Inventors
- LI BAIRUI
- LIAN HEQING
- ZHANG QIAN
Assignees
- 北京小蝇科技有限责任公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260120
Claims (10)
- 1. An image auxiliary labeling method based on single-sample context prompt learning, characterized by comprising the following steps: acquiring input image data and initializing a label of the input image data to obtain an initialized label; preprocessing the input image data and its initialized label together with the prompt image data and its label to obtain algorithm input data; inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features; and inputting the target encoding features into a pre-constructed decoder to obtain a prediction label for the input image data.
- 2. The image auxiliary labeling method based on single-sample context prompt learning of claim 1, wherein preprocessing the input image data and its initialized label together with the prompt image data and its label to obtain algorithm input data comprises: splicing the prompt image data and the input image data as a target image, subjecting the target image to blocking processing, and recording the position code of each image block to obtain first image blocks; splicing the label of the prompt image data and the initialized label as a target image label, subjecting the target image label to blocking processing, and recording the position code of each image block to obtain second image blocks; adding a category vector to the first image blocks and the second image blocks to obtain image input data and label input data, wherein the category vector is used to distinguish image data from label data; and forming the algorithm input data from the image input data and the label input data.
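The preprocessing in claim 2 can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions: the arrays, the patch size of 4, the splicing axis, and the simple flatten-per-block embedding are all illustrative choices, not taken from the patent.

```python
import numpy as np

def patchify(arr, patch=4):
    """Split a (H, W) array into non-overlapping (patch, patch) blocks,
    flatten each block, and record its (row, col) position code."""
    h, w = arr.shape
    blocks, positions = [], []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            blocks.append(arr[i:i + patch, j:j + patch].ravel())
            positions.append((i // patch, j // patch))
    return np.stack(blocks), positions

# Prompt image and input image are spliced (concatenated) along the height
# axis, and likewise for their labels (axis choice is an assumption).
prompt_img = np.ones((8, 8)); input_img = np.zeros((8, 8))
prompt_lbl = np.ones((8, 8)); init_lbl  = np.zeros((8, 8))  # initialized label
target_img = np.concatenate([prompt_img, input_img], axis=0)
target_lbl = np.concatenate([prompt_lbl, init_lbl], axis=0)

first_blocks,  pos1 = patchify(target_img)  # "first image blocks"
second_blocks, pos2 = patchify(target_lbl)  # "second image blocks"

# A category vector distinguishes image tokens (0) from label tokens (1);
# here it is appended as one extra channel per token.
img_tokens = np.hstack([first_blocks,  np.zeros((len(first_blocks), 1))])
lbl_tokens = np.hstack([second_blocks, np.ones((len(second_blocks), 1))])
algorithm_input = (img_tokens, lbl_tokens)
```

In practice a learned linear projection and additive position embeddings would replace the raw flattening used here; the sketch only shows the blocking, position recording, and category-vector steps of the claim.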
- 3. The image auxiliary labeling method based on single-sample context prompt learning of claim 1, wherein the encoder comprises a target number of coding blocks, the target number being the sum of a first number and a second number, and each coding block comprises an LN (layer normalization) layer, a self-attention layer, a residual connection layer, and an MLP layer.
- 4. The image auxiliary labeling method based on single-sample context prompt learning of claim 3, wherein inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features comprises: S11, selecting the first coding block among the first number of coding blocks of the encoder as the current coding block, taking the image input data in the algorithm input data as a first input and the label input data in the algorithm input data as a second input; S12, inputting the first input and the second input simultaneously to a first LN layer of the current coding block to obtain first normalized features of the first input and the second input; S13, inputting the first normalized features of the first input and the second input to a self-attention layer of the current coding block to obtain global features of the first input and the second input; S14, adding the global feature of the first input and the first input through a first residual connection, recording the result as the residual feature of the first input, and inputting this residual feature to a second LN layer of the current coding block to obtain a second normalized feature of the first input; adding the global feature of the second input and the second input through the first residual connection, recording the result as the residual feature of the second input, and inputting this residual feature to the second LN layer of the current coding block to obtain a second normalized feature of the second input; S15, inputting the second normalized features of the first input and the second input to an MLP layer to obtain nonlinear transformation features of the first input and the second input; S16, adding the nonlinear transformation feature of the first input and the residual feature of the first input through a second residual connection to obtain an image output feature; adding the nonlinear transformation feature of the second input and the residual feature of the second input through the second residual connection to obtain a label output feature; S17, taking the image output feature as the new first input, the label output feature as the new second input, and the next coding block as the current coding block; repeating steps S12-S17 until the first number of coding blocks have been traversed, and outputting the image output feature and the label output feature of the last iteration; and S18, averaging the image output feature and the label output feature to obtain a first encoding feature.
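The coding-block computation of steps S12-S16 matches a standard pre-norm Transformer block. A minimal single-head numpy sketch follows; the dimensions, random weights, ReLU MLP, and the assumption that the two streams pass through the same block in parallel with shared weights (the claim does not fully specify joint vs. separate attention) are all illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return scores @ v

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2  # ReLU MLP for simplicity

def coding_block(x, params):
    wq, wk, wv, w1, w2 = params
    # S12-S13: first LN, then self-attention (global features)
    g = self_attention(layer_norm(x), wq, wk, wv)
    # S14: first residual connection, then second LN
    r = x + g
    n2 = layer_norm(r)
    # S15-S16: MLP, then second residual connection
    return r + mlp(n2, w1, w2)

rng = np.random.default_rng(0)
d = 8
params = tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(5))
first_input  = rng.standard_normal((4, d))  # image tokens
second_input = rng.standard_normal((4, d))  # label tokens

image_out = coding_block(first_input, params)   # S12-S16 on the first input
label_out = coding_block(second_input, params)  # S12-S16 on the second input
# S18: average the two streams to form the first encoding feature.
first_encoding_feature = (image_out + label_out) / 2
```

Looping `coding_block` over the first number of blocks (S17) and then averaging, as above, reproduces the stage described in claim 4.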
- 5. The image auxiliary labeling method based on single-sample context prompt learning of claim 4, wherein inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features further comprises: S21, selecting the first coding block among the second number of coding blocks of the encoder as the current coding block, and taking the first encoding feature as the current input; S22, inputting the current input to a first LN layer of the current coding block to obtain a first normalized feature of the current input; S23, inputting the first normalized feature of the current input to a self-attention layer of the current coding block to obtain a global feature of the current input; S24, adding the global feature of the current input and the current input through a first residual connection, recording the result as the residual feature of the current input, and inputting this residual feature to a second LN layer of the current coding block to obtain a second normalized feature of the current input; S25, inputting the second normalized feature of the current input to an MLP layer to obtain a nonlinear transformation feature of the current input; S26, adding the nonlinear transformation feature of the current input and the residual feature of the current input through a second residual connection to obtain a current output feature; S27, taking the current output feature as the new current input, and the next coding block as the current coding block; repeating steps S22-S27 until the second number of coding blocks have been traversed, and outputting the current output feature of the last iteration as the output feature of the last layer of the encoder; and S28, extracting the output of a preset layer from the encoder and splicing the extracted result with the output feature of the last layer of the encoder to obtain the target encoding feature.
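Step S28's multi-scale splice can be sketched as concatenating an intermediate block's output with the final block's output along the channel axis. The stand-in block, the six-block depth, and the choice of the third layer as the preset layer are assumptions for illustration only.

```python
import numpy as np

def run_blocks(x, num_blocks, preset_layer):
    """Feed x through num_blocks stand-in encoder blocks, keeping the
    output of the preset layer alongside the final-layer output."""
    preset_out = None
    for i in range(1, num_blocks + 1):
        x = x + 0.1 * np.tanh(x)  # stand-in for one coding block (S22-S26)
        if i == preset_layer:
            preset_out = x.copy()
    # S28: splice (concatenate) preset-layer output with last-layer output
    return np.concatenate([preset_out, x], axis=-1)

x = np.ones((4, 8))  # first encoding feature from claim 4
target_encoding_feature = run_blocks(x, num_blocks=6, preset_layer=3)
```

Keeping an earlier layer's output alongside the last layer's is a common way to preserve both lower-level and higher-level features for the decoder.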
- 6. The image auxiliary labeling method based on single-sample context prompt learning of claim 4, wherein inputting the target encoding features into a pre-constructed decoder to obtain a prediction label for the input image data comprises: inputting the target encoding feature to a first convolution layer of the decoder and adjusting its channel count to obtain a first decoding feature; performing a reshape operation on the first decoding feature to obtain a second decoding feature; inputting the second decoding feature to a second convolution layer of the decoder and adjusting its channel count to obtain a third decoding feature; extracting the part of the third decoding feature corresponding to the label input data and converting the extracted result to the size of the initialized label to obtain a target decoding feature; and converting the target decoding feature into an image format to obtain the prediction label for the input image data.
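The decoder of claim 6 can be sketched with 1x1 convolutions standing in for the channel-adjusting layers. The kernel size, channel counts, the assumption that the label tokens occupy the lower half of the grid, and the nearest-neighbour upsampling are all illustrative choices not specified by the patent.

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution over a (C, H, W) tensor is a per-pixel matmul
    that adjusts the channel count: w has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)

# Target encoding feature: 32 tokens (16 image + 16 label), 8 channels each.
tokens = rng.standard_normal((32, 8))

# First convolution layer: adjust channels 8 -> 4
# (treat the token sequence as a C=8, H=32, W=1 tensor).
f1 = conv1x1(tokens.T[:, :, None], rng.standard_normal((4, 8)))

# Reshape operation: fold the token axis back into a 2D grid (here 8 x 4).
f2 = f1.reshape(4, 8, 4)

# Second convolution layer: adjust channels 4 -> 1 (one output mask channel).
f3 = conv1x1(f2, rng.standard_normal((1, 4)))

# Extract the part corresponding to the label input data (assumed lower half)
# and convert it to the initialized label's size via nearest-neighbour repeat.
label_part = f3[:, 4:, :]                              # (1, 4, 4)
pred = label_part.repeat(2, axis=1).repeat(2, axis=2)  # (1, 8, 8)
prediction_label = (pred[0] > 0).astype(np.uint8)      # image-format label
```

The final thresholding is one simple way to "convert to an image format"; a real implementation might instead apply a per-class argmax or sigmoid.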
- 7. An image auxiliary labeling device based on single-sample context prompt learning, characterized by comprising: an input unit for acquiring input image data and initializing a label of the input image data to obtain an initialized label; a processing unit for preprocessing the input image data and its initialized label together with the prompt image data and its label to obtain algorithm input data; an encoding unit for inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features; and a decoding unit for inputting the target encoding features into a pre-constructed decoder to obtain a prediction label for the input image data.
- 8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the image auxiliary labeling method based on single-sample context prompt learning of any one of claims 1 to 6.
- 9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the image auxiliary labeling method based on single-sample context prompt learning of any one of claims 1 to 6.
- 10. A computer program product comprising a computer program which, when executed by a processor, implements the image auxiliary labeling method based on single-sample context prompt learning of any one of claims 1 to 6.
Description
Image auxiliary labeling method and device based on single-sample context prompt learning
Technical Field
The invention relates to the technical field of data processing, and in particular to an image auxiliary labeling method and device based on single-sample context prompt learning.
Background
Artificial intelligence is developing rapidly, and with the deep application of artificial intelligence technology in the medical field, medical image auxiliary systems based on deep learning have become an important force in promoting precision medicine. However, training high-performance medical image auxiliary systems depends heavily on medical image data with high-precision labels, and conventional labeling methods face the following technical bottlenecks. Manual labeling in the medical field relies heavily on expert knowledge, so the cost of labeling is very high. In addition, labeling consistency is difficult to ensure: different annotators may label the same image very differently, which degrades dataset quality and increases the cost of later review. Meanwhile, the characteristics of medical data greatly limit the available scale, and large amounts of data are difficult to acquire for training, resulting in poor performance of models trained on the annotated data. In summary, the prior art suffers from the problem that model accuracy depends on training data while annotated data is insufficient.
Disclosure of Invention
The invention provides an image auxiliary labeling method and device based on single-sample context prompt learning, which address the shortage of annotated data in the prior art and realize automatic, high-accuracy labeling of training data.
The invention provides an image auxiliary labeling method based on single-sample context prompt learning, which comprises the following steps: acquiring input image data and initializing a label of the input image data to obtain an initialized label; preprocessing the input image data and its initialized label together with the prompt image data and its label to obtain algorithm input data; inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features; and inputting the target encoding features into a pre-constructed decoder to obtain a prediction label for the input image data. According to the image auxiliary labeling method based on single-sample context prompt learning, preprocessing the input image data and its initialized label together with the prompt image data and its label to obtain algorithm input data comprises the following steps: splicing the prompt image data and the input image data as a target image, subjecting the target image to blocking processing, and recording the position code of each image block to obtain first image blocks; splicing the label of the prompt image data and the initialized label as a target image label, subjecting the target image label to blocking processing, and recording the position code of each image block to obtain second image blocks; adding a category vector to the first image blocks and the second image blocks to obtain image input data and label input data, wherein the category vector is used to distinguish image data from label data; and forming the algorithm input data from the image input data and the label input data. According to the image auxiliary labeling method based on single-sample context prompt learning, the encoder comprises a target number of coding blocks, the target number is the sum of a first number and a second number, and each coding block comprises an LN layer, a self-attention layer, a residual connection layer, and an MLP layer.
According to the image auxiliary labeling method based on single-sample context prompt learning, inputting the algorithm input data into a pre-constructed encoder to obtain target encoding features comprises the following steps: S11, selecting the first coding block among the first number of coding blocks of the encoder as the current coding block, taking the image input data in the algorithm input data as a first input and the label input data in the algorithm input data as a second input; S12, inputting the first input and the second input simultaneously to a first LN layer of the current coding block to obtain first normalized features of the first input and the second input; S13, inputting the first normalized features of the first input and the second input to a self-attention layer of the current coding block to obtain global features of the first input and the second input; S14, adding the global feature of the first input and the first input through a first residual connection, recording the result as the residual feature of the first input, and inputting this residual feature to a second LN layer of the current coding block to obtain a second normalized feature of the first input; adding the global feature of the second input and the second input through the first residual connection, recording the ad