US-12620197-B2 - Image processing method, apparatus, computing device, and medium


Abstract

The present disclosure provides an image processing method, an apparatus, a computing device, and a medium, relating to deep learning technology. After a to-be-processed target image is acquired, first similarities between the target image and first images of categories are determined based on a first feature vector corresponding to the target image and second feature vectors respectively corresponding to a plurality of first images. In addition, second similarities between the target image and the plurality of first images are determined based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images, so that the image category to which the target image belongs can be determined jointly from the first similarities and the second similarities.

Inventors

Zeyu SHANGGUAN
Tong Liu
Guangwei HUANG
Fanhao KONG

Assignees

BOE TECHNOLOGY GROUP CO., LTD.

Dates

Publication Date: 20260505
Application Date: 20211117

Claims (19)

1 . An image processing method comprising: acquiring a to-be-processed target image; determining first similarities between the target image and first images of categories based on a first feature vector corresponding to the target image and second feature vectors respectively corresponding to a plurality of first images, wherein the plurality of first images have been labeled with image categories, and the plurality of first images correspond to the plurality of image categories; determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images; determining an image category to which the target image belongs from the image categories of the plurality of first images based on the first similarities and the second similarities.
2 . The method according to claim 1 , wherein the color distribution information is a color distribution spectrum; determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images comprises: respectively determining a color distribution similarity between the first color distribution spectrum and each of the second color distribution spectra; for at least one first image belonging to any one image category, determining a color distribution similarity with the largest value in the color distribution similarities corresponding to the at least one first image as a second similarity between the target image and the at least one first image.
3 . The method according to claim 2 , wherein prior to determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images, the method further comprises: determining target regions respectively from the target image and the plurality of first images based on an attention matrix of an image classification model; acquiring the first color distribution information of the target region in the target image, and the second color distribution information of the target region in each of the first images.
4 . The method according to claim 1 , wherein the first similarity is a cosine similarity, which is used to indicate a cosine distance between the first feature vector corresponding to the target image and the second feature vector corresponding to the first images of each category; determining first similarities between the target image and first images of categories based on a first feature vector corresponding to the target image and second feature vectors respectively corresponding to a plurality of first images comprising: determining a first vector sequence for representing the target image and a plurality of second vector sequences for representing the plurality of first images through an embedding layer of an image classification model based on the target image and the plurality of first images; obtaining the first feature vector and the plurality of second feature vectors through an encoder of the image classification model based on the first vector sequence and the plurality of second vector sequences, and determining cosine similarities between the target image and the plurality of first images based on the first feature vector and the plurality of second feature vectors.
5 . The method according to claim 4 , wherein obtaining the first feature vector and the plurality of second feature vectors through an encoder of the image classification model based on the first vector sequence and the plurality of second vector sequences, and determining cosine similarities between the target image and the plurality of first images based on the first feature vector and the plurality of second feature vectors comprising: inputting the first vector sequence and the plurality of second vector sequences to the encoder, and through the encoder, determining the first feature vector corresponding to the first vector sequence and the plurality of second feature vectors corresponding to the plurality of second vector sequences; for at least one first image belonging to any one image category, determining a mean vector of the second feature vectors corresponding to the at least one first image; determining a cosine distance between the first feature vector and the mean vector, and determining a cosine similarity between the target image and the at least one first image based on the cosine distance.
6 . The method according to claim 5 , wherein determining a cosine similarity between the target image and the plurality of first images based on the cosine distance comprising any one of: when the cosine distance is greater than a preset distance threshold, determining the cosine similarity as a first value; when the cosine distance is less than or equal to the preset distance threshold, determining the cosine similarity as a second value.
7 . The method according to claim 1 , wherein determining an image category to which the target image belongs from the image categories of the plurality of first images based on the first similarities and the second similarities comprises: based on a first weight corresponding to the first similarity and a second weight corresponding to the second similarity, calculating a weighted sum of the first similarity and the second similarity to obtain an image similarity between the target image and the first images of each category; determining an image category corresponding to a largest similarity of the target image among the image similarities as the image category to which the target image belongs.
8 . The method according to claim 1 , wherein the image classification model is pre-trained; a process of training the image classification model comprises: acquiring a plurality of first sample images labeled with sample image categories; inputting the plurality of first sample images into an initial visual converter model, and determining predicted image categories of the plurality of first sample images through the initial visual converter model; training the initial visual converter model based on a first loss function indicating a difference between predicted image categories of the plurality of first sample images and sample image categories of the plurality of first sample images until a preset training completion condition is met, to obtain a trained visual converter model; acquiring the image classification model based on the trained visual converter model.
9 . The method according to claim 8 , wherein the trained visual converter model comprises an embedding layer, a converter encoder, and a multi head perceptron; acquiring the image classification model based on the trained visual converter model comprises: acquiring an embedding layer and a converter encoder from the trained visual converter model to form an initial image classification model; acquiring a plurality of second sample images labeled with similarity truth values; for any two second sample images among the plurality of second sample images, inputting the two second sample images into the initial image classification model, and outputting a first similarity prediction value of the two second sample images from the initial image classification model; training the initial image classification model based on a second loss function indicating a difference between the first similarity prediction value and the similarity truth value.
10 . The method according to claim 9 , further comprising: determining a second similarity prediction result between color distribution information of the target regions in two second sample images based on an attention matrix of the initial image classification model; determining a second similarity prediction value based on the second similarity prediction result and a preset similarity threshold; training the initial image classification model based on a second loss function indicating a difference between the first similarity prediction value and the similarity truth value and a third loss function indicating a difference between the second similarity prediction value and the similarity truth value.
11 . The method according to claim 10 , wherein determining a second similarity prediction value based on the second similarity prediction result and a preset similarity threshold comprises any one of: when the second similarity prediction result is greater than the preset similarity threshold, determining the second similarity prediction value as a first value; when the second similarity prediction result is less than or equal to the preset similarity threshold, determining the second similarity prediction value as a second value.
12 . The method according to claim 10 , wherein training the initial image classification model based on a second loss function indicating a difference between the first similarity prediction value and the similarity truth value and a third loss function indicating a difference between the second similarity prediction value and the similarity truth value comprises: calculating a weighted sum of the second loss function and the third loss function to obtain a target loss function based on a first initial weight corresponding to the second loss function and a second initial weight corresponding to the third loss function; training the initial image classification model based on the target loss function until a training completion condition is met to obtain the image classification model.
13 . A non-transitory computer-readable storage medium, wherein a program is stored on the computer-readable storage medium, and when the program is executed by a processor, the operations of the image processing method according to claim 1 are implemented.
14 . A computing device, comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to implement operations: acquiring a to-be-processed target image; determining first similarities between the target image and first images of categories based on a first feature vector corresponding to the target image and second feature vectors respectively corresponding to a plurality of first images, wherein the plurality of first images have been labeled with image categories, and the plurality of first images correspond to the plurality of image categories; determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images; determining an image category to which the target image belongs from the image categories of the plurality of first images based on the first similarities and the second similarities.
15 . The computing device according to claim 14 , wherein the color distribution information is a color distribution spectrum; determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images comprises: respectively determining a color distribution similarity between the first color distribution spectrum and each of the second color distribution spectra; for at least one first image belonging to any one image category, determining a color distribution similarity with the largest value in the color distribution similarities corresponding to the at least one first image as a second similarity between the target image and the at least one first image.
16 . The computing device according to claim 15 , wherein prior to determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images, the operations further comprise: determining target regions respectively from the target image and the plurality of first images based on an attention matrix of an image classification model; acquiring the first color distribution information of the target region in the target image, and the second color distribution information of the target region in each of the first images.
17 . The computing device according to claim 14 , wherein the first similarity is a cosine similarity, which is used to indicate a cosine distance between the first feature vector corresponding to the target image and the second feature vector corresponding to the first images of each category; determining first similarities between the target image and first images of categories based on a first feature vector corresponding to the target image and second feature vectors respectively corresponding to a plurality of first images comprising: determining a first vector sequence for representing the target image and a plurality of second vector sequences for representing the plurality of first images through an embedding layer of an image classification model based on the target image and the plurality of first images; obtaining the first feature vector and the plurality of second feature vectors through an encoder of the image classification model based on the first vector sequence and the plurality of second vector sequences, and determining cosine similarities between the target image and the plurality of first images based on the first feature vector and the plurality of second feature vectors.
18 . The computing device according to claim 17 , wherein obtaining the first feature vector and the plurality of second feature vectors through an encoder of the image classification model based on the first vector sequence and the plurality of second vector sequences, and determining cosine similarities between the target image and the plurality of first images based on the first feature vector and the plurality of second feature vectors comprising: inputting the first vector sequence and the plurality of second vector sequences to the encoder, and through the encoder, determining the first feature vector corresponding to the first vector sequence and the plurality of second feature vectors corresponding to the plurality of second vector sequences; for at least one first image belonging to any one image category, determining a mean vector of the second feature vectors corresponding to the at least one first image; determining a cosine distance between the first feature vector and the mean vector, and determining a cosine similarity between the target image and the at least one first image based on the cosine distance.
19 . The computing device according to claim 18 , wherein determining a cosine similarity between the target image and the plurality of first images based on the cosine distance comprising any one of: when the cosine distance is greater than a preset distance threshold, determining the cosine similarity as a first value; when the cosine distance is less than or equal to the preset distance threshold, determining the cosine similarity as a second value.
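
The feature-vector branch and the final decision recited in claims 5 and 7 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the equal default weights, and the use of plain lists in place of encoder outputs are all hypothetical, and the thresholding of claim 6 is deliberately left out so the raw cosine similarity is used instead.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def category_mean_vectors(support_vectors, support_labels):
    """Average the second feature vectors of the labeled first images, per category."""
    grouped = defaultdict(list)
    for vector, label in zip(support_vectors, support_labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(target_vector, support_vectors, support_labels,
             second_similarities, w1=0.5, w2=0.5):
    """Pick the category with the largest weighted sum of both similarities.

    second_similarities maps each category to its color-distribution
    similarity; w1 and w2 are assumed weights, not values from the patent.
    """
    means = category_mean_vectors(support_vectors, support_labels)
    scores = {}
    for label, mean_vector in means.items():
        first = cosine_similarity(target_vector, mean_vector)
        scores[label] = w1 * first + w2 * second_similarities[label]
    best_label = max(scores, key=scores.get)
    return best_label, scores
```

With a target vector `[1.0, 0.0]`, two labeled vectors of category `"a"` close to it, and one orthogonal vector of category `"b"`, `classify` returns `"a"` as long as the color-distribution similarities do not outweigh the feature-vector term.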

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a U.S. National Phase of International Application No. PCT/CN2021/131260, filed Nov. 17, 2021, the entire contents of which are hereby incorporated by reference for all purposes.

TECHNICAL FIELD

The present disclosure relates to the field of deep learning technology, and more particularly, to an image processing method, an image processing apparatus, an electronic device, and a medium.

BACKGROUND

With the continuous development of deep learning, image classification, segmentation, and recognition through deep learning have become important technical means in image processing. However, because image samples are costly to collect, and certain image samples cannot be collected at all, the training effect of an image classification model is undesirable, resulting in low accuracy in image classification.

In view of this, a small sample learning scheme has been proposed. Small sample learning refers to using a much smaller sample size than that required for deep learning on big data, while achieving a processing effect that is close to or even beyond that of deep learning on big data. Through small sample learning, an image classification model with high accuracy can be obtained in the case of limited image samples.

In the related art, a main approach is to train a Convolutional Neural Network (CNN) on public image samples, which are easier to collect, to obtain a pre-trained model. Then, the pre-trained model is trained on image samples corresponding to actual image classification requirements, to obtain an image classification model that can meet those requirements.

In the above implementation, due to the large number of categories and images in the sample images, the significant variation in sizes of sample images of a same category, and the large amount of potential noise in the sample images, the training effect of the model is undesirable, which in turn leads to undesirable classification accuracy in the trained image classification model.

SUMMARY

The present disclosure provides an image processing method, an apparatus, a computing device, and a medium, to solve the deficiency in the related art.

According to a first aspect of the embodiments of the present disclosure, there is provided an image processing method including: acquiring a to-be-processed target image; determining first similarities between the target image and first images of categories based on a first feature vector corresponding to the target image and second feature vectors respectively corresponding to a plurality of first images, wherein the plurality of first images have been labeled with image categories, and the plurality of first images correspond to the plurality of image categories; determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images; determining an image category to which the target image belongs from the image categories of the plurality of first images based on the first similarities and the second similarities.
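
The second-similarity step of the first aspect can be sketched in Python as follows. This is a hedged illustration only: the text here does not define how a color distribution is computed or compared, so the per-channel histogram (`color_histogram`), the histogram-intersection measure, the bin count, and all function names are assumptions introduced for the example. The target region is represented simply as a list of RGB pixels, without modeling how it is selected.

```python
def color_histogram(pixels, bins=4):
    """Normalized, concatenated per-channel histogram of (r, g, b) pixels in 0..255.

    Stands in for the color distribution information of a target region;
    the bin count is an arbitrary choice for illustration.
    """
    histogram = [0.0] * (3 * bins)
    count = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(value * bins // 256, bins - 1)
            histogram[channel * bins + index] += 1.0
        count += 1
    if count == 0:
        return histogram
    # Each pixel contributes three entries, so dividing by 3 * count sums to 1.
    return [h / (3 * count) for h in histogram]


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms, in [0, 1] (assumed similarity measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def second_similarities(target_histogram, support_histograms, support_labels):
    """For each category, keep the largest similarity over its labeled first images."""
    best = {}
    for histogram, label in zip(support_histograms, support_labels):
        similarity = histogram_intersection(target_histogram, histogram)
        if label not in best or similarity > best[label]:
            best[label] = similarity
    return best
```

Taking the maximum within each category means that a single labeled image with a closely matching color distribution is enough for that category to score highly on this branch.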
In an embodiment of the present disclosure, the color distribution information is a color distribution spectrum; determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images includes: respectively determining a color distribution similarity between the first color distribution spectrum and each of the second color distribution spectra; for at least one first image belonging to any one image category, determining a color distribution similarity with the largest value in the color distribution similarities corresponding to the at least one first image as a second similarity between the target image and the at least one first image.

In an embodiment of the present disclosure, prior to determining second similarities between the target image and the plurality of first images based on first color distribution information of a target region in the target image and second color distribution information of target regions in the plurality of first images, the method further includes: determining target regions respectively from the target image and the plurality of first images based on an attention matrix of an image classification model; acquiring the first color distribution information of the target region in the target image, and the second color distribution information of the target region in each of the first images.

In an embodiment of the present disclosure, the first similarity is a cosine similarity, which is used to indicate a cosine distance between the feature vector of the targe