CN-121999473-A - Image processing method, image processing device, computer device, storage medium, and program product

Abstract

Embodiments of the application provide an image processing method, an image processing device, a computer device, a storage medium, and a program product. The image processing method comprises: obtaining an image that contains an object; performing object detection processing on the image through an object detection module to obtain an object detection result, which comprises region information of an object region detected from the image; performing feature extraction processing on the image through a feature extraction module to obtain a feature extraction result, which comprises an object feature vector extracted from the image; and performing fusion processing on the object detection result and the feature extraction result to obtain an object recognition result of the image. The object detection module and the feature extraction module are integrated in parallel in a unified model, so that the object detection processing and the feature extraction processing are executed in parallel. Embodiments of the application can thereby improve the efficiency of object recognition on images.
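
The flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the applicant's implementation: `reduce_dimensions` assumes 2x2 average pooling, `detection_branch` and `feature_branch` are toy placeholders for the two modules, and the thread pool only mirrors the parallel structure (it does not by itself demonstrate a speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_dimensions(image):
    """Shared dimension reduction (assumed here to be 2x2 average pooling)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def detection_branch(reduced, scale=2):
    """Toy detector: bounding box of bright cells, mapped back to image coordinates."""
    bright = [(r, c) for r, row in enumerate(reduced) for c, v in enumerate(row) if v > 0.5]
    if not bright:
        return None
    rs = [r for r, _ in bright]
    cs = [c for _, c in bright]
    box = (min(cs) * scale, min(rs) * scale, (max(cs) + 1) * scale, (max(rs) + 1) * scale)
    prob = max(reduced[r][c] for r, c in bright)
    return {"box": box, "prob": prob}


def feature_branch(reduced):
    """Toy feature extractor: a two-element vector (mean, max) of the reduced image."""
    values = [v for row in reduced for v in row]
    return {"vector": [sum(values) / len(values), max(values)]}


def recognize(image):
    reduced = reduce_dimensions(image)  # computed once, shared by both branches
    with ThreadPoolExecutor(max_workers=2) as pool:
        det_future = pool.submit(detection_branch, reduced)
        feat_future = pool.submit(feature_branch, reduced)
        detection, features = det_future.result(), feat_future.result()
    if detection is None:
        return None
    # Fusion: combine the region information with the feature vector.
    return {**detection, **features}


image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
result = recognize(image)
print(result)
```

The point of the sketch is the shape of the computation: the image is reduced once, both branches read the same intermediate result without writing it out and reloading it, and a single fusion step produces the recognition result.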

Inventors

Jiang Jin
Hu Bojie

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2024-11-07

Claims (15)

1. An image processing method, comprising: acquiring an image, wherein the image contains an object; performing object detection processing on the image through an object detection module to obtain an object detection result, wherein the object detection result comprises region information of an object region detected from the image; performing feature extraction processing on the image through a feature extraction module to obtain a feature extraction result, wherein the feature extraction result comprises an object feature vector extracted from the image, and wherein the object detection module and the feature extraction module are integrated in parallel in a unified model so that the object detection processing and the feature extraction processing are executed in parallel; and performing fusion processing on the object detection result and the feature extraction result to obtain an object recognition result of the image.
2. The method of claim 1, wherein performing object detection processing on the image through the object detection module to obtain the object detection result comprises: performing object detection processing on the image through the object detection module to obtain region information of one or more first candidate object regions; and determining the region information of a target candidate object region among the one or more first candidate object regions as the region information of the object region, wherein the target candidate object region is a first candidate object region, among the one or more first candidate object regions, that meets an object region condition.
3. The method of claim 2, wherein the region information of any one of the first candidate object regions includes a probability that the first candidate object region contains the object, the method further comprising: if the probability that the first candidate object region contains the object is greater than or equal to a first probability threshold, determining that the first candidate object region meets the object region condition; if the probability that the first candidate object region contains the object is greater than or equal to a second probability threshold, acquiring the proportion of the image occupied by the first candidate object region; and if the proportion is greater than or equal to a proportion threshold, determining that the first candidate object region meets the object region condition.
4. The method of claim 2, wherein performing object detection processing on the image through the object detection module to obtain the region information of the one or more first candidate object regions comprises: performing dimension reduction processing on the image to obtain a dimension-reduced image; extracting one or more first regions of interest from the dimension-reduced image; determining a mapping region of each first region of interest in the image as a first candidate object region corresponding to that first region of interest, and acquiring position information of each first candidate object region; and predicting, according to a region feature vector of each first region of interest, the probability that the first candidate object region corresponding to that first region of interest contains the object; wherein the region information of any one of the first candidate object regions includes the position information of the first candidate object region and the probability that the first candidate object region contains the object.
5. The method of claim 1, wherein performing feature extraction processing on the image through the feature extraction module to obtain the feature extraction result comprises: performing dimension reduction processing on the image to obtain a dimension-reduced image; extracting one or more second regions of interest from the dimension-reduced image; performing feature learning on each second region of interest to obtain a region feature vector of each second region of interest; and determining a mapping region of each second region of interest in the image as a second candidate object region corresponding to that second region of interest, and determining the region feature vector of each second region of interest as the object feature vector of the corresponding second candidate object region.
6. The method of claim 1, wherein the region information of the object region includes position information of the object region and a probability that the object region contains the object, the feature extraction result includes one or more object feature vectors and position information of a second candidate object region to which each of the object feature vectors belongs, and performing fusion processing on the object detection result and the feature extraction result to obtain the object recognition result of the image comprises: performing region position matching between each second candidate object region and the object region according to the position information of each second candidate object region and the position information of the object region; determining the object feature vector of a reference candidate object region among the one or more second candidate object regions as the object feature vector of the object region, wherein the reference candidate object region is a second candidate object region whose region position matches that of the object region; and combining the position information of the object region, the probability that the object region contains the object, and the object feature vector of the object region to obtain the object recognition result of the image.
7. The method of any of claims 1-6, wherein the object detection module and the feature extraction module being integrated in parallel in the unified model comprises: the unified model further comprises a dimension reduction module shared by the object detection module and the feature extraction module, and the object detection module and the feature extraction module are connected to the dimension reduction module in parallel; or the object detection module and the feature extraction module each have their own dimension reduction layer and are connected in parallel in the unified model; the object detection module and the feature extraction module have the same module structure but different module weights.
8. The method of claim 1, wherein the method further comprises: acquiring training data, wherein the training data comprises sample images; and training the object detection module and the feature extraction module in stages based on the training data to obtain a trained unified model.
9. The method of claim 8, wherein the sample images in the training data comprise a detection sample image in a detection image set and an extraction sample image in an extraction image set, and wherein training the object detection module and the feature extraction module in stages based on the training data to obtain the trained unified model comprises either of the following: training the object detection module based on the detection image set to obtain a trained object detection module, freezing the trained object detection module, and training the feature extraction module based on the extraction image set to obtain a trained feature extraction module; or training the feature extraction module based on the extraction image set to obtain a trained feature extraction module, freezing the trained feature extraction module, and training the object detection module based on the detection image set to obtain a trained object detection module; and determining the trained unified model according to the trained object detection module and the trained feature extraction module.
10. The method of claim 9, wherein the unified model further comprises a dimension reduction module shared by the object detection module and the feature extraction module; training the object detection module based on the detection image set to obtain the trained object detection module comprises: training the object detection module and the dimension reduction module based on the detection image set to obtain the trained object detection module and a trained dimension reduction module; and freezing the trained object detection module comprises: freezing the trained object detection module and the trained dimension reduction module.
11. The method according to claim 9 or 10, wherein the construction process of the extraction image set comprises: acquiring the detection sample image and annotation information of the detection sample image, wherein the annotation information of the detection sample image comprises annotated position information of an image region containing an object in the detection sample image; covering the image region with an object layer carrying an annotated object label, according to the scale of the image region, to obtain an extraction sample image in the extraction image set; and determining the annotated position information and the annotated object label as the annotation information of the extraction sample image.
12. An image processing apparatus, comprising: an acquisition unit configured to acquire an image containing an object; and a processing unit configured to perform object detection processing on the image through an object detection module to obtain an object detection result, wherein the object detection result comprises region information of an object region detected from the image; the processing unit is further configured to perform feature extraction processing on the image through a feature extraction module to obtain a feature extraction result, wherein the feature extraction result comprises an object feature vector extracted from the image, and wherein the object detection module and the feature extraction module are integrated in parallel in a unified model so that the object detection processing and the feature extraction processing are executed in parallel; and the processing unit is further configured to perform fusion processing on the object detection result and the feature extraction result to obtain an object recognition result of the image.
13. A computer device, comprising: a processor adapted to execute a computer program; and a computer-readable storage medium storing a computer program adapted to be loaded by the processor to perform the image processing method according to any one of claims 1-11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to perform the image processing method according to any one of claims 1-11.
15. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the image processing method according to any one of claims 1-11.
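
The region condition of claim 3 and the position-matching fusion of claim 6 can be sketched together in Python. This is an illustration under stated assumptions, not the claimed implementation: the threshold values `p_high`, `p_low`, `min_ratio`, and `min_iou` are invented for the example, the second probability threshold is assumed to be lower than the first, and intersection over union (IoU) is assumed as the position-matching criterion, which the claims do not specify.

```python
def meets_region_condition(prob, box, image_size, p_high=0.8, p_low=0.5, min_ratio=0.1):
    """Claim 3 style check; all threshold values are illustrative assumptions."""
    if prob >= p_high:  # first probability threshold
        return True
    if prob >= p_low:  # second probability threshold (assumed lower than the first)
        x1, y1, x2, y2 = box
        width, height = image_size
        ratio = ((x2 - x1) * (y2 - y1)) / (width * height)
        return ratio >= min_ratio  # proportion threshold
    return False


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(detections, features, image_size, min_iou=0.5):
    """Claim 6 style fusion: attach the best position-matched feature vector."""
    results = []
    for det in detections:
        if not meets_region_condition(det["prob"], det["box"], image_size):
            continue
        best = max(features, key=lambda f: iou(det["box"], f["box"]), default=None)
        if best is not None and iou(det["box"], best["box"]) >= min_iou:
            results.append({"box": det["box"], "prob": det["prob"], "vector": best["vector"]})
    return results


detections = [
    {"box": (10, 10, 60, 60), "prob": 0.9},    # passes on probability alone
    {"box": (0, 0, 50, 50), "prob": 0.6},      # passes because it covers 25% of the image
    {"box": (90, 90, 95, 95), "prob": 0.6},    # rejected: region too small
    {"box": (70, 70, 100, 100), "prob": 0.3},  # rejected: probability too low
]
features = [
    {"box": (12, 12, 60, 60), "vector": [0.1, 0.2]},
    {"box": (0, 0, 48, 52), "vector": [0.3, 0.4]},
]
results = fuse(detections, features, image_size=(100, 100))
print(results)
```

Under these assumptions the two-tier check keeps high-confidence regions outright and keeps medium-confidence regions only when they occupy enough of the image, while the fusion step pairs each surviving region with the feature vector whose candidate region overlaps it most.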

Description

Image processing method, image processing device, computer device, storage medium, and program product

Technical Field

The present application relates to the field of computer technology, in particular to the field of artificial intelligence, and more particularly to an image processing method, an image processing apparatus, a computer device, a computer-readable storage medium, and a computer program product.

Background

Object recognition techniques applied to image processing can detect object regions containing objects in an image and can extract object feature vectors of the objects in the image. Current object recognition schemes involve frequent I/O (input/output) operations during recognition, and these in turn involve frequent memory operations (for example, memory read operations and memory store operations). This seriously degrades object recognition performance and results in low object recognition efficiency on images. How to improve the object recognition efficiency of images has therefore become a current research hotspot.

Disclosure of Invention

Embodiments of the application provide an image processing method, an image processing device, a computer device, a storage medium, and a program product, which can improve the object recognition efficiency of images.
In one aspect, an embodiment of the present application provides an image processing method, including: acquiring an image, wherein the image contains an object; performing object detection processing on the image through an object detection module to obtain an object detection result, wherein the object detection result comprises region information of an object region detected from the image; performing feature extraction processing on the image through a feature extraction module to obtain a feature extraction result, wherein the feature extraction result comprises an object feature vector extracted from the image, and wherein the object detection module and the feature extraction module are integrated in parallel in a unified model so that the object detection processing and the feature extraction processing are executed in parallel; and performing fusion processing on the object detection result and the feature extraction result to obtain an object recognition result of the image.
Accordingly, an embodiment of the present application provides an image processing apparatus including: an acquisition unit configured to acquire an image containing an object; and a processing unit configured to perform object detection processing on the image through the object detection module to obtain an object detection result, wherein the object detection result comprises region information of an object region detected from the image; the processing unit is further configured to perform feature extraction processing on the image through the feature extraction module to obtain a feature extraction result, wherein the feature extraction result comprises an object feature vector extracted from the image, and wherein the object detection module and the feature extraction module are integrated in parallel in the unified model so that the object detection processing and the feature extraction processing are executed in parallel; and the processing unit is further configured to perform fusion processing on the object detection result and the feature extraction result to obtain an object recognition result of the image.
In one implementation, when performing object detection processing on the image through the object detection module to obtain the object detection result, the processing unit is specifically configured to perform the following steps: performing object detection processing on the image through the object detection module to obtain region information of one or more first candidate object regions; and determining the region information of a target candidate object region among the one or more first candidate object regions as the region information of the object region, wherein the target candidate object region is a first candidate object region, among the one or more first candidate object regions, that meets the object region condition.
In one implementation, the region information of any one of the first candidate object regions includes a probability that the first candidate object region contains the object, and the processing unit is further configured to perform the following steps: if the probability that the first candidate object region contains the object is greater than or equal to a first probability threshold, determining that the first candidate object region meets the object region condition; if the probability that the first candidate object region contains the object is greater than or equal to a second probability threshold, acquiring the proportion of the image occupied by the first candidate object region; and if the proportion is greater than or equal to the proportion thr