CN-122023840-A - Image processing method and device


Abstract

An embodiment of the invention provides an image processing method and device. The method comprises: obtaining an original image and a template image, wherein the template image is an image associated with a specified object; determining, from the original image, a target region that matches the template image; generating original image features of the original image and target region features of the target region; generating a gated fusion feature of the original image from the original image features and the target region features; and processing the original image according to the gated fusion feature so as to locate the specified object in the original image. The embodiment can accurately identify and locate the specified object in the original image based on the template image and, by drawing on the feature extraction and learning capability of the image detection model, can also handle complex scenes, thereby improving the recognition accuracy and generalization capability of the image detection model in image processing tasks.

Inventors

CHEN XUHAO

Assignees

北京罗克维尔斯科技有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2024-11-11

Claims (11)

1. An image processing method, the method comprising: acquiring an original image and a template image, wherein the template image is an image associated with a specified object; determining, from the original image, a target region that matches the template image; generating original image features of the original image and target region features of the target region; generating a gated fusion feature of the original image according to the original image features and the target region features; and processing the original image according to the gated fusion feature so as to locate the specified object in the original image.
2. The method of claim 1, wherein determining, from the original image, a target region that matches the template image comprises: calculating a similarity between each sub-region of the original image and the template image; and taking each sub-region whose similarity exceeds a preset similarity as a target region of the original image.
3. The method of claim 1, wherein generating a gated fusion feature of the original image according to the original image features and the target region features comprises: acquiring a confidence of the target region and converting the confidence into an attention weight; obtaining weighted features according to the target region features and the attention weight; dividing the weighted features into corresponding category buckets according to the category of the target region; averaging the weighted features in each category bucket to obtain an average feature for the corresponding category; and generating the gated fusion feature of the original image according to the original image features and the average feature of each category.
4. The method of claim 3, involving an image detection model comprising a fully connected layer, wherein generating the gated fusion feature of the original image according to the original image features and the average feature of each category comprises: performing dot product and normalization processing on the average feature of each category and the original image features to obtain an attention feature matrix for each category; and generating the gated fusion feature according to the attention feature matrix of each category.
5. The method of claim 4, wherein the fully connected layer comprises a first fully connected layer and a second fully connected layer, and generating the gated fusion feature according to the attention feature matrix of each category comprises: inputting the attention feature matrix of each category into the first fully connected layer to obtain a fusion feature; inputting the attention feature matrix of each category into the second fully connected layer to obtain a gating signal; and generating the gated fusion feature according to the fusion feature and the gating signal.
6. The method of claim 5, wherein processing the original image according to the gated fusion feature comprises: retaining the original image features of the original image and/or the target region features of the target region according to the gated fusion feature; or ignoring the original image features of the original image and/or the target region features of the target region according to the gated fusion feature.
7. The method of claim 4, wherein the template image comprises a predefined image and a template image selected from the original images; the image detection model comprises a convolution layer, the convolution layer comprising a first convolution layer and a second convolution layer; and generating the original image features of the original image and the target region features of the target region comprises: inputting the original image features of the original image into the first convolution layer to acquire the original image features of the original image; and inputting the target region features of the target region into the second convolution layer to acquire the target region features of the target region.
8. An image processing apparatus, wherein the apparatus comprises: an image acquisition module configured to acquire an original image and a template image, wherein the template image is an image associated with a specified object; a region matching module configured to determine, from the original image, a target region that matches the template image; an image feature generation module configured to generate original image features of the original image and target region features of the target region; a gated fusion feature generation module configured to generate a gated fusion feature of the original image according to the original image features and the target region features; and an original image processing module configured to process the original image according to the gated fusion feature so as to locate the specified object in the original image.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to carry out the method steps of any one of claims 1-7 when executing the program stored in the memory.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
11. A vehicle that performs the image processing method according to any one of claims 1 to 7.
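
The gated fusion recited in claims 4 and 5 can be illustrated with a minimal sketch. The claims do not specify the normalization, the gate activation, or how the fusion feature and the gating signal are combined, so the softmax normalization, the sigmoid gate, the element-wise product, and the pooling of each attention result into a single vector are all assumptions made here for illustration. The function names (`attention_feature`, `gated_fusion`, and so on) are hypothetical and do not come from the patent.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(x, weight, bias):
    """Fully connected layer; each row of `weight` produces one output value."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def attention_feature(image_feats, avg_feat):
    """Dot product of every image location with a category average feature,
    normalized with softmax (assumed), then used to pool the image features."""
    scores = softmax([sum(a * b for a, b in zip(loc, avg_feat)) for loc in image_feats])
    dim = len(avg_feat)
    return [sum(s * loc[d] for s, loc in zip(scores, image_feats)) for d in range(dim)]

def gated_fusion(image_feats, category_avgs, fc1, fc2):
    """fc1 and fc2 are (weight, bias) pairs for the first and second FC layers."""
    # Concatenate the per-category attention features in a fixed category order.
    attn = [v for cat in sorted(category_avgs)
            for v in attention_feature(image_feats, category_avgs[cat])]
    fused = linear(attn, *fc1)                       # first FC layer: fusion feature
    gate = [sigmoid(g) for g in linear(attn, *fc2)]  # second FC layer: gating signal
    # Assumed combination: the gate scales each fused component into (0, 1) of itself.
    return [g * f for g, f in zip(gate, fused)]
```

With a sigmoid gate, a component whose gate value is near 1 is retained and one near 0 is effectively ignored, which is one plausible reading of the retain-or-ignore behaviour in claim 6.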

Description

Image processing method and device

Technical Field

Embodiments of the present invention relate to the field of computer technologies, and in particular to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.

Background

In current image recognition frameworks, most applications are built on traditional neural network models, which are usually trained on large-scale image data to identify and locate objects in an image, for example objects such as gyroscopes, cats, or dogs. Conventional neural network models, such as convolutional neural networks (CNN, Convolutional Neural Network) and region-based convolutional neural networks (R-CNN, Region-based Convolutional Neural Network), have been remarkably successful in various fields, but still have problems, such as failing to provide sufficiently accurate object localization information, which lowers the accuracy of image recognition and localization results.

Disclosure of Invention

An embodiment of the invention aims to provide an image processing method that addresses the poor accuracy of conventional neural network models in image recognition, localization, and similar processing.
The specific technical solution is as follows:
In a first aspect of the present invention, there is provided an image processing method, the method comprising: acquiring an original image and a template image, wherein the template image is an image associated with a specified object; determining, from the original image, a target region that matches the template image; generating original image features of the original image and target region features of the target region; generating a gated fusion feature of the original image according to the original image features and the target region features; and processing the original image according to the gated fusion feature so as to locate the specified object in the original image.
Optionally, determining, from the original image, a target region that matches the template image includes: calculating a similarity between each sub-region of the original image and the template image; and taking each sub-region whose similarity exceeds a preset similarity as a target region of the original image.
Optionally, generating a gated fusion feature of the original image according to the original image features and the target region features includes: acquiring a confidence of the target region and converting the confidence into an attention weight; obtaining weighted features according to the target region features and the attention weight; dividing the weighted features into corresponding category buckets according to the category of the target region; averaging the weighted features in each category bucket to obtain an average feature for the corresponding category; and generating the gated fusion feature of the original image according to the original image features and the average feature of each category.
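
The region matching and category-bucket averaging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the text does not name a similarity measure or say how a confidence becomes an attention weight, so normalized cross-correlation and a softmax over the region confidences are assumed here, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def ncc(a, b):
    """Normalized cross-correlation of two equal-length flat lists, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def match_template(image, template, threshold):
    """Slide the template over the image (lists of rows) and return
    (row, col, score) for every sub-region whose similarity exceeds threshold."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    hits = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = ncc(patch, flat_t)
            if score > threshold:
                hits.append((r, c, score))
    return hits

def bucket_average(features, confidences, categories):
    """Weight each region feature by its attention weight, group the weighted
    features by category, and average within each category bucket."""
    # Assumption: confidences are turned into attention weights with a softmax.
    exps = [math.exp(c) for c in confidences]
    total = sum(exps)
    weights = [e / total for e in exps]
    buckets = defaultdict(list)
    for feat, w, cat in zip(features, weights, categories):
        buckets[cat].append([w * x for x in feat])
    # Column-wise mean of the weighted features in each bucket.
    return {cat: [sum(col) / len(rows) for col in zip(*rows)]
            for cat, rows in buckets.items()}
```

For example, `match_template(image, template, 0.99)` keeps only near-exact matches, while a lower threshold admits more candidate regions at the cost of more false positives; the resulting regions, with their confidences and categories, would then feed `bucket_average`.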
Optionally, an image detection model comprising a fully connected layer is involved, and generating the gated fusion feature of the original image according to the original image features and the average feature of each category includes: performing dot product and normalization processing on the average feature of each category and the original image features to obtain an attention feature matrix for each category; and generating the gated fusion feature according to the attention feature matrix of each category.
Optionally, the fully connected layer includes a first fully connected layer and a second fully connected layer, and generating the gated fusion feature according to the attention feature matrix of each category includes: inputting the attention feature matrix of each category into the first fully connected layer to obtain a fusion feature; inputting the attention feature matrix of each category into the second fully connected layer to obtain a gating signal; and generating the gated fusion feature according to the fusion feature and the gating signal.
Optionally, processing the original image according to the gated fusion feature includes: retaining the original image features of the original image and/or the target region features of the target region according to the gated fusion feature; or ignoring the original image features of the original image and/or the target region features of the target region according to the gated fusion feature.
Optionally, the template image includes a predefined image and a template image selected from the original images; the image detection model comprises a convolution layer, the convolution layer comprises a first convolution layer and a second convolution layer, and the generation o