CN-116229271-B - Image difference detection method, apparatus, device, storage medium, and program product

CN 116229271 B

Abstract

The present application relates to an image difference detection method, apparatus, device, storage medium, and program product in the technical field of artificial intelligence. The method comprises: extracting, through a feature coding network of a difference perception model, a first global-local feature corresponding to a first image to be detected and a second global-local feature corresponding to a second image to be detected, wherein the two images are acquired from the same area at different moments; performing difference perception processing on the first and second global-local features to determine a difference perception feature between the two images; and decoding the difference perception feature through a feature decoding network of the difference perception model to obtain a target difference region between the two images. By adopting the method, the accuracy of image difference detection can be improved.

Inventors

  • FENG RU
  • YANG XIAOCHENG
  • WANG NA
  • ZHANG HONGTAO

Assignees

  • Industrial and Commercial Bank of China Limited (中国工商银行股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2023-03-06

Claims (11)

  1. An image difference detection method, the method comprising: extracting, through a feature coding network of a difference perception model, a first global-local feature corresponding to a first image to be detected and a second global-local feature corresponding to a second image to be detected, wherein the first image to be detected and the second image to be detected are images acquired from the same area at different moments; performing difference perception processing on the first global-local feature and the second global-local feature to determine a difference perception feature between the first image to be detected and the second image to be detected; and decoding the difference perception feature through a feature decoding network of the difference perception model to obtain a target difference region between the first image to be detected and the second image to be detected; wherein the feature coding network comprises at least two sub-coding networks connected end to end and a difference perception network, the difference perception network comprises at least two sub-difference perception networks connected end to end, and the sub-difference perception networks correspond one-to-one to the sub-coding networks of the feature coding network; wherein performing difference perception processing on the first global-local feature and the second global-local feature to determine the difference perception feature between the first image to be detected and the second image to be detected comprises: determining, through each sub-difference perception network, a sub-difference feature from the input data corresponding to that sub-difference perception network; and fusing the sub-difference feature determined by the final sub-difference perception network with the first global-local feature and the second global-local feature extracted by the sub-coding network corresponding to the final sub-difference perception network, to obtain the difference perception feature between the first image to be detected and the second image to be detected; and wherein the input data of each sub-difference perception network other than the first comprises the input data of the sub-coding network corresponding to that sub-difference perception network, the first global-local feature and the second global-local feature extracted by that sub-coding network, and the sub-difference feature determined by the preceding sub-difference perception network.
  2. The method according to claim 1, wherein extracting, through the feature coding network of the difference perception model, the first global-local feature corresponding to the first image to be detected and the second global-local feature corresponding to the second image to be detected comprises: encoding, through each sub-coding network, the input data of that sub-coding network to obtain the first global-local feature extracted by that sub-coding network from the first image to be detected and the second global-local feature extracted from the second image to be detected; wherein the input data of each sub-coding network other than the first is the first global-local feature and the second global-local feature output by the preceding sub-coding network.
  3. The method of claim 2, wherein each sub-coding network comprises a global feature coding network, a local feature coding network, and a feature fusion network, and wherein encoding the input data of each sub-coding network to obtain the first global-local feature extracted by that sub-coding network from the first image to be detected comprises: analyzing, through the global feature coding network of each sub-coding network, the input data of that sub-coding network to obtain a first global feature corresponding to that sub-coding network; analyzing, through the local feature coding network of that sub-coding network, the input data of that sub-coding network to obtain a first local feature corresponding to that sub-coding network; and fusing, through the feature fusion network of that sub-coding network, the first global feature and the first local feature corresponding to that sub-coding network to obtain the first global-local feature extracted by that sub-coding network for the first image to be detected.
  4. The method of claim 3, wherein analyzing, through the global feature coding network of each sub-coding network, the input data of that sub-coding network to obtain the first global feature corresponding to that sub-coding network comprises: analyzing, through the global feature coding network of each sub-coding network, the input data of that sub-coding network based on a channel attention mechanism and a spatial attention mechanism to obtain the first global feature corresponding to that sub-coding network.
  5. The method according to claim 1, wherein determining, through each sub-difference perception network, a sub-difference feature from the input data corresponding to that sub-difference perception network comprises: determining, through each sub-difference perception network, an initial difference feature from the first global-local feature and the second global-local feature extracted by the sub-coding network corresponding to that sub-difference perception network; and fusing the initial difference feature, the sub-difference feature determined by the preceding sub-difference perception network, and the input data of the sub-coding network corresponding to that sub-difference perception network to obtain the sub-difference feature determined by that sub-difference perception network.
  6. The method of claim 1, wherein the feature decoding network comprises at least two sub-decoding networks connected end to end, and wherein decoding the difference perception feature through the feature decoding network of the difference perception model to obtain the target difference region between the first image to be detected and the second image to be detected comprises: decoding, through each sub-decoding network, the input data of that sub-decoding network to obtain a sub-difference region determined by that sub-decoding network; and taking the sub-difference region determined by the final sub-decoding network as the target difference region between the first image to be detected and the second image to be detected; wherein the input data of the first sub-decoding network is the difference perception feature, and the input data of each other sub-decoding network is the sub-difference region determined by the preceding sub-decoding network.
  7. The method according to any one of claims 1 to 6, wherein the training process of the difference perception model comprises: inputting a sample image pair into the difference perception model to obtain a sample difference region and a difference perception feature predicted by the difference perception model for the sample image pair, wherein the sample image pair comprises sample images acquired from the same region at different moments; determining a cross-entropy loss value from the real difference region corresponding to the sample image pair and the sample difference region; determining a contrastive loss value from the difference perception feature and the difference feature label corresponding to the sample image pair; and training the difference perception model according to the cross-entropy loss value and the contrastive loss value.
  8. An image difference detection apparatus, characterized in that the apparatus comprises: a feature extraction module configured to extract, through a feature coding network of a difference perception model, a first global-local feature corresponding to a first image to be detected and a second global-local feature corresponding to a second image to be detected, wherein the first image to be detected and the second image to be detected are images acquired from the same area at different moments; a difference perception module configured to perform difference perception processing on the first global-local feature and the second global-local feature and determine a difference perception feature between the first image to be detected and the second image to be detected; and a difference determining module configured to decode the difference perception feature through a feature decoding network of the difference perception model to obtain a target difference region between the first image to be detected and the second image to be detected; wherein the feature coding network comprises at least two sub-coding networks connected end to end and a difference perception network, the difference perception network comprises at least two sub-difference perception networks connected end to end, and the sub-difference perception networks correspond one-to-one to the sub-coding networks of the feature coding network; the difference perception module comprises: a first determining unit configured to determine, through each sub-difference perception network, a sub-difference feature from the input data corresponding to that sub-difference perception network; and a fusion unit configured to fuse the sub-difference feature determined by the final sub-difference perception network with the first global-local feature and the second global-local feature extracted by the sub-coding network corresponding to the final sub-difference perception network, to obtain the difference perception feature between the first image to be detected and the second image to be detected; wherein the input data of each sub-difference perception network other than the first comprises the input data of the sub-coding network corresponding to that sub-difference perception network, the first global-local feature and the second global-local feature extracted by that sub-coding network, and the sub-difference feature determined by the preceding sub-difference perception network.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
  11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
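To make the claimed data flow concrete, the following is a minimal, purely illustrative sketch of the pipeline of claims 1, 2, and 6: chained sub-coding networks, one sub-difference perception network per coding stage, and a chain of sub-decoding networks. The patent does not specify any layer types, so every operation below (the tanh encoding, absolute-difference sensing, and mean-based smoothing and thresholding) is a hypothetical stand-in chosen only to show how features move between the sub-networks:

```python
import numpy as np

def sub_encode(x1, x2, w):
    # Stand-in "sub-coding network": a shared elementwise transform applied
    # to both images; a real network would extract global-local features.
    return np.tanh(w * x1), np.tanh(w * x2)

def sub_sense(f1, f2, enc_input, prev):
    # Stand-in "sub-difference perception network": the initial difference
    # feature is the absolute feature gap; later stages fuse it with the
    # preceding stage's sub-difference feature and the encoder input (claim 5).
    init = np.abs(f1 - f2)
    return init if prev is None else init + prev + enc_input

def sub_decode(d):
    # Stand-in "sub-decoding network": a smoothing step on the feature map.
    return 0.5 * (d + d.mean())

def detect_difference(img1, img2, n_stages=3):
    x1, x2, prev = img1.astype(float), img2.astype(float), None
    for s in range(1, n_stages + 1):
        enc_in = x1                      # input of this stage's sub-coding network
        x1, x2 = sub_encode(x1, x2, w=s)
        prev = sub_sense(x1, x2, enc_in, prev)
    # Fuse the final sub-difference feature with the final stage's
    # global-local features to form the difference perception feature.
    diff_feat = prev + x1 + x2
    for _ in range(n_stages):            # feature decoding network
        diff_feat = sub_decode(diff_feat)
    return diff_feat > diff_feat.mean()  # binary target difference region
```

Given two same-shaped arrays, `detect_difference` returns a boolean mask of the same shape marking the toy "target difference region"; the point is only the wiring between stages, not the arithmetic inside them.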

Description

Image difference detection method, apparatus, device, storage medium, and program product

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to an image difference detection method, apparatus, device, storage medium, and program product.

Background

Image difference detection (e.g., difference detection of remote sensing images) aims to detect changes in the features of a region over time by analyzing images of the same region acquired at different times. Difference detection of images has significant value in natural disaster early warning, urban planning, engineering progress inspection, and military operations. However, current image difference detection methods usually analyze the pixel differences between the images to be detected directly. This under-utilizes the image features, leading to inaccurate detection results and low detection precision, problems which need to be solved.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an image difference detection method, apparatus, device, storage medium, and program product that can improve the accuracy of image difference detection. In a first aspect, the present application provides an image difference detection method.
The method comprises: extracting, through a feature coding network of a difference perception model, a first global-local feature corresponding to a first image to be detected and a second global-local feature corresponding to a second image to be detected, wherein the first image to be detected and the second image to be detected are images acquired from the same area at different moments; performing difference perception processing on the first global-local feature and the second global-local feature to determine a difference perception feature between the first image to be detected and the second image to be detected; and decoding the difference perception feature through a feature decoding network of the difference perception model to obtain a target difference region between the first image to be detected and the second image to be detected.

In one embodiment, the feature coding network comprises at least two sub-coding networks connected end to end, and extracting, through the feature coding network of the difference perception model, the first global-local feature corresponding to the first image to be detected and the second global-local feature corresponding to the second image to be detected comprises: encoding, through each sub-coding network, the input data of that sub-coding network to obtain the first global-local feature extracted by that sub-coding network from the first image to be detected and the second global-local feature extracted from the second image to be detected; the input data of each sub-coding network other than the first is the first global-local feature and the second global-local feature output by the preceding sub-coding network.
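The embodiments obtain the first global feature by analyzing a sub-coding network's input with a channel attention mechanism and a spatial attention mechanism (claim 4). The patent does not give an exact formulation; the sketch below is one common sequential-attention reading, with parameter-free average-pool gates standing in for the learned attention layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat):
    # feat: (C, H, W). Re-weight each channel by a gate computed from its
    # global average response (a parameter-free stand-in for learned weights).
    gate = sigmoid(feat.mean(axis=(1, 2)))      # (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # Re-weight each spatial position by a gate computed from the
    # channel-wise mean at that position.
    gate = sigmoid(feat.mean(axis=0))           # (H, W)
    return feat * gate[None, :, :]

def global_feature(feat):
    # Channel attention followed by spatial attention, yielding the
    # "first global feature" of the embodiment.
    return spatial_attention(channel_attention(feat))
```

Because both gates lie in (0, 1), the output keeps the input's shape while attenuating low-response channels and positions; a trained model would learn these gates instead of deriving them from raw means.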
In one embodiment, each sub-coding network comprises a global feature coding network, a local feature coding network, and a feature fusion network, and encoding the input data of each sub-coding network to obtain the first global-local feature extracted by that sub-coding network for the first image to be detected comprises: analyzing, through the global feature coding network of each sub-coding network, the input data of that sub-coding network to obtain a first global feature corresponding to that sub-coding network; analyzing, through the local feature coding network of that sub-coding network, the input data of that sub-coding network to obtain a first local feature corresponding to that sub-coding network; and fusing, through the feature fusion network of that sub-coding network, the first global feature and the first local feature corresponding to that sub-coding network to obtain the first global-local feature extracted by that sub-coding network for the first image to be detected. In one embodiment, analyzing, through the global feature coding network of each sub-coding network, the input data of that sub-coding network to obtain the first global feature corresponding to that sub-coding network comprises: analyzing, through the global feature coding network of each sub-coding network, the input data of that sub-coding network based on a channel attention mechanism and a spatial attention mechanism to obtain the first global feature corresponding to that sub-coding network. In one embodiment, the difference perception model further comprises a difference perception network, wherein the difference perception network comprises at least two sub-difference perceptio