CN-121504771-B - Image restoration method based on multi-view 3D reconstruction and geometric attention
Abstract
The application provides an image restoration method based on multi-view 3D reconstruction and geometric attention. The method comprises the steps of: obtaining a target image and a reference image, and generating a two-dimensional mask of the target image; respectively inputting the target image and the reference image into a depth estimator to obtain a target depth map and a reference depth map; obtaining a target point cloud and a reference point cloud from the camera internal and external parameters corresponding to each image together with the target depth map and the reference depth map; selecting points with the same coordinates from the target point cloud and the reference point cloud to obtain a 3D point cloud set, and converting the pairwise distances within the set into a weight matrix by means of a Gaussian kernel function; respectively inputting the target image, the non-missing region of the target image and the reference image into a feature encoder to obtain the corresponding feature maps; applying an attention mechanism to the weight matrix and the feature maps to obtain a fusion feature tensor; performing feature stitching on the fusion feature tensor, the feature maps, the two-dimensional mask and a preset 3D coordinate range to obtain a conditional tensor feature; and obtaining the restored target image by means of a conditional diffusion model with a U-Net architecture.
Inventors
- Liu Zhuang
- Wen Zhike
- Cao Yuzhao
- Cai Huanqing
- Liu Houxuan
- Shao Guiwei
- Fu Jing
- Tan Jiaying
- Wu Huaxi
- Zhou Liwei
- Zhang Bo
Assignees
- Wuhan Branch of China Electric Power Research Institute Co., Ltd.
- China Electric Power Research Institute Co., Ltd.
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-13
Claims (10)
- 1. An image restoration method based on multi-view 3D reconstruction and geometric attention, the method comprising: acquiring a target image and a reference image, and generating a two-dimensional mask of the target image, wherein the target image comprises a missing region and a non-missing region, the two-dimensional mask of the target image is used for distinguishing the missing region and the non-missing region of the target image, and the reference image comprises a clear image of the missing region of the target image; respectively inputting the target image and the reference image into a depth estimator to obtain a target depth map and a reference depth map; mapping pixels of the target image and the reference image to a three-dimensional world coordinate system based on the camera internal parameters and external parameters respectively corresponding to the target image and the reference image, together with the target depth map and the reference depth map, to obtain a target point cloud and a reference point cloud; selecting points with the same coordinates from the target point cloud and the reference point cloud to obtain a 3D point cloud set containing N points, calculating the Euclidean distance between any two points in the 3D point cloud set to obtain an N×N symmetric distance matrix, and converting the distance matrix into a weight matrix by means of a Gaussian kernel function; respectively inputting the target image, the non-missing region of the target image and the reference image into a feature encoder to obtain a global feature map of the target image, a non-missing-region feature map of the target image and a global feature map of the reference image; obtaining a fusion feature tensor by using an attention mechanism based on the weight matrix, the global feature map of the target image and the global feature map of the reference image; and performing feature stitching on the fusion feature tensor, the non-missing-region feature map of the target image, the two-dimensional mask of the target image and a preset 3D coordinate range to obtain a conditional tensor feature, and obtaining the restored target image by adopting a conditional diffusion model of a U-Net architecture based on the conditional tensor feature.
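The weight-matrix step of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, the bandwidth parameter `sigma` and its default value are assumptions.

```python
import numpy as np

def gaussian_weight_matrix(points: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """points: (N, 3) array of 3D coordinates from the matched point cloud set.
    Returns an (N, N) weight matrix G obtained by passing the symmetric
    pairwise Euclidean distance matrix D through a Gaussian kernel."""
    diff = points[:, None, :] - points[None, :, :]   # (N, N, 3) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)             # symmetric distance matrix D
    return np.exp(-dist**2 / (2.0 * sigma**2))       # G = exp(-D^2 / (2 * sigma^2))

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
G = gaussian_weight_matrix(pts, sigma=1.0)
print(G.shape)  # (3, 3); diagonal entries are 1.0
```

Closer points receive weights near 1 and distant points decay toward 0, which is what lets the later attention step favour geometrically consistent correspondences.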
- 2. The multi-view 3D reconstruction and geometric attention-based image restoration method according to claim 1, wherein the respectively inputting the target image and the reference image into a depth estimator to obtain a target depth map and a reference depth map comprises: normalizing the pixel values of the target image and the reference image to the range required by the depth estimator to obtain a processed target image and a processed reference image; and respectively inputting the processed target image and the processed reference image into the depth estimator to obtain the target depth map and the reference depth map, wherein the target depth map comprises a depth value for each pixel in the target image, and the reference depth map comprises a depth value for each pixel in the reference image.
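The normalization step of claim 2 amounts to rescaling pixel values into whatever input range the chosen depth estimator expects. A minimal sketch, assuming 8-bit input and a hypothetical target range `[lo, hi]`:

```python
import numpy as np

def normalize_for_estimator(img: np.ndarray, lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """Scale uint8 pixel values in [0, 255] into the [lo, hi] range
    expected by a depth estimator (range is estimator-specific)."""
    x = img.astype(np.float32) / 255.0
    return lo + x * (hi - lo)

out = normalize_for_estimator(np.array([0, 128, 255], dtype=np.uint8))
print(out)  # values in [0.0, 1.0]
```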
- 3. The multi-view 3D reconstruction and geometric attention-based image restoration method according to claim 2, wherein the mapping pixels of the target image and the reference image to a three-dimensional world coordinate system based on the camera internal parameters and external parameters respectively corresponding to the target image and the reference image, together with the target depth map and the reference depth map, to obtain a target point cloud and a reference point cloud, includes: calculating 3D coordinates of the target image under the target camera coordinate system based on the target camera internal parameters corresponding to the target image and the depth value of each pixel in the target depth map; mapping pixels of the target image to the three-dimensional world coordinate system based on the target camera external parameters corresponding to the target image and the 3D coordinates of the target image under the target camera coordinate system, to obtain the target point cloud, wherein the camera external parameters corresponding to the target image comprise a rotation matrix and a translation vector between the target camera coordinate system and the three-dimensional world coordinate system; calculating 3D coordinates of the reference image under the reference camera coordinate system based on the reference camera internal parameters corresponding to the reference image and the depth value of each pixel in the reference depth map; and mapping pixels of the reference image to the three-dimensional world coordinate system based on the reference camera external parameters corresponding to the reference image and the 3D coordinates of the reference image under the reference camera coordinate system, to obtain the reference point cloud, wherein the camera external parameters corresponding to the reference image comprise a rotation matrix and a translation vector between the reference camera coordinate system and the three-dimensional world coordinate system.
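The two-stage unprojection in claim 3 (pixels to camera coordinates via the intrinsics, then camera to world coordinates via the extrinsics) can be sketched with a standard pinhole model. The function name and the camera-to-world convention `X_world = R @ X_cam + t` are assumptions; the patent does not fix a convention.

```python
import numpy as np

def depth_to_world(depth: np.ndarray, K: np.ndarray,
                   R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Back-project a depth map (H, W) into world coordinates (H, W, 3).
    K: 3x3 intrinsics; R, t: camera-to-world rotation and translation."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))           # pixel grids
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)              # homogeneous pixels
    cam = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)  # camera coordinates
    world = R @ cam + t.reshape(3, 1)                        # world coordinates
    return world.T.reshape(H, W, 3)

# With identity intrinsics/extrinsics and unit depth, world XY equals pixel UV.
w = depth_to_world(np.ones((2, 2)), np.eye(3), np.eye(3), np.zeros(3))
print(w[0, 1])  # [1. 0. 1.]
```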
- 4. The multi-view 3D reconstruction and geometric attention-based image restoration method according to claim 1, wherein the obtaining the fusion feature tensor based on the weight matrix, the global feature map of the target image, and the global feature map of the reference image by using an attention mechanism includes: obtaining the dimension of the reference image based on the global feature map of the reference image; and obtaining the fusion feature tensor by using an attention mechanism formula based on the weight matrix, the global feature map of the target image, the global feature map of the reference image and the dimension of the reference image; wherein the attention mechanism formula is: F_fuse = softmax((Q·Kᵀ/√d_k) ⊙ G)·K, where F_fuse represents the fusion feature tensor, Q represents the global feature map of the target image, K represents the global feature map of the reference image, G represents the weight matrix, d_k represents the dimension of the reference image, and ⊙ represents element-wise multiplication.
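A minimal sketch of the geometrically gated attention of claim 4, under the assumption (the formula is garbled in the source text) that the Gaussian weight matrix G element-wise modulates the scaled dot-product scores before the softmax:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def geometric_attention(Q: np.ndarray, K: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Q: (N, d) target features, K: (N, d) reference features,
    G: (N, N) geometric weight matrix. Returns the (N, d) fused tensor.
    Assumed form: softmax((Q K^T / sqrt(d)) * G) @ K."""
    d = K.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)   # standard scaled dot-product scores
    return softmax(scores * G) @ K    # geometric gating, then aggregation

Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 2.0], [3.0, 4.0]])
F = geometric_attention(Q, K, np.ones((2, 2)))
```

With G all ones this reduces to ordinary attention; a G built from the Gaussian kernel suppresses matches between geometrically distant points.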
- 5. The multi-view 3D reconstruction and geometric attention-based image restoration method according to claim 4, wherein the performing feature stitching on the fusion feature tensor, the non-missing-region feature map of the target image, the two-dimensional mask of the target image, and a preset 3D coordinate range to obtain the conditional tensor feature includes performing the feature stitching according to the formula: C = Concat(F_fuse, F_vis, P, M), where C represents the conditional tensor feature, F_vis represents the non-missing-region feature map of the target image, P represents the preset 3D coordinate range, and M represents the two-dimensional mask of the target image.
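Feature stitching here is channel-wise concatenation. A sketch with hypothetical channel counts (16 fused channels, 16 visible-region channels, a one-channel mask, three coordinate channels; none of these sizes come from the patent):

```python
import numpy as np

H, W = 8, 8
F_fuse = np.zeros((H, W, 16))  # fusion feature tensor
F_vis  = np.zeros((H, W, 16))  # non-missing-region feature map
M      = np.zeros((H, W, 1))   # two-dimensional mask
P      = np.zeros((H, W, 3))   # preset 3D coordinate range

# C = Concat(F_fuse, F_vis, P, M) along the channel axis
cond = np.concatenate([F_fuse, F_vis, P, M], axis=-1)
print(cond.shape)  # (8, 8, 36)
```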
- 6. The multi-view 3D reconstruction and geometric attention-based image restoration method according to claim 1, wherein the obtaining the restored target image by using a conditional diffusion model of a U-Net architecture based on the conditional tensor feature comprises: a noise-adding step of gradually adding Gaussian noise to the target image until the noise of a preset time step has been added, so as to obtain a noise image of the target image; and denoising the noise image of the target image step by step by adopting the conditional diffusion model of the U-Net architecture under the guidance of the conditional tensor feature, to obtain the restored target image.
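The noise-adding step of claim 6 follows the standard forward diffusion process. A sketch using the usual closed form x_t = √(ᾱ_t)·x₀ + √(1−ᾱ_t)·ε; the schedule values are illustrative, not from the patent:

```python
import numpy as np

def add_noise(x0: np.ndarray, t: int, betas: np.ndarray, rng=None) -> np.ndarray:
    """Forward diffusion to time step t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta)."""
    if rng is None:
        rng = np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)        # Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

x0 = np.zeros((4, 4))
xt = add_noise(x0, t=5, betas=np.full(10, 0.01))
```

The reverse (denoising) pass would run the U-Net conditioned on the conditional tensor feature at each step; that network is outside the scope of this sketch.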
- 7. The multi-view 3D reconstruction and geometric attention-based image restoration method according to claim 1, wherein, after the obtaining the restored target image by using a conditional diffusion model of a U-Net architecture based on the conditional tensor feature, the method further comprises: inputting the restored target image into the depth estimator to obtain a new target depth map; mapping pixels of the restored target image to the three-dimensional world coordinate system based on the new target depth map and the camera internal and external parameters corresponding to the target image, to obtain a new target point cloud; projecting the new target point cloud back to the 2D pixel space based on the camera internal and external parameters corresponding to the target image, to obtain a cyclic image; calculating, based on the two-dimensional mask of the target image, the pixel difference between the target image and the cyclic image in the non-missing region, and calculating the Euclidean distance between the target point cloud and the new target point cloud in the non-missing region; and jointly optimizing the restored target image based on the pixel difference between the target image and the cyclic image in the non-missing region and the Euclidean distance between the target point cloud and the new target point cloud in the non-missing region.
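The joint objective of claim 7 combines a masked photometric term with a masked geometric term. A sketch of one plausible form; the L1 pixel term, the weighting parameter `lam` and the function name are assumptions:

```python
import numpy as np

def cycle_loss(target, cycled, mask, pc_target, pc_new, lam: float = 1.0) -> float:
    """Joint cycle-consistency objective over the non-missing region:
    masked mean L1 pixel difference between the target and cyclic images,
    plus the mean Euclidean distance between corresponding point clouds.
    mask: 1 where pixels are non-missing; lam weights the geometric term."""
    valid = mask.astype(bool)
    pix = np.abs(target[valid] - cycled[valid]).mean()                 # photometric term
    geo = np.linalg.norm(pc_target[valid] - pc_new[valid], axis=-1).mean()  # geometric term
    return pix + lam * geo

img = np.random.default_rng(1).random((4, 4))
pc  = np.random.default_rng(2).random((4, 4, 3))
loss = cycle_loss(img, img, np.ones((4, 4)), pc, pc)
print(loss)  # 0.0 when the cycle reproduces the input exactly
```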
- 8. An image restoration system based on multi-view 3D reconstruction and geometric attention, the system comprising: an acquisition module for acquiring a target image and a reference image and generating a two-dimensional mask of the target image, wherein the target image comprises a missing region and a non-missing region, the two-dimensional mask of the target image is used for distinguishing the missing region and the non-missing region of the target image, and the reference image comprises a clear image of the missing region of the target image; a depth map generation module for respectively inputting the target image and the reference image into a depth estimator to obtain a target depth map and a reference depth map; a coordinate mapping module for mapping pixels of the target image and the reference image to a three-dimensional world coordinate system based on the camera internal parameters and external parameters respectively corresponding to the target image and the reference image, together with the target depth map and the reference depth map, to obtain a target point cloud and a reference point cloud; a weight matrix calculation module for selecting points with the same coordinates from the target point cloud and the reference point cloud to obtain a 3D point cloud set containing N points, calculating the Euclidean distance between any two points in the 3D point cloud set to obtain an N×N symmetric distance matrix, and converting the distance matrix into a weight matrix by means of a Gaussian kernel function; a feature map generation module for respectively inputting the target image, the non-missing region of the target image and the reference image into a feature encoder to obtain a global feature map of the target image, a non-missing-region feature map of the target image and a global feature map of the reference image; and a restoration module for obtaining a fusion feature tensor by using an attention mechanism based on the weight matrix, the global feature map of the target image and the global feature map of the reference image, performing feature stitching on the fusion feature tensor, the non-missing-region feature map of the target image, the two-dimensional mask of the target image and a preset 3D coordinate range to obtain a conditional tensor feature, and obtaining a restored target image by adopting a conditional diffusion model of a U-Net architecture based on the conditional tensor feature.
- 9. An image restoration device based on multi-view 3D reconstruction and geometric attention, characterized in that said device comprises a processor and a memory, said processor being adapted to execute instructions stored in said memory for implementing the method according to any of claims 1-7.
- 10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or instructions which, when executed, cause the method of any one of claims 1-7 to be performed.
Description
Image restoration method based on multi-view 3D reconstruction and geometric attention

Technical Field

The application relates to the field of overhead power line operation, maintenance, detection and overhaul, and in particular to an image restoration method, system, device and storage medium based on multi-view 3D reconstruction and geometric attention.

Background

In tree-obstacle identification and measurement for intelligent operation and maintenance of electric power systems, tree-obstacle images (such as line-tree spatial relation diagrams and tree profile features) acquired by unmanned aerial vehicle inspection, satellite remote sensing or manual tower-climbing photography frequently suffer local image loss or blurring due to interlaced occlusion by branches, pitch/yaw viewing-angle deviation of the unmanned aerial vehicle, sensor imaging noise and the like, which seriously affects the accuracy of tree and line height measurement, clearance distance calculation and hidden-danger level judgment. Meanwhile, the diversity of tree-obstacle samples places very high demands on the generalization capability of detection models: tree morphology differs markedly across seasons (such as defoliation period versus flourishing period), tree species (such as arbor versus shrub) and growth environments (such as plain versus mountain), while acquiring high-quality, multi-view tree-obstacle data in real scenes is costly and inefficient (professional staff must climb with handheld tools to measure on site, or a model must be reconstructed and computed after point cloud data are acquired by lidar).
Early tree-obstacle image restoration techniques rely mainly on 2D appearance feature matching, such as extracting local texture features of image blocks through SIFT/ORB feature extraction or a convolutional neural network (CNN) and searching the reference image for the most similar neighbourhood blocks for filling. Such methods can partially recover the missing region in simple single-view tree-obstacle scenes with high texture repeatability (such as a single tree species in an open area), but have obvious defects in actual tree-obstacle measurement. First, 2D feature matching focuses only on pixel-level appearance similarity; it cannot effectively model the 3D geometric projection differences of trees at different viewing angles (such as tree-height compression at an overhead view and trunk inclination at a side view), and branch textures that lie at different spatial positions but look similar are easily mismatched, so that the repaired tree contour is displaced from its real spatial position, directly degrading the accuracy of line clearance distance calculation. Second, the 3D spatial structure of the tree lacks explicit modelling: large-area loss caused by large-scale occlusion (such as dense branches and leaves occluding the contact area between a line and a tree) is difficult to repair, and the repair result often violates the geometric constraints of the physical world (for example, that branches extend along the main stem and that leaf distribution follows a growth rule), so a complete image conforming to the real tree-obstacle shape cannot be generated, which in turn reduces the reliability of tree-obstacle hidden-danger identification models. In view of the foregoing, there is a need for a method that can generate a complete image conforming to the real tree-obstacle scene based on an existing reference image.
Disclosure of Invention

The embodiment of the application provides an image restoration method based on multi-view 3D reconstruction and geometric attention, which addresses the branch-and-leaf occlusion and viewing-angle deviation of unmanned aerial vehicle inspection images, thereby avoiding the key-point loss and distance-measurement distortion that these problems cause during tree-obstacle recognition and detection. To achieve the above purpose, the application adopts the following technical scheme. In a first aspect, the application provides an image restoration method based on multi-view 3D reconstruction and geometric attention, comprising the steps of acquiring a target image and a reference image, and generating a two-dimensional mask of the target image, wherein the target image comprises a missing region and a non-missing region, the two-dimensional mask of the target image is used for distinguishing the missing region and the non-missing region of the target image, and the reference image comprises a clear image of the missing region of the target image; respectively inputting the target image and the refere