CN-116468770-B - Self-supervision depth estimation method in three-dimensional reconstruction of mine potential safety hazard scene


Abstract

The invention discloses a self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes. A depth estimation network and a pose estimation network are first constructed for a normal illumination image and a low illumination image respectively. A position-aware module based on a self-attention mechanism is placed between the encoder and the decoder to capture context information and a better feature representation of the scene structure. During training, both the normal illumination image and the low illumination image generated from it by CycleGAN are used, and the CycleGAN output is processed with a Mapping Image Enhancement (MIE) algorithm to maintain brightness consistency and counter the effects of low and non-uniform illumination. The feature representation of details is strengthened, which improves depth estimation against complex backgrounds. The added MIE module markedly improves the brightness and contrast of the low-illumination image, giving it higher visibility while preserving more detail.
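
The MIE step named in the abstract is specified in claim 1 as a brightness mapping built from a clipped histogram, with the clipped excess spread uniformly over all brightness levels and the result normalized between the minimum and maximum of the cdf. A minimal NumPy sketch of that idea follows, assuming an 8-bit grayscale input and the standard histogram-equalization normalization. The function names and the `clip_ratio` default are illustrative assumptions, not values taken from the patent.

```python
import numpy as np


def brightness_mapping(gray, clip_ratio=0.01, levels=256):
    """Return a lookup table mapping each input brightness to one output.

    gray       : 2-D uint8 array (assumed 8-bit grayscale image)
    clip_ratio : hypothetical clipping threshold on the normalized histogram
    levels     : number of brightness levels L
    """
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)
    freq = hist / hist.sum()                      # frequency distribution

    # Clip frequencies above the threshold so noise is not amplified,
    # then fill the clipped mass uniformly into every brightness level.
    excess = np.clip(freq - clip_ratio, 0.0, None).sum()
    freq = np.minimum(freq, clip_ratio) + excess / levels

    # Normalize the cumulative distribution to the range [0, L - 1].
    cdf = np.cumsum(freq)
    cdf_min, cdf_max = cdf.min(), cdf.max()
    lut = (cdf - cdf_min) / (cdf_max - cdf_min) * (levels - 1)
    return np.round(lut).astype(np.uint8)


def enhance(gray, clip_ratio=0.01, levels=256):
    """Apply the brightness mapping to every pixel of the image."""
    return brightness_mapping(gray, clip_ratio, levels)[gray]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dark = rng.integers(0, 40, size=(64, 64), dtype=np.uint8)  # synthetic low-light image
    bright = enhance(dark)
    print(dark.mean(), bright.mean())
```

Because the mapping is a lookup table, the same input brightness always yields the same output, which is the single-valued property the claim requires of the mapping function.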

Inventors

KOU QIQI
MA XIANG
JI GUANGKAI
LI LONG
ZHENG LIJUAN
ZHANG HUIMIN
XU SHUAI
CHENG DEQIANG
WANG ZIQIANG
ZHANG HUAQIANG
CHEN JUNHUI
WANG YI
Zhao Linao
CHENG ZHIWEI

Assignees

China University of Mining and Technology (中国矿业大学)

Dates

Publication Date: 20260508
Application Date: 20230315

Claims (5)

1. A self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes, characterized in that a depth estimation network and a pose estimation network model are first constructed for a normal illumination image and a low illumination image respectively, wherein the depth estimation network and the pose estimation network both adopt encoder-decoder structures, and the processing method comprises the following steps:
S1, the normal illumination image is converted by CycleGAN into a low-light night image;
S2, the generated low-illumination image is input into an MIE module, which processes it with a brightness mapping function: [formula omitted in source]; wherein the brightness mapping function is a single-valued mapping that maps each input luminance to a single specific output; given the frequency distribution of the input image, a frequency parameter is first preset, frequencies greater than the preset parameter are clipped to avoid amplifying noise signals, the clipped frequencies are then filled uniformly into every brightness level, and the mapping is finally obtained by the following formula: [formula omitted in source]; wherein [symbols omitted in source] respectively denote the minimum value and the maximum value of the cdf, and L denotes the number of brightness levels;
S3, the normal illumination image is fed into the encoder of the depth estimation network, which outputs feature maps for i = 1, 2, 3, 4, 5, and the low illumination image processed in S2 is fed into the encoder of the depth estimation network, which outputs feature maps for i = 1, 2, 3, 4, 5;
S4, the lowest-resolution feature maps of the two images are input into the position-aware module;
S5, after passing through the position-aware module, the features are input into the decoder, which outputs the depth map corresponding to each feature map; in step S5, the depth of the normal illumination image is used as a pseudo label to constrain the depth of the low illumination image, and the similarity loss is defined as: [formula omitted in source], wherein N is [definition omitted in source] and x denotes the x-th pixel;
S6, adjacent frames of the normal illumination image and of the low illumination image are input into the pose estimation network to compute the six-degree-of-freedom relative pose, which is combined with the depth maps obtained from the depth estimation network to construct reconstructed views of the original views.
2. The self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes according to claim 1, wherein the encoder in S3 adopts ResNet as the backbone network with the final average pooling layer and fully connected layer of ResNet removed, the remaining stages being, respectively, a max pooling layer, layer2, layer3, layer4 and layer5.
3. The self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes according to claim 1, wherein the position-aware module strengthens the feature at the query position by aggregating the features of other positions, and the module is expressed as: [formula omitted in source], wherein [symbol omitted in source] measures the influence of the j-th position on the i-th position, and N represents the total number of pixels; [symbols omitted in source] respectively denote the key, query and value, each obtained by a linear transformation.
4. The self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes according to claim 1, wherein the reconstructed view in S6 is given by: [formula omitted in source], wherein K is the camera intrinsic matrix and p is the homogeneous coordinate of a pixel; [symbol omitted in source] is the coordinate of p after transformation by [symbol omitted in source]; and [symbol omitted in source] is a differentiable bilinear sampler that obtains the pixel at the transformed coordinate by linear interpolation and assigns it to p.
5. The self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes according to claim 4, wherein the reconstructed view is constrained by a photometric loss L_ph that combines the structural similarity index (SSIM) with an L1 loss, the loss function being expressed as: [formula omitted in source], wherein the coefficient [symbol omitted in source] is set to 0.75.
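
The photometric loss of claim 5 can be sketched as follows. The text available here names only the ingredients (SSIM, an L1 term, and a coefficient of 0.75), so this sketch assumes the weighting form commonly used in self-supervised depth estimation, `alpha * (1 - SSIM) / 2 + (1 - alpha) * |I - I_rec|`, with a 3x3 mean window for SSIM. The function names, the window size and the constants `c1` and `c2` are assumptions, not details confirmed by the patent.

```python
import numpy as np


def box3(x):
    """3x3 mean filter with reflect padding on a 2-D float array."""
    h, w = x.shape
    p = np.pad(x, 1, mode="reflect")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM map for two images with values in [0, 1]."""
    mu_a, mu_b = box3(a), box3(b)
    var_a = box3(a * a) - mu_a ** 2
    var_b = box3(b * b) - mu_b ** 2
    cov = box3(a * b) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def photometric_loss(target, recon, alpha=0.75):
    """Mean of the weighted SSIM and L1 terms (assumed form, alpha = 0.75)."""
    ssim_term = np.clip((1.0 - ssim(target, recon)) / 2.0, 0.0, 1.0)
    l1_term = np.abs(target - recon)
    return float(np.mean(alpha * ssim_term + (1.0 - alpha) * l1_term))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view = rng.random((16, 16))                    # stand-in for the original view
    good = np.clip(view + 0.01 * rng.standard_normal(view.shape), 0.0, 1.0)
    print(photometric_loss(view, view), photometric_loss(view, good))
```

The SSIM term rewards structural agreement between the original and reconstructed views, while the L1 term penalizes per-pixel intensity differences; the coefficient balances the two.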

Description

Self-supervision depth estimation method in three-dimensional reconstruction of mine potential safety hazard scene

Technical Field

The invention relates to the technical field of three-dimensional reconstruction, in particular to a self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes. The method performs depth measurement and three-dimensional reconstruction of the hazard scene, and the resulting virtual three-dimensional scene supports simulation and deduction and thereby the prediction of mine potential safety hazards.

Background

With the introduction of concepts such as the intelligent mine, how to reconstruct mine potential safety hazard scenes in three dimensions, carry out simulation and deduction, and provide strong technical support for safe mine production has become an urgent problem. Depth estimation of the hazard scene is an important component of three-dimensional reconstruction. Depth estimation is widely used in augmented reality, autonomous driving and robotics. Early approaches relied on depth sensors (LiDAR and DOF), but these are costly and must operate continuously in the mine, which limits their use. Self-supervised monocular depth estimation can predict per-pixel depth from a single image without such equipment. Moreover, because ground-truth depth data for images of mine hazard scenes are scarce, an unsupervised learning method that needs no accurate ground truth is a better fit. Unsupervised monocular depth estimation for mine hazard scenes has therefore received extensive attention from researchers.
Existing self-supervised monocular depth estimation methods typically use geometric constraints on stereo image pairs or monocular sequences as supervision and have made great progress. Related technical content is described in Eigen, D., "Depth map prediction from a single image using a multi-scale deep network". However, most current self-supervised monocular depth estimation work addresses daytime images, whereas mine images are usually captured in low light and complex environments, so depth estimation on them is extremely unstable owing to low visibility and uneven lighting. CycleGAN-based approaches estimate the depth of low-light images by converting low-light information into well-lit daytime information at the image level and the feature level, but a CycleGAN network that takes low-light input has difficulty producing natural daytime images or features, so its performance is limited. Monodepth2 is an effective self-supervised monocular depth estimation method, but when the CycleGAN-processed image is fed directly into Monodepth2, depth details cannot be estimated because of the complex environment and uneven illumination.

The prior art has the following disadvantages. First, the depth estimation network of Monodepth2 is based on the U-Net framework, and its decoder fuses high-level and low-level features using only concatenation and one basic convolution. These operations neither preserve enough detail nor accurately recover spatial information, so depth features are not represented effectively in complex environments. Second, training still uses only images captured under good illumination, while at test time low-illumination images are processed by CycleGAN and then used as input, which makes it difficult to obtain image features comparable to those under naturally good illumination.
Disclosure of Invention

In view of the above drawbacks of the prior art, the invention provides the following solutions. First, a position-aware module based on a self-attention mechanism is employed between the encoder and the decoder to obtain context information and a better feature representation of the scene structure. Second, during network training, both the normal illumination image and the low illumination image obtained through CycleGAN processing are used. The image output by CycleGAN is then processed with a Mapping Image Enhancement (MIE) algorithm to maintain brightness consistency and counter the effects of low and uneven illumination.

A self-supervised depth estimation method for three-dimensional reconstruction of mine potential safety hazard scenes constructs a model consisting of a depth estimation network and a pose estimation network for a normal illumination image and a low illumination image respectively. The specific steps are as follows:
S1, the normal illumination image is converted by CycleGAN into a low-light night image;
S2, the generated low-light image is input into an MIE module for processing;
S3, the normal illumination image Int