CN-121998826-A - Depth map super-resolution method based on blind degradation
Abstract
The invention discloses a depth map super-resolution method based on blind degradation. A high-resolution RGB image and its corresponding high-resolution synthesized depth map form an initial RGB-D sample pair; high-quality RGB-D sample pairs are screened out of the initial sample pairs; each high-quality sample pair is converted into a low-resolution training sample by a learnable degradation network, and an RGB-D training data set is constructed from the low-resolution training samples and their corresponding high-resolution RGB images; a multi-modal super-resolution reconstruction network is trained on the RGB-D training data set to obtain a depth map super-resolution model; finally, a low-resolution depth map and its corresponding high-resolution RGB image are input into the depth map super-resolution model, which outputs the high-resolution depth map.
Inventors
- XIE SHIPENG
- ZHANG YIFAN
Assignees
- Nanjing University of Posts and Telecommunications (南京邮电大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-29
Claims (5)
- 1. A depth map super-resolution method based on blind degradation, comprising the steps of: inputting a high-resolution RGB image into a DEPTHANYTHING V model to generate a high-resolution synthesized depth map, and forming an initial RGB-D sample pair from the high-resolution RGB image and the corresponding high-resolution synthesized depth map; screening out high-quality RGB-D sample pairs from the initial RGB-D sample pairs based on depth map internal consistency, depth edge definition, and the degree of structural alignment between the depth map and the RGB image; converting the high-quality RGB-D sample pairs into low-resolution training samples through a learnable degradation network, and constructing an RGB-D training data set from the low-resolution training samples and the corresponding high-resolution RGB images; training a multi-modal super-resolution reconstruction network with the RGB-D training data set to obtain a depth map super-resolution model; and inputting a low-resolution depth map and the corresponding high-resolution RGB image into the depth map super-resolution model to output the high-resolution depth map.
- 2. The blind-degradation-based depth map super-resolution method according to claim 1, wherein the step of screening out high-quality RGB-D sample pairs from the initial RGB-D sample pairs comprises: calculating a comprehensive quality score for each initial RGB-D sample pair based on the depth map internal consistency, the depth edge definition, and the degree of structural alignment between the depth map and the RGB image; and retaining, based on a preset threshold, the initial RGB-D sample pairs whose comprehensive quality score is higher than the preset threshold as the high-quality RGB-D sample pairs.
- 3. The blind-degradation-based depth map super-resolution method according to claim 1, wherein the multi-modal super-resolution reconstruction network comprises a degradation-oriented progressive feature enhancement module and a perceptual collaborative diffusion iteration module; for the RGB-D training data set, the degradation-oriented progressive feature enhancement module hierarchically extracts the modal features of the low-resolution training sample and of the corresponding high-resolution RGB image through independent convolutional feature-extraction networks; the degradation-oriented progressive feature enhancement module generates adaptive cross-modal fusion weights through a Mamba model according to the two modal features and the degradation characterization output by the learnable degradation network; based on the adaptive cross-modal fusion weights, the degradation-oriented progressive feature enhancement module performs inter-modal feature fusion on the two modal features to obtain fusion features; and the degradation-oriented progressive feature enhancement module generates a high-resolution depth map through iterative optimization under the guidance of the fusion features and the degradation characterization.
- 4. The blind-degradation-based depth map super-resolution method according to claim 3, wherein, after the degradation-oriented progressive feature enhancement module performs inter-modal feature fusion on the two modal features to obtain the fusion features, the module extracts the shared edges of the two modal features through a refinement mechanism, and refines and outputs the fusion features layer by layer through residual connections, to serve as constraint information during the iterative optimization of the degradation-oriented progressive feature enhancement module.
- 5. The blind-degradation-based depth map super-resolution method according to claim 4, wherein the perceptual collaborative diffusion iteration module performs iterative optimization through an iteration formula to generate the high-resolution depth map, the iteration formula being of the form: D^(t+1) = D^(t) + C(D^(t)) + div(g(|∇D^(t)|)∇D^(t)) − (D^(t)↓ − D_LR) − R(D^(t), k); wherein D^(t) is the high-resolution depth map at the t-th iteration step; D^(0) is the initial estimate obtained by bicubic upsampling of the low-resolution depth map; C(·) is the constraint term; div(g(|∇·|)∇·) is the anisotropic diffusion term; (D^(t)↓ − D_LR) is the data-fidelity term; R(·, k) is the conditional regularization term; k is the degradation characterization; D_LR is the low-resolution depth map; ↓ is the downsampling operation; and ∇ is the gradient operator.
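The screening step of claims 1 and 2 can be illustrated with a minimal numpy sketch. The three quality proxies below (overall depth variance for internal consistency, gradient magnitude along depth edges for edge definition, and correlation of depth and image gradient maps for structural alignment) and the weights `w` are hypothetical stand-ins for illustration only; the patent does not disclose its concrete metrics.

```python
import numpy as np

def quality_score(depth, rgb_gray, w=(0.4, 0.3, 0.3)):
    """Composite quality score for one RGB-D pair (higher is better).

    Hypothetical proxies for the three criteria in claim 2:
    - internal consistency: penalize high overall depth variance
    - edge definition: mean gradient magnitude on above-average-gradient pixels
    - structural alignment: correlation between depth and image gradient maps
    """
    gy, gx = np.gradient(depth.astype(np.float64))
    dmag = np.hypot(gx, gy)
    iy, ix = np.gradient(rgb_gray.astype(np.float64))
    imag = np.hypot(ix, iy)

    consistency = 1.0 / (1.0 + depth.std())            # smoother -> higher
    mask = dmag > dmag.mean()
    edge_def = dmag[mask].mean() if mask.any() else 0.0
    a = dmag.ravel() - dmag.mean()
    b = imag.ravel() - imag.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    alignment = float(a @ b / denom) if denom > 0 else 0.0

    return w[0] * consistency + w[1] * edge_def + w[2] * alignment

def screen_pairs(pairs, threshold):
    """Keep only pairs whose composite score exceeds the preset threshold."""
    return [(d, g) for d, g in pairs if quality_score(d, g) > threshold]
```

A depth map whose edges coincide with the image edges scores higher than a noisy, misaligned one, so thresholding the composite score retains only the well-aligned pairs.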
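The adaptive cross-modal fusion of claim 3 can be sketched per spatial location as a degradation-conditioned gate: the two modal feature vectors and the degradation characterization are concatenated, projected to two logits, and turned into fusion weights. The projection `W` is a toy stand-in for the Mamba-based weight generator in the claim; all names here are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modal_features(f_depth, f_rgb, degradation, W):
    """Degradation-conditioned fusion weights (toy stand-in for the Mamba gate).

    f_depth, f_rgb: (C,) feature vectors at one spatial location
    degradation:    (K,) degradation characterization from the learnable network
    W:              (2, 2*C + K) hypothetical projection producing two logits
    """
    z = np.concatenate([f_depth, f_rgb, degradation])
    w_d, w_r = softmax(W @ z)          # adaptive weights, sum to 1
    return w_d * f_depth + w_r * f_rgb
```

With a zero projection the gate is neutral and the fusion reduces to a plain average of the two modalities; a trained projection would shift weight toward the more reliable modality as the degradation characterization varies.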
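The iterative optimization of claim 5 can be sketched in reduced form with only two of its terms: a Perona–Malik-style anisotropic diffusion term and a data-fidelity term that pushes the downsampled estimate toward the observed low-resolution depth map. The constraint and conditional regularization terms are omitted, the degradation operator is replaced by plain decimation, and nearest-neighbor upsampling stands in for the bicubic initialization; all of these simplifications are assumptions, not the patent's formulation.

```python
import numpy as np

def downsample(x, s):
    """Plain decimation as a stand-in for the patent's degradation operator."""
    return x[::s, ::s]

def upsample_nn(x, s):
    """Nearest-neighbor upsampling by pixel replication."""
    return np.kron(x, np.ones((s, s)))

def diffusion_step(D, D_lr, s, lam_fid=0.1, lam_diff=0.1, kappa=0.1):
    """One iteration: D <- D + diffusion term - fidelity term (reduced sketch)."""
    gy, gx = np.gradient(D)
    g = 1.0 / (1.0 + (gx**2 + gy**2) / kappa**2)   # edge-stopping conductance
    dyy, _ = np.gradient(g * gy)                   # divergence of g * grad(D)
    _, dxx = np.gradient(g * gx)
    diff = dyy + dxx
    # data fidelity: residual between downsampled estimate and observed LR map
    resid = upsample_nn(downsample(D, s) - D_lr, s)
    return D + lam_diff * diff - lam_fid * resid

def super_resolve(D_lr, s, iters=50):
    D = upsample_nn(D_lr, s)                       # initial estimate
    for _ in range(iters):
        D = diffusion_step(D, D_lr, s)
    return D
```

A constant low-resolution depth map is a fixed point of this scheme: both the diffusion and fidelity residuals vanish, so the output stays constant at the same value.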
Description
Depth map super-resolution method based on blind degradation

Technical Field

The invention relates to the technical field of depth map processing, and in particular to a depth map super-resolution method based on blind degradation.

Background

The depth map, as a key data carrier representing the three-dimensional geometric structure of a scene, has become a core element of intelligent upgrading in many fields such as robotics, autonomous driving, augmented reality, and three-dimensional reconstruction. In robotics, depth maps provide robotic arms with accurate perception of object distance and shape, which is the basis for dexterous grasping and autonomous operation. In an autonomous driving system, the depth map directly provides accurate distance information about obstacles and is the basis of path planning and decision safety. In consumer electronics, the portrait mode and AR effects of smartphones depend on real-time depth maps to achieve accurate foreground-background separation and virtual-real fusion. In three-dimensional reconstruction and virtual reality, the depth map is the indispensable data foundation for generating fine three-dimensional models and constructing immersive experiences. It can be said that the depth map constitutes the "geometric skeleton" connecting the digital world with the physical three-dimensional space. However, the acquisition of high-precision depth maps still faces hardware bottlenecks. Although the acquisition of high-resolution, high-fidelity RGB color images has become relatively mature with the development of imaging technology, mainstream depth sensors are limited by their physical principles, power consumption, cost, and optical constraints, and the raw depth maps they directly acquire typically suffer from the inherent drawbacks of low spatial resolution, significant noise, holes, and blurred edges.
This results in a serious contradiction between the demand of increasingly sophisticated downstream intelligent applications for high-resolution, high-precision depth data and the raw low-quality depth data that sensor hardware can provide. Depth map super-resolution technology is the key technical direction developed to resolve this core contradiction: it recovers a depth map with higher spatial resolution, higher geometric accuracy, and clearer edge details from one or more low-resolution, noise-contaminated depth maps by means of computational imaging and algorithmic reconstruction, thereby providing high-quality depth data that meets the requirements of upper-layer applications. However, existing depth map super-resolution methods rely on fixed degradation assumptions and ignore the complex degradation distributions of real sensors and compression transmission, resulting in insufficient generalization capability, blurred edges, and increased artifacts in actual scenes.

Disclosure of Invention

In order to overcome the defects of the prior art, a depth map super-resolution method based on blind degradation is provided, so as to solve the problems that existing depth map super-resolution methods rely on fixed degradation assumptions, ignore the complex degradation distributions of real sensors and compression transmission, and suffer from insufficient generalization capability, blurred edges, and increased artifacts in actual scenes.
In order to achieve the above object, a depth map super-resolution method based on blind degradation is provided, comprising the following steps: inputting a high-resolution RGB image into a DEPTHANYTHING V model to generate a high-resolution synthesized depth map, and forming an initial RGB-D sample pair from the high-resolution RGB image and the corresponding high-resolution synthesized depth map; screening out high-quality RGB-D sample pairs from the initial RGB-D sample pairs based on depth map internal consistency, depth edge definition, and the degree of structural alignment between the depth map and the RGB image; converting the high-quality RGB-D sample pairs into low-resolution training samples through a learnable degradation network, and constructing an RGB-D training data set from the low-resolution training samples and the corresponding high-resolution RGB images; training a multi-modal super-resolution reconstruction network with the RGB-D training data set to obtain a depth map super-resolution model; and inputting a low-resolution depth map and the corresponding high-resolution RGB image into the depth map super-resolution model to output the high-resolution depth map. Further, the step of screening out high-quality RGB-D sample pairs from the initial RGB-D sample pairs includes: calculating a comprehensive quality score for each initial RGB-D sample pair based on the depth map in