CN-122023959-A - Unsupervised dense matching network training method, network training system and DSM generation method


Abstract

The invention relates to the technical field of computer vision and deep learning, and in particular to an unsupervised dense matching network training method, a network training system and a DSM generation method. The training method comprises: obtaining left and right image pairs of an area to be processed; and inputting the left and right image pairs into an unsupervised dense matching network that uses pixel-level structural similarity (PSSIM) loss as its loss function, and training the network, the unsupervised dense matching network being constructed based on a PSMNetb network structure. Calculating the PSSIM loss function comprises upsampling the two images to be compared, calculating their similarity with a structural similarity (SSIM) function, and averaging the result. By introducing the pixel-level structural similarity loss, the invention effectively addresses the pixel-level matching interference caused by the local-window calculation of classical SSIM, and significantly improves the precision and reliability of disparity map generation without manual labeling.
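
The PSSIM computation summarized above (upsample both images, evaluate SSIM, average) can be illustrated with a minimal NumPy sketch. This is not the patented implementation. It assumes single-channel images scaled to [0, 1], nearest-neighbour upsampling by a factor of 2, a uniform 3 × 3 window for the local statistics (the claims mention a 3 × 3 kernel, but the source does not give its weights), the conventional SSIM constants 0.01² and 0.03², the standard SSIM expression, and that the quantity to minimize is one minus the averaged similarity. All function names are hypothetical.

```python
import numpy as np


def upsample_nearest(img, scale=2):
    """Upsampling step; nearest-neighbour interpolation is an assumption."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)


def box_mean_3x3(img):
    """Local mean over a uniform 3 x 3 window, replicating edge pixels."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    total = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return total / 9.0


def ssim_map(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM from local mean, variance and covariance (standard form)."""
    mu_a, mu_b = box_mean_3x3(a), box_mean_3x3(b)
    var_a = box_mean_3x3(a * a) - mu_a ** 2
    var_b = box_mean_3x3(b * b) - mu_b ** 2
    cov = box_mean_3x3(a * b) - mu_a * mu_b
    numerator = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def pssim_similarity(img1, img2, scale=2):
    """Upsample both images, evaluate SSIM per pixel, then average."""
    up1 = upsample_nearest(np.asarray(img1, dtype=np.float64), scale)
    up2 = upsample_nearest(np.asarray(img2, dtype=np.float64), scale)
    return float(ssim_map(up1, up2).mean())


def pssim_loss(img1, img2, scale=2):
    """Assumption: the minimized quantity is 1 - averaged similarity."""
    return 1.0 - pssim_similarity(img1, img2, scale)
```

With these definitions, two identical images give a similarity of 1 and a loss of 0, while structurally different images give a larger loss.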

Inventors

GUAN KAI
SHI BING
TONG FURONG
FENG HEZHEN
MU ZHIYUAN
YANG FAN
BAI WENHAO
YUAN HAIJUN
ZHAO ZIMING
LIU YAQI
HE JIANG
WANG JIANFENG
FANG QING

Assignees

Unit 61363 of the Chinese People's Liberation Army (中国人民解放军61363部队)

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (10)

1. An unsupervised dense matching network training method, characterized by comprising the following steps: acquiring left and right image pairs of a region to be processed; inputting the left and right image pairs into an unsupervised dense matching network that uses pixel-level structural similarity (PSSIM) loss as its loss function, and training the network, wherein the unsupervised dense matching network is constructed based on a PSMNetb network structure; and wherein calculating the PSSIM loss function comprises upsampling the two images to be compared, calculating their similarity using a structural similarity (SSIM) function, and averaging the calculation results.
2. The unsupervised dense matching network training method according to claim 1, wherein the training process of the unsupervised dense matching network specifically comprises: inputting the left and right image pair $I_L$, $I_R$ into the unsupervised dense matching network to obtain a left disparity map $D_L$; reconstructing, using a spatial transformer network, a left reconstructed image from the right image $I_R$ and the left disparity map $D_L$; calculating the PSSIM loss between the original left image $I_L$ and the left reconstructed image; inputting the horizontally flipped left and right image pair into the unsupervised dense matching network to obtain a right disparity map $D_R$; reconstructing, using the spatial transformer network, a right reconstructed image from the horizontally flipped left image and the right disparity map $D_R$, and flipping it back horizontally; calculating the PSSIM loss between the original right image $I_R$ and the flipped-back right reconstructed image; and back-propagating the two PSSIM loss values to update the parameters of the network.
3. The unsupervised dense matching network training method according to claim 2, wherein the PSSIM loss is calculated by the formula: [formula not reproduced in this text] wherein $F_{avg}$ denotes an averaging function, $F_{SSIM}$ denotes a structural similarity function, $F_{up}$ denotes an upsampling function, $I_1$, $I_2$ denote the two images to be compared, here the original left image $I_L$ and the left reconstructed image, or the original right image $I_R$ and the right reconstructed image, and $C_{pixel\_ssim}$ denotes the pixel-level structural similarity loss value.
4. The method of claim 3, wherein the structural similarity function uses a 3 × 3 convolution kernel to compute the local statistics, instead of the Gaussian convolution kernel used in conventional SSIM.
5. The unsupervised dense matching network training method according to claim 4, wherein the structural similarity function $F_{SSIM}$ is defined as: [formula not reproduced in this text] where $I_1$, $I_2$ are the two input images, $c_1$ and $c_2$ are constants, $F_m$ is the mean function, $F_c$ is the covariance function, and $F_s$ is the variance function.
6. The unsupervised dense matching network training method according to claim 1, wherein the PSMNetb network comprises, in sequence, a feature pyramid module, a cost volume construction module and a three-dimensional convolution aggregation module; the feature pyramid module is used for extracting multi-scale feature maps from the left and right images; the cost volume construction module is used for constructing, based on the multi-scale feature maps, a multi-scale three-dimensional cost volume by matching the correspondences between the left and right feature maps at a plurality of preset disparities; and the three-dimensional convolution aggregation module is used for regularizing the three-dimensional cost volume through stacked three-dimensional convolution layers and outputting the final disparity map through probability computation, disparity regression and upsampling.
7. A DSM generation method, comprising the following steps: obtaining a trained unsupervised dense matching network model by using the unsupervised dense matching network training method of any one of claims 1 to 6; inputting left and right image pairs of a region to be tested into the trained network model to generate a disparity map; converting the disparity map into distance information through a photogrammetric geometric model and the interior and exterior orientation elements of the sensor; and generating a digital surface model (DSM) from the distance information in combination with the sensor position and attitude.
8. An unsupervised dense matching network training system for implementing the unsupervised dense matching network training method according to any one of claims 1 to 6, the system comprising: an image acquisition module for acquiring left and right image pairs of a region to be processed; and a network training module for inputting the left and right image pairs into an unsupervised dense matching network that uses pixel-level structural similarity (PSSIM) loss as its loss function and training the network, the unsupervised dense matching network being constructed based on a PSMNetb network structure, wherein calculating the PSSIM loss function comprises upsampling the two images to be compared, calculating their similarity using the structural similarity function, and averaging the calculation result.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the training method according to any one of claims 1 to 6.
10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the training method of any one of claims 1 to 6 when executing the computer program.
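
The left/right reconstruction procedure of claim 2 can be sketched as follows. This is a NumPy illustration under stated assumptions, not the patented implementation: `predict_disparity` is a hypothetical stand-in for the dense matching network, the spatial transformer network is replaced by plain (non-differentiable) linear interpolation along image rows, the images are assumed rectified so that a left pixel at column x corresponds to the right pixel at column x - d, and the flipped right image is assumed to take the reference slot when the right disparity map is estimated.

```python
import numpy as np


def warp_horizontal(source, disparity):
    """Sample `source` at column (x - d) with linear interpolation along rows.

    Stand-in for the spatial transformer sampling step; columns that fall
    outside the image are clamped to the border.
    """
    h, w = source.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * source[rows, x0] + frac * source[rows, x1]


def reconstruct_pair(left, right, predict_disparity):
    """Build the left and right reconstructed images used by the two losses.

    `predict_disparity(reference, other)` is a hypothetical callable that
    returns a disparity map aligned with `reference`.
    """
    # Left branch: left disparity, then rebuild the left image from the right.
    d_left = predict_disparity(left, right)
    left_recon = warp_horizontal(right, d_left)

    # Right branch: after a horizontal flip the right image behaves like a
    # left image, so the same network and the same warp can be reused.
    left_f, right_f = left[:, ::-1], right[:, ::-1]
    d_right_f = predict_disparity(right_f, left_f)
    right_recon_f = warp_horizontal(left_f, d_right_f)

    # Flip the results back into the original right-image geometry.
    right_recon = right_recon_f[:, ::-1]
    d_right = d_right_f[:, ::-1]
    return left_recon, right_recon, d_left, d_right
```

In training, the PSSIM loss would then be evaluated between each original image and its reconstruction, and the two loss values back-propagated together; a differentiable sampler would be needed in place of `warp_horizontal` for that step.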

Description

Unsupervised dense matching network training method, network training system and DSM generation method

Technical Field

The invention relates to the technical field of computer vision and deep learning, and in particular to an unsupervised dense matching network training method, a network training system and a DSM generation method.

Background

The digital surface model (DSM) is fundamental data for modern geospatial analysis and plays a key role in accurately describing the three-dimensional morphology of the earth's surface and above-ground objects. It comprehensively reflects the elevation of natural terrain and man-made features, and provides important spatial data support for fields such as urban planning, environmental monitoring, agricultural management, disaster assessment, wireless network planning, new energy development and aviation navigation.

A DSM is generated from distance information acquired from a remote sensing platform, combined with the position and attitude parameters of the sensor. At present, distance information is mainly acquired in two ways: airborne lidar and dense image matching. Compared with lidar, dense matching based on optical images offers low cost and easy data acquisition, and is therefore more widely used. The core of dense matching is to search for homonymous (corresponding) points pixel by pixel in overlapping images, construct a disparity map, convert the disparity into distance information through a photogrammetric geometric model, and finally generate the DSM.

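
For the simplest case of a rectified frame-camera stereo pair, the conversion from disparity to distance mentioned above reduces to Z = f · B / d. The sketch below illustrates only that special case. The focal length in pixels, the baseline in metres and the function name are illustrative assumptions; the photogrammetric model referred to in the source, which also involves the sensor's orientation elements, is more general.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (pixels) to depth (metres) via Z = f * B / d.

    Assumes a rectified pinhole stereo pair. Pixels whose disparity is not
    above `min_disparity` are marked invalid with NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disparity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger distance error for far objects than for near ones, which is one reason matching accuracy matters for DSM quality.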
To reduce computational complexity, an epipolar constraint is generally introduced to simplify the two-dimensional matching problem into a one-dimensional search. The specific process comprises computing the epipolar geometry, searching for homonymous points along the epipolar line, generating a disparity map, and converting the disparity map into the target three-dimensional product using the interior and exterior orientation elements of the sensor. The accuracy of dense matching directly determines the quality of the final DSM, making it a key link in the whole process.

However, existing dense matching methods still have several shortcomings in practical applications, mainly in the following three aspects:

(1) The level of automation is limited, making it difficult to meet mapping requirements for high precision, high efficiency and high speed. Existing workflows still depend on manual intervention; in scenes with high precision requirements in particular, a large amount of manual collection and editing is still needed, resulting in low efficiency and high cost.

(2) Traditional methods mostly use hand-crafted feature descriptors, which rely on expert knowledge and generalize poorly. Because texture features differ markedly between image data sets, fixed descriptors are difficult to adapt to diverse scenes, so matching robustness is insufficient.

(3) In challenging areas such as repeated textures, weak textures, occlusions and illumination changes, the matching precision of traditional methods drops significantly. Complex post-processing is often required, which can itself introduce errors and affect the reliability of the results.

In recent years, with the practical advance of deep learning technology, data-driven dense matching methods have shown significant advantages.
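
For contrast with the learned methods, the classical one-dimensional search that the epipolar constraint makes possible can be sketched as a window comparison along a single image row. This is a minimal illustration, assuming rectified images (epipolar lines coincide with rows), a sum-of-absolute-differences cost and a hypothetical function name; it is not a method described in the source.

```python
import numpy as np


def match_along_row(left, right, y, x, max_disparity, half_window=2):
    """Return the disparity in [0, max_disparity] with the lowest SAD cost.

    The window around left pixel (y, x) is compared with windows in the
    right image shifted by d columns along the same row.
    """
    h, w = left.shape
    y0, y1 = y - half_window, y + half_window + 1
    x0, x1 = x - half_window, x + half_window + 1
    if y0 < 0 or y1 > h or x1 > w or x0 - max_disparity < 0:
        raise ValueError("search window leaves the image")
    patch = left[y0:y1, x0:x1]
    costs = [
        np.abs(patch - right[y0:y1, x0 - d:x1 - d]).sum()
        for d in range(max_disparity + 1)
    ]
    return int(np.argmin(costs))
```

A fixed window cost of this kind is exactly what becomes unreliable in repeated-texture, weak-texture and occluded regions, which motivates the learned features discussed next.
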
Deep networks can automatically learn feature representations suited to the task, which reduces the dependence on hand-crafted features, raises the degree of automation, and improves the robustness and accuracy of the algorithm. Current deep-learning-based dense matching networks fall mainly into two types. The first is the general encoder-decoder structure represented by DispNet. Networks of this type borrow the U-Net design; they have a large number of parameters, a simple structure and fast inference, and suit a variety of pixel-level prediction tasks, but are less specialized for dense matching. The second is the dedicated networks represented by GCNet, PSMNet and GwcNet. Their structure borrows from traditional stereo matching: they introduce cost volume construction and three-dimensional convolution aggregation, and output the disparity map through probabilistic disparity search and soft regression. Their precision is significantly higher than that of the general networks, but the network structure is complex and the consumption of computing resources is high. Although deep learning has made remarkable progress in dense matching, the existing method