CN-121767864-B - Road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment

Abstract

The invention discloses a road network extraction and topology optimization method that fuses a large vision model and a graph neural network based on the Ascend environment. The method comprises: processing input optical image data and LiDAR data with a trained Ascend-environment-based road network extraction network and outputting a pixel-level road probability map, wherein the road network extraction network comprises a dual-stream encoder, a cross-attention fusion module and a feature decoder connected in sequence; and performing width estimation and road network optimization on the output pixel-level road probability map to obtain a road vector network with correct topology and width attributes. By constructing the Ascend-environment-based road network extraction network and adopting a road width estimation method that combines a distance field with maximum pixel expansion, the invention makes full use of the geometric features and topological structure of roads and realizes intelligent optimization of road network connectivity, thereby improving the accessibility and robustness of the overall network.
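
The cross-attention fusion module named in the abstract, whose three steps (intra-modal self-attention, inter-modal cross-attention, gated fusion) are spelled out in claim 1, can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the patented implementation: feature maps are flattened to (positions, channels) arrays, all function and parameter names (`attention`, `init_params`, `gated_cross_modal_fusion`, `w_gate`, `b_gate`) are hypothetical, the weights are random rather than trained, and the gate formula (a sigmoid over the concatenated attention outputs used as a convex mixing weight) is one plausible form, since the formula itself is not reproduced in this record. The spectral and spatial attention branches are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention(q_src, kv_src, w_q, w_k, w_v):
    """Scaled dot-product attention: Query from q_src, Key/Value from kv_src."""
    q, k, v = q_src @ w_q, kv_src @ w_k, kv_src @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def init_params(dim, rng):
    """Hypothetical random weights; a trained network would learn these."""
    def qkv():
        return tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
    return {
        "sa_opt": qkv(), "sa_lidar": qkv(),
        "ca_opt": qkv(), "ca_lidar": qkv(),
        "w_gate": rng.normal(scale=0.1, size=(4 * dim, dim)),
        "b_gate": np.zeros(dim),
    }


def gated_cross_modal_fusion(f_opt, f_lidar, p):
    # 1) Intra-modal self-attention, computed separately per modality.
    sa_opt = attention(f_opt, f_opt, *p["sa_opt"])
    sa_lidar = attention(f_lidar, f_lidar, *p["sa_lidar"])
    # 2) Inter-modal cross-attention.
    ca_opt = attention(f_opt, f_lidar, *p["ca_opt"])      # optical Query, LiDAR Key/Value
    ca_lidar = attention(f_lidar, f_opt, *p["ca_lidar"])  # LiDAR Query, optical Key/Value
    # 3) Gated fusion (assumed form): concatenate all four outputs, derive a
    #    per-position, per-channel gate G in (0, 1), and mix the two modalities.
    stacked = np.concatenate([sa_opt, ca_opt, sa_lidar, ca_lidar], axis=-1)
    gate = sigmoid(stacked @ p["w_gate"] + p["b_gate"])
    fused = gate * (sa_opt + ca_opt) + (1.0 - gate) * (sa_lidar + ca_lidar)
    return fused, gate
```

Calling `gated_cross_modal_fusion(f_opt, f_lidar, init_params(d, rng))` on two arrays of shape `(n, d)` returns the fused features and the gate, both of shape `(n, d)`. A gate value near 1 lets the optical branch dominate at that position, and a value near 0 favours the LiDAR branch.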

Inventors

YANG QINGQING
CHENG HAN
XU XIAOMIN
QIAO BINGBING
ZHANG JIAHUI
ZHU LILU
ZHANG ZHE

Assignees

中科星图金能(南京)科技有限公司
苏州空天信息研究院

Dates

Publication Date: 20260508
Application Date: 20260226

Claims (7)

1. A road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment, characterized by comprising the following steps: processing input optical image data and LiDAR data with a trained Ascend-environment-based road network extraction network and outputting a pixel-level road probability map, wherein the pixel-level road probability map comprises a road network extraction segmentation result SegMask, a binarized road map, and a centerline vector and topological relation table; performing width estimation and road network optimization on the output pixel-level road probability map to obtain a road vector network with correct topology and width attributes; wherein the dual-stream encoder comprises an optical encoder and a LiDAR encoder arranged in parallel, the input of the optical encoder being the preprocessed optical image data and its output being optical features; the input of the cross-attention fusion module is the optical features and the LiDAR features, its structure comprises a spectral attention branch along the channel dimension and an adaptive attention branch along the spatial dimension, and it fuses the input features and outputs fused features; the input of the feature decoder is the fused features and its output is the pixel-level road probability map; the calculation process of the cross-attention fusion module comprises: intra-modal self-attention, in which self-attention is computed separately on the optical features and on the LiDAR features; inter-modal cross-attention, in which the Query of one modality interacts with the Key-Value of the other modality through a Query-Key-Value mechanism to achieve information complementation, wherein the Query from the optical feature processing attends to the Key and Value from the LiDAR feature processing so as to actively query structural information in the LiDAR features, and the Query from the LiDAR feature processing attends to the Key and Value from the optical feature processing so as to actively query texture and color information in the optical features; and a gated fusion unit, which introduces learnable gating weights, dynamically adjusts the contribution of the two modalities at each feature position, concatenates the self-attention and cross-attention outputs, and controls the information flow through a gating weight vector G.
2. The road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment according to claim 1, wherein the preprocessing of the optical image data and the LiDAR data comprises: performing radiometric correction and geometric registration on the optical image data to obtain a processed RGB image; interpolating the LiDAR data to generate a digital surface model (DSM); and aligning the processed RGB image and the DSM to the same spatial coordinate system to obtain the preprocessed optical image data and LiDAR data.
3. The road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment according to claim 1, wherein the calculation formula of the gated fusion unit is: [formula not reproduced in source], wherein the symbols denote, in order: the gate for the optical features; the gate for the LiDAR features; the gating bias; the gated fusion output; a linear transformation matrix; the Sigmoid function; element-wise multiplication; the inter-modal cross-attention from the LiDAR features to the optical features; the inter-modal cross-attention from the optical features to the LiDAR features; the self-attention of the optical features; and the self-attention of the LiDAR features.
4. The road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment according to claim 1, wherein the loss function of the Ascend-environment-based road network extraction network comprises a pixel segmentation loss, a boundary loss, a central axis/centerline loss, a width regression loss and a topology-aware loss, and the total loss function is calculated as: $L_{total} = \lambda_{seg} L_{seg} + \lambda_{bd} L_{bd} + \lambda_{cl} L_{cl} + \lambda_{w} L_{w} + \lambda_{topo} L_{topo}$, wherein $L_{total}$ is the total loss function of the Ascend-environment-based road network extraction network; $\lambda_{seg}$ and $L_{seg}$ are the pixel segmentation loss hyperparameter and the pixel segmentation loss, respectively; $\lambda_{bd}$ and $L_{bd}$ are the boundary loss hyperparameter and the boundary loss, respectively; $\lambda_{cl}$ and $L_{cl}$ are the central axis loss hyperparameter and the central axis loss, respectively; $\lambda_{w}$ and $L_{w}$ are the width regression loss hyperparameter and the width regression loss, respectively; and $\lambda_{topo}$ and $L_{topo}$ are the topology-aware loss hyperparameter and the topology-aware loss, respectively, each hyperparameter being used to adjust the weight of the corresponding loss task.
5. The road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment according to claim 4, wherein the calculation formula of the topology-aware loss is: [formula not reproduced in source], wherein the symbols denote, in order: the centerline extracted from the prediction mask; the ground-truth centerline; N, the total number of pixels; and a smoothing term.
6. The road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment according to claim 1, wherein performing width estimation and road network optimization on the output pixel-level road probability map comprises: preprocessing the pixel-level road probability map to obtain a preprocessed pixel-level road probability map; computing, for the preprocessed pixel-level road probability map, the discrete distance from each pixel to the nearest background boundary to obtain a pixel-level distance field; extracting a centerline candidate set by skeletonization, combining the binary mask in the preprocessed pixel-level road probability map with the pixel-level distance field, and simultaneously generating centerline confidence and edge confidence maps; performing width estimation on each road in the pixel-level road probability map using the centerline candidate set to obtain road width sequences for all roads; and combining the road width sequences with the preprocessed pixel-level road probability map, obtaining a pixel-wise width-level distribution by maximum pixel expansion, mapping the pixel-wise width-level distribution to initial width estimates of edges, constructing a topological graph, performing road-segment-level width serialization and robust aggregation in the topological graph, converting the topological graph into a line graph to correct geometric and semantic inconsistencies and enhance global connectivity, and performing message passing in the line graph through a graph neural network to obtain the road width optimization result.
7. The road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment according to claim 6, wherein the width estimation process comprises: first obtaining a preliminary road width by geometric estimation; obtaining a semantic width prediction by a network prediction fusion method; and obtaining the final road width from the preliminary width and the semantic width prediction by adaptive weighted fusion.
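
The distance-field width estimation of claim 6 and the adaptive weighted fusion of claim 7 can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the claimed procedure: the distance field is computed by brute force (practical only for small masks), ridge detection on the distance field (3x3 local maxima) stands in for proper skeletonization, width is taken as `2 * d - 1` under a pixel-center distance convention, and the fusion weight is simply a confidence value in [0, 1]. The names `distance_field`, `ridge_pixels`, `estimate_widths` and `fuse_widths` are hypothetical.

```python
import numpy as np


def distance_field(mask):
    """Euclidean distance from each road pixel to the nearest background pixel."""
    mask = np.asarray(mask, dtype=bool)
    dist = np.zeros(mask.shape, dtype=float)
    fg = np.argwhere(mask)    # road pixel coordinates, shape (F, 2)
    bg = np.argwhere(~mask)   # background pixel coordinates, shape (B, 2)
    if len(fg) == 0 or len(bg) == 0:
        return dist
    diff = fg[:, None, :] - bg[None, :, :]
    dist[fg[:, 0], fg[:, 1]] = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return dist


def ridge_pixels(dist):
    """Centerline candidates: road pixels whose distance is a 3x3 local maximum."""
    h, w = dist.shape
    padded = np.pad(dist, 1, mode="constant", constant_values=0.0)
    is_ridge = dist > 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_ridge &= dist >= neighbour
    return is_ridge


def estimate_widths(mask):
    """Geometric width (in pixels) at every centerline candidate."""
    dist = distance_field(mask)
    return 2.0 * dist[ridge_pixels(dist)] - 1.0


def fuse_widths(w_geo, w_sem, conf_geo):
    """Assumed adaptive fusion: weight the geometric width by its confidence."""
    alpha = np.clip(conf_geo, 0.0, 1.0)
    return alpha * w_geo + (1.0 - alpha) * w_sem
```

For a straight road five pixels wide, `estimate_widths` returns 5.0 at every centerline pixel, and `fuse_widths(5.0, 7.0, 0.75)` gives 5.5. For real imagery, a library distance transform and skeletonization routine would replace the brute-force and ridge steps.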

Description

Road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment

Technical Field

The invention relates to the technical field of road network extraction, and in particular to a road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment.

Background

Although road extraction technology has made great progress over the past decades, existing solutions still show significant limitations and bottlenecks when processing high-resolution remote sensing data and converting it into a usable vector road network. The present invention provides a systematic solution to these pain points in the Ascend environment.

1. Defect analysis of the prior art:

a) Lack of topological consistency and road breakage. Most existing deep learning road extraction methods treat the problem as a pure pixel-level classification task (semantic segmentation). Although such models perform well on pixel-level metrics (e.g., mIoU), they tend to be unsatisfactory at the geometric and topological level. Because roads are long, thin strips that are easily occluded by trees, building shadows or vehicles, the predicted masks often contain small breaks, holes or false branches. Without a global topological constraint, this pixel-level incoherence means that the road network produced by subsequent vectorization cannot support effective path navigation or connectivity analysis, which greatly limits its use in high-precision fields such as autonomous driving.

b) Insufficient cross-modal feature processing capability. Existing multi-modal fusion methods usually adopt simple early fusion or late fusion, where early fusion concatenates RGB and DSM at the input and late fusion takes a weighted average of the results. Both coarse approaches fail to fully exploit the deep nonlinear interactions between spectral texture and geometric elevation. In complex urban scenes, "similar spectra but different semantics" in optical images (for example, black asphalt roofs versus asphalt pavement) and regions with "insignificant geometric features" in LiDAR data (for example, rural roads in plain areas) still lead to serious missed detections and false detections. The prior art lacks a refined fusion mechanism that can adaptively switch modal weights in different environments.

c) Computing overhead and efficiency bottlenecks in large model deployment. With the advent of large vision models, such as ViT-based foundation models, the number of model parameters has grown explosively. While large models bring very strong generalization capability, they also bring significant computational and memory overhead. When a traditional general-purpose software framework handles such very large models, it cannot perform deep optimization for specific computing hardware (such as the Ascend NPU), so throughput is low and latency is noticeable when running inference on large, city-scale images. This is unacceptable in engineering scenarios that require near-real-time processing or large-scale batch processing (e.g., disaster relief).

d) Width and geometric deviation in vectorization. Even when the segmentation mask is highly accurate, converting it into a standardized vector road network with width attributes remains a technical challenge. The traditional method applies morphological skeletonization followed by heuristic rules; its road width estimates are very unstable, are easily affected by edge burrs, and cannot guarantee the continuity and smoothness of the width over long road sections. Meanwhile, the traditional vectorization pipeline lacks joint optimization of the overall road network structure, so the generated vector line segments are often geometrically misaligned at intersections.

Disclosure of Invention

In view of the defects in the prior art, the invention discloses a road network extraction and topology optimization method fusing a large vision model and a graph neural network based on the Ascend environment. By constructing an Ascend-environment-based road network extraction network and adopting a road width estimation method that combines a distance field with maximum pixel expansion, the method makes full use of the geometric features and topological structure of roads, realizes intelligent optimization of road network connectivity, and improves the accessibility and robustness of the overall network.

Technical solution: to achieve the above technical purpose, the invention adopts the following technical solution.

A visual large model and graphic neural network fusion road network extraction and topology optimization metho