CN-122023534-A - Sparse image-based three-dimensional asset checking method and device for traffic infrastructure


Abstract

The invention discloses a sparse-image-based three-dimensional asset checking method and device for traffic infrastructure. The method comprises: obtaining panoramic images and corresponding pose data, and extracting visual semantic features and geometric ray features of each observation target; associating observation targets across views to generate target observation chains; performing high-precision geometric solving on the target observation chains to obtain the three-dimensional center coordinates and three-dimensional physical dimensions of the observation targets, and performing uncertainty evaluation to obtain uncertainty scores; performing a hierarchical response according to the uncertainty scores; selecting the frame with the best visual quality in a target observation chain, generating a normalized three-dimensional mesh model from that frame, and spatially transforming the normalized three-dimensional mesh model based on the three-dimensional center coordinates and the three-dimensional physical dimensions to obtain a high-fidelity digital twin entity; and constructing a multi-dimensional state fingerprint for the high-fidelity digital twin entity and performing full-life-cycle operation and maintenance monitoring. The method can construct a high-fidelity entity-level three-dimensional model under sparse viewing angles.

Inventors

DONG ZHEN
LIU LI
JIA YANG
FU LUXUAN

Assignees

Wuhan University (武汉大学)
Sichuan Highway Planning, Survey, Design and Research Institute Co., Ltd. (四川省公路规划勘察设计研究院有限公司)

Dates

Publication Date: 20260512
Application Date: 20260414

Claims (10)

1. A traffic infrastructure three-dimensional asset inventory method based on sparse images, the method comprising: acquiring panoramic images and corresponding pose data, extracting two-dimensional bounding boxes and semantic masks from the panoramic images, and extracting visual semantic features and geometric ray features of each observation target; constructing a multi-modal cascade association graph based on physical gating, associating observation targets across views, and generating a target observation chain; performing high-precision geometric solving on the target observation chain to obtain the three-dimensional center coordinates and three-dimensional physical dimensions of the observation target, and performing uncertainty evaluation to obtain an uncertainty score; performing a hierarchical response according to the uncertainty score; selecting the frame with the best visual quality in the target observation chain, generating a normalized three-dimensional mesh model from that frame, and spatially transforming the normalized three-dimensional mesh model based on the three-dimensional center coordinates and the three-dimensional physical dimensions to obtain a high-fidelity digital twin entity; and constructing a multi-dimensional state fingerprint for the high-fidelity digital twin entity, and performing full-life-cycle operation and maintenance monitoring based on the multi-dimensional state fingerprint.
2. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 1, wherein constructing a multi-modal cascade association graph based on physical gating, associating observation targets across views, and generating a target observation chain comprises: calculating the visual similarity between observation pairs based on the visual semantic features, calculating the skew-line distance between the rays of observation pairs based on the geometric ray features, and calculating a comprehensive affinity score for each observation pair based on the visual similarity and the skew-line distance to generate candidate matching pairs; and performing physical consistency verification on the candidate matching pairs by back-calculating the hypothesized physical size of the observation target from the sparse ray geometry, judging any candidate matching pair whose hypothesized physical sizes differ by more than a preset consistency tolerance threshold to be a mismatch and forcibly breaking its association, so as to generate a target observation chain that has passed strong physical verification.
3. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 2, wherein the comprehensive affinity score is calculated by a formula that combines a cosine similarity function of the visual semantic feature vectors of the two observation targets with a normalized scoring function of the skew-line distance between the rays of the two observation targets, weighted by an adaptive weight factor; and wherein the adaptive weight factor is in turn calculated by a formula using a Sigmoid activation function, the area of the detection box, and a preset reference area threshold.
4. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 1, wherein performing high-precision geometric solving on the target observation chain to obtain the three-dimensional center coordinates and three-dimensional physical dimensions of the observation target, and performing uncertainty evaluation to obtain an uncertainty score, comprises: constructing a geometric energy function based on the target observation chain, minimizing the geometric energy function, and solving for the optimal three-dimensional center coordinates and the corresponding three-dimensional physical dimensions; and calculating an uncertainty score for each observation target based on the geometric topology.
5. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 4, wherein the geometric energy function is defined over the observation rays of the target observation chain in terms of the perpendicular distance from a point to the i-th observation ray and a robust kernel function; and wherein the uncertainty score is calculated by a formula whose terms are the maximum included angle between any two rays in the observation chain, the number of observation frames N, the optimized average reprojection residual, normalized weight coefficients, and normalization parameters that control the decay rate of the function.
6. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 1, wherein performing a hierarchical response according to the uncertainty score comprises: when the uncertainty score is greater than a first preset threshold, determining that the target lies in a high-risk blind area, back-calculating from the three-dimensional center coordinates of the observation target an optimal supplementary viewpoint that maximizes the ray intersection angle, and generating an acquisition guidance instruction for triggering targeted, densified supplementary acquisition during the current or a subsequent pass; when the uncertainty score is less than or equal to the first preset threshold and greater than a second preset threshold, determining that the target is a suspected target, projecting the observation target back onto the original images, cropping a set of multi-view region-of-interest (ROI) slices, and generating a manual verification work order; and when the uncertainty score is less than or equal to the second preset threshold, determining that the target is a high-confidence target.
7. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 1, wherein the formula for performing the spatial transformation is: $V' = s\,R\,V + T$, wherein $V$ is a vertex of the normalized three-dimensional mesh model and $V'$ is the transformed vertex; the scaling factor is $s = H_{real} / H_{norm}$, wherein $H_{norm}$ is the bounding-box height of the normalized three-dimensional mesh model and $H_{real}$ is the true physical height in the three-dimensional physical dimensions; the rotation matrix $R$ is constructed from a yaw angle determined by the included angle between the principal direction of the observation rays and the north direction of the world coordinate system, so that the front of the model faces the road; and the translation vector is $T = P$, wherein $P$ is the three-dimensional center coordinate.
8. The sparse-image-based traffic infrastructure three-dimensional asset inventory method of claim 1, wherein the multi-dimensional state fingerprint is: $F = \{P, H, \bar{f}, G\}$, wherein $F$ is the multi-dimensional state fingerprint, and $P$, $H$, $\bar{f}$, and $G$ respectively represent the absolute coordinates, the physical height, the average visual feature vector, and the neighborhood topological relation.
9. A traffic infrastructure three-dimensional asset inventory device based on sparse images, comprising: a multi-modal feature extraction module for acquiring panoramic images and corresponding pose data, extracting two-dimensional bounding boxes and semantic masks from the panoramic images, and extracting visual semantic features and geometric ray features of each observation target; a multi-modal association module for constructing a multi-modal cascade association graph based on physical gating, associating observation targets across views, and generating a target observation chain; a geometric solving and evaluation module for performing high-precision geometric solving on the target observation chain to obtain the three-dimensional center coordinates and three-dimensional physical dimensions of the observation target, and performing uncertainty evaluation to obtain an uncertainty score; a closed-loop feedback control module for performing a hierarchical response according to the uncertainty score; a model anchoring and metric scaling module for selecting the frame with the best visual quality in the target observation chain, generating a normalized three-dimensional mesh model from that frame, and spatially transforming the normalized three-dimensional mesh model based on the three-dimensional center coordinates and the three-dimensional physical dimensions to obtain a high-fidelity digital twin entity; and an intelligent operation and maintenance monitoring module for constructing a multi-dimensional state fingerprint for the high-fidelity digital twin entity and performing full-life-cycle operation and maintenance monitoring based on the multi-dimensional state fingerprint.
10. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the sparse-image-based traffic infrastructure three-dimensional asset inventory method of any one of claims 1 to 8.
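The candidate-pair scoring and physical consistency check recited in claims 2 and 3 can be illustrated with a minimal Python sketch. The claims name the ingredients (cosine similarity, a normalized score of the skew-line distance between rays, and a Sigmoid-based adaptive weight driven by detection-box area), but the exact formulas are not reproduced in this text. The convex combination, the exponential distance score, the Sigmoid argument, the relative size tolerance, and every function name and default value below are therefore assumptions made for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(f1, f2):
    """Cosine similarity of two visual feature vectors."""
    return dot(f1, f2) / (norm(f1) * norm(f2))

def skew_distance(o1, d1, o2, d2):
    """Shortest distance between two rays treated as infinite lines."""
    n = cross(d1, d2)
    w = sub(o2, o1)
    if norm(n) < 1e-12:  # parallel lines: distance from o2 to line 1
        return norm(cross(w, d1)) / norm(d1)
    return abs(dot(w, n)) / norm(n)

def affinity(f1, f2, ray1, ray2, box_area, ref_area, dist_scale=1.0):
    """Assumed form: w * visual + (1 - w) * geometric.

    w grows with detection-box area (large boxes carry more reliable
    appearance); the Sigmoid argument is a hypothetical choice.
    """
    w = 1.0 / (1.0 + math.exp(-(box_area - ref_area) / ref_area))
    visual = cosine_similarity(f1, f2)
    # Hypothetical normalized distance score in (0, 1].
    geometric = math.exp(-skew_distance(*ray1, *ray2) / dist_scale)
    return w * visual + (1.0 - w) * geometric

def passes_size_gate(size_a, size_b, tolerance=0.3):
    """Physical gate: reject pairs whose hypothesized sizes disagree."""
    return abs(size_a - size_b) / max(size_a, size_b) <= tolerance
```

Under these assumptions, a pair with a high affinity score is still disconnected when the gate fails, for example when one observation implies a lamp about 3 m tall and the other a lamp about 12 m tall.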
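The geometric solving step of claims 4 and 5 minimizes an energy built from point-to-ray perpendicular distances passed through a robust kernel. The following Python sketch shows one common way to do this, iteratively reweighted least squares with a Huber kernel. The choice of Huber, the threshold `delta`, the fixed iteration count, and the treatment of rays as infinite lines are assumptions; the claims do not specify the kernel or the solver.

```python
import math

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate ray geometry (rays nearly parallel)")
    solution = []
    for k in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][k] = b[r]
        solution.append(det3(m) / d)
    return solution

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def point_ray_distance(p, origin, direction):
    """Perpendicular distance from point p to the line through origin."""
    u = unit(direction)
    w = [p[i] - origin[i] for i in range(3)]
    along = sum(w[i] * u[i] for i in range(3))
    return math.sqrt(sum((w[i] - along * u[i]) ** 2 for i in range(3)))

def huber_weight(distance, delta):
    """IRLS weight for the Huber kernel (assumed robust kernel)."""
    return 1.0 if distance <= delta else delta / distance

def triangulate(rays, delta=0.5, iterations=10):
    """Estimate the 3D point closest to all rays, down-weighting outliers.

    rays: list of (origin, direction) pairs in world coordinates.
    """
    weights = [1.0] * len(rays)
    p = None
    for _ in range(iterations):
        a = [[0.0] * 3 for _ in range(3)]
        b = [0.0] * 3
        for w, (origin, direction) in zip(weights, rays):
            u = unit(direction)
            for r in range(3):
                for c in range(3):
                    # Projector onto the plane normal to the ray: I - u u^T
                    m = (1.0 if r == c else 0.0) - u[r] * u[c]
                    a[r][c] += w * m
                    b[r] += w * m * origin[c]
        p = solve3(a, b)
        weights = [huber_weight(point_ray_distance(p, o, d), delta)
                   for o, d in rays]
    return p
```

With only two or three frames per target, as in sparse acquisition, the normal matrix becomes ill-conditioned when the rays are nearly parallel, which is one reason the claims pair the solve with an uncertainty score based on the ray intersection angle.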
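The three-way hierarchical response of claim 6 reduces to two threshold comparisons on the uncertainty score. A minimal Python sketch follows; the threshold values 0.7 and 0.3, the function name, and the returned labels are hypothetical, since the claim only requires a first preset threshold greater than a second one.

```python
def classify_response(score, first_threshold=0.7, second_threshold=0.3):
    """Map an uncertainty score to one of the three responses in claim 6.

    Threshold values are illustrative assumptions, not taken from the source.
    """
    if second_threshold >= first_threshold:
        raise ValueError("first_threshold must exceed second_threshold")
    if score > first_threshold:
        # High-risk blind area: plan a supplementary viewpoint and re-acquire.
        return "recapture"
    if score > second_threshold:
        # Suspected target: crop multi-view ROI slices for a manual work order.
        return "manual_review"
    # High-confidence target: accept the automatic result.
    return "accepted"
```

Note that the boundaries follow the claim wording: a score exactly equal to the first threshold falls into the manual-review tier, and a score exactly equal to the second threshold is accepted.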
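The spatial transformation of claim 7 scales the normalized mesh to the measured physical height, rotates it by a yaw angle, and translates it to the solved three-dimensional center. The Python sketch below assumes a z-up world frame, a mesh centered at the origin of its normalized frame, and a yaw applied as a counter-clockwise rotation about the vertical axis; converting a north-referenced heading into that convention is left out. The function name and argument layout are hypothetical.

```python
import math

def anchor_mesh(vertices, real_height, yaw_deg, center):
    """Apply V' = s * R * V + T to every vertex of a normalized mesh.

    vertices:    list of (x, y, z) in the normalized frame (z up, assumed)
    real_height: measured physical height of the asset, in meters
    yaw_deg:     counter-clockwise rotation about the z axis (assumed)
    center:      solved 3D center coordinates (x, y, z)
    """
    zs = [v[2] for v in vertices]
    mesh_height = max(zs) - min(zs)
    if mesh_height <= 0.0:
        raise ValueError("mesh has zero height along z")
    s = real_height / mesh_height  # scaling factor from claim 7
    yaw = math.radians(yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    anchored = []
    for x, y, z in vertices:
        # Rotate about the vertical axis, then scale and translate.
        xr = cos_y * x - sin_y * y
        yr = sin_y * x + cos_y * y
        anchored.append((s * xr + center[0],
                         s * yr + center[1],
                         s * z + center[2]))
    return anchored
```

Because the scale is uniform and derived from height alone, width and depth inherit whatever proportions the generated mesh has; a non-uniform scale using all three physical dimensions would be a different design choice.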

Description

Sparse image-based three-dimensional asset checking method and device for traffic infrastructure

Technical Field

The invention relates to the technical field of computer vision, and in particular to a sparse-image-based three-dimensional asset checking method and device for traffic infrastructure, a storage medium, and electronic equipment.

Background

With the rapid development of smart-city and digital-twin technology, building a high-precision, semantically rich, all-element three-dimensional database of urban infrastructure (such as street lamps, traffic signs, and manhole covers) has become the foundation of fine-grained urban management. At present, the mainstream approach is automatic inventory and reconstruction from sequential street-view images acquired by a vehicle-mounted Mobile Measurement System (MMS). However, to cope with the pressure of storing and transmitting massive data, engineering practice often adopts a sparse acquisition mode with large spacing (for example, one frame every 10-20 meters). This sparse observation condition poses serious challenges to existing techniques, mainly in the following four respects:

(1) Cross-view association lacks strong physical-size constraints and is prone to topology errors

Existing multi-view target tracking and matching algorithms typically rely on a weighted fusion of visual features (e.g., ReID or CLIP features) and simple geometric distances (e.g., IoU or Euclidean distance). However, urban road scenes contain many "visually similar and spatially adjacent" distractor targets (e.g., identical street lamps arranged in a row, or several traffic signs mounted close together on one pole). Relying on visual similarity and geometric distance alone very easily produces mismatches in which the observations of one target are attributed to another.
Existing fusion strategies often lack a consistency check on the intrinsic physical attributes of a target (e.g., true height and width), so the algorithms are not robust when facing objects that look alike but differ greatly in size (e.g., garden lamps and high-mast lamps).

(2) It is difficult to construct a high-fidelity entity-level three-dimensional model under sparse viewing angles

Traditional multi-view stereo (MVS) and neural radiance field (NeRF) reconstruction algorithms rely heavily on dense images with high overlap. With sparse street-view imagery, these methods cannot match feature points because the parallax is too large, so the reconstruction is riddled with holes or collapses entirely. On the other hand, recently emerging single-view three-dimensional generative models (such as various 3D AIGC algorithms) can generate a realistic three-dimensional mesh from a single image, but their output usually lies in a normalized virtual coordinate system and lacks a true metric scale and absolute geographic coordinates. The industry has not yet formed an effective pipeline that combines the fidelity of generative AI with the precision of photogrammetry, so generated models can only be used for display and not for engineering measurement or spatial analysis.

(3) The acquisition and processing stages are disconnected, and a closed-loop feedback mechanism based on quality assessment is lacking

Current infrastructure inventory typically works in an open-loop "acquire first, process later" mode. The acquisition vehicle drives blindly along a fixed route, and the data quality cannot be known in real time.
By the time downstream processing finds that the geometric-solution uncertainty of certain key targets is too high because of occlusion, illumination, or poor viewing angles, it is usually too late: either the low-precision result must be accepted, or a vehicle must be rescheduled for supplementary acquisition at great cost. The prior art lacks a closed-loop feedback mechanism that evaluates data usability in real time based on geometric topology quality (such as ray intersection angle and reprojection residual) and automatically directs manual verification or triggers targeted supplementary acquisition.

(4) Full-life-cycle state monitoring based on multi-dimensional fingerprints is lacking

Existing facility operation and maintenance relies mainly on manual inspection or simple image change detection. Pure image comparison is easily disturbed by changes in illumination, season, and shooting angle, and produces many false positives. Moreover, two-dimensional images can hardly quantify changes in a facility's structural state (such as a lamp post tilting by 5 degrees or a signboard dropping 20 cm in height). The prior art lacks a multi-dimensional state fingerprint system which integrates accurate geometric dimension, visual semantic features and relative space topology, and is difficult to real