CN-122015844-A - Navigation agent framework construction method based on voxels


Abstract

The invention relates to a method for constructing a voxel-based navigation agent framework. The method first builds a path planning framework driven by live situation data: it captures multi-dimensional situation information about the live environment, rapidly collects environmental changes, fuses massive heterogeneous data from diverse multi-dimensional sensors, constructs a panoramic semantic map for maneuver navigation together with a mechanism for updating it, and implements path planning analysis and decision algorithms. The method addresses the difficulty that navigation technology has in representing the three-dimensional space of a scene, particularly with respect to precision and completeness in large-scale, complex scenes. The navigation method provided by the invention does not depend on fixed road network data, can be used in any complex scene, can identify small obstacles, can calculate the optimal global maneuver route according to the maneuverability of the vehicle, and provides on-the-fly guidance according to live information.

Inventors

CHEN JING
XU XINPENG
XU DAOZHU
XU YING
WANG GUANGMIN
HUANG ZIJUN
XU WENYUAN
SUI XIN

Assignees

East China Institute of Computing Technology (the 32nd Research Institute of China Electronics Technology Group Corporation)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-13

Claims (4)

1. A method for constructing a voxel-based navigation agent framework, comprising:
capturing multi-dimensional situation information about a live environment, rapidly collecting environmental changes, fusing massive heterogeneous data from diverse multi-dimensional sensors, constructing a panoramic semantic map for maneuver navigation together with an updating mechanism, and implementing path planning analysis and decision algorithms;
Step 1.1, reading static map data and dynamic perception data: taking into account the spatial refinement, semantics, relevance and dynamics of a complex urban environment, accurately digitizing the regional environment, building facilities and static entities, and dynamically receiving the latest environmental changes;
Step 1.2, establishing a refined environment characterization model: constructing a structured three-dimensional spatial logic model that integrates static and dynamic elements with geometric, topological and semantic information and that includes entity element data; encoding the global map, based on a deep-learning real-time semantic segmentation model and an incremental voxel updating algorithm, into a multi-level three-dimensional map embedded with semantic and geometric information; encoding the position and shape of important obstacle elements to provide the geometric constraint information required for navigation; and encoding the appearance and material of some important entities;
Step 1.3, full-space geometric and semantic trafficability analysis and maneuver path planning: discretizing spatial geographic entities into a space occupancy grid comprising cells occupied by obstacles and cells not occupied by obstacles; calculating movable regions by updating and editing the voxel grid; further cutting the movable regions according to vehicle movement parameters to generate navigation region data; and merging the navigation region data by a grid calculation method, wherein the navigation region data includes the connectivity relations between movable regions;
constructing a path navigation agent based on the path planning algorithm framework and driving it to close the complete navigation loop, wherein: regional environment information is acquired, and dynamic changes of key elements are rapidly updated on the basis of the spatial grid, through a variety of environment sensing means; active sensing, fused map construction and incremental updating are performed through a unified map-space environment characterization, achieving high-precision real-time modeling and adaptive maintenance of the complex urban environment; target semantic classification and multi-view data fusion are performed on a navigation semantic map that accounts for both geometric precision and semantic information, incorporating deep-learning feature extraction; refined updating of the three-dimensional semantic map, dynamic prediction of the global maneuver path and incremental reasoning of the on-the-fly guidance path are integrated to implement the path planning and navigation algorithms, providing intelligent navigation support from environment sensing and semantic understanding through decision planning to action execution; and finally, intention understanding and multi-modal output are completed in combination with a general-purpose large language model, completing the navigation guidance process.
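The trafficability analysis of step 1.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the occupancy grid is a set of occupied integer voxel indices, a "movable" voxel is assumed to be a free voxel resting on an occupied one, and the vehicle movement parameters are reduced to two hypothetical values, `clearance_voxels` (required headroom) and `max_step` (largest height change the vehicle can climb between adjacent cells). The function names are illustrative only.

```python
from collections import deque


def traversable_cells(occupied, size, clearance_voxels):
    """Return free voxels that rest on an occupied voxel and have enough headroom.

    occupied: set of (x, y, z) integer voxel indices holding obstacles or ground.
    size: (nx, ny, nz) extent of the grid; space above the grid is treated as free.
    clearance_voxels: assumed vehicle height, expressed in voxels.
    """
    nx, ny, nz = size
    cells = set()
    for x in range(nx):
        for y in range(ny):
            for z in range(1, nz):
                if (x, y, z - 1) not in occupied:
                    continue  # nothing to stand on
                headroom = range(z, z + clearance_voxels)
                if all((x, y, k) not in occupied for k in headroom):
                    cells.add((x, y, z))
    return cells


def connected_regions(cells, max_step=1):
    """Merge traversable voxels into regions using breadth-first search.

    Two voxels are connected when they are 4-neighbours in the ground plane
    and their heights differ by at most max_step voxels (assumed climb limit).
    """
    remaining = set(cells)
    regions = []
    while remaining:
        seed = remaining.pop()
        region = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                for dz in range(-max_step, max_step + 1):
                    neighbour = (x + dx, y + dy, z + dz)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        region.add(neighbour)
                        queue.append(neighbour)
        regions.append(region)
    return regions
```

Under these assumptions, changing `clearance_voxels` or `max_step` for a different vehicle changes which voxels are movable and which regions are connected, which is the sense in which the movable region is cut according to vehicle movement parameters.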
2. The method for constructing a voxel-based navigation agent framework of claim 1, wherein in step 1.1 the collected heterogeneous data comprises laser point clouds, oblique photography and vector data; because these data are irregular in format, uneven in density and inconsistent in scale, the heterogeneous data is projected into a three-dimensional voxel map using a voxel filter; specifically, a voxel filter with a voxel size of n meters is applied so that the collected data is uniformly distributed and discretized along the ground plane and the height direction, dividing it into a uniform three-dimensional spatial grid and thereby forming lightweight, stable structural features that are easy to compute and store; global descriptors are extracted to retain the semantic information of the structural features; the voxel map retains the basic spatial structure information, so that geometric features can be accurately extracted from complex three-dimensional scenes while storage requirements are reduced and subsequent computation is facilitated.
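A voxel filter of the kind described in claim 2 can be sketched in Python as follows. This is a minimal sketch assuming the input has already been converted to (x, y, z) points in meters and that each occupied voxel is summarized by the centroid of its points; the claim does not specify the per-voxel summary, and the function name `voxel_filter` is illustrative.

```python
from collections import defaultdict
from math import floor


def voxel_filter(points, voxel_size):
    """Downsample 3D points to one centroid per occupied voxel.

    points: iterable of (x, y, z) coordinates in meters.
    voxel_size: edge length n of a cubic voxel, in meters.
    Returns a dict mapping integer voxel index (i, j, k) to the centroid
    of the points that fall inside that voxel.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct voxel
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centroids = {}
    for key, members in buckets.items():
        count = len(members)
        centroids[key] = tuple(sum(axis) / count for axis in zip(*members))
    return centroids
```

Because every source (point cloud, oblique photography, vector data) ends up indexed by the same integer voxel keys, data of differing density and scale can be merged simply by combining the resulting dictionaries.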
3. The voxel-based navigation agent framework construction method of claim 1, wherein in step 1.2, regarding the structure of the multi-level three-dimensional map, the 3D map is denoted M, and M consists of one static base S and a plurality of dynamic entities; each dynamic entity is represented by a set of 3D voxels and a semantic description; a multi-layer map representation comprising a static layer, a dynamic layer, a semantic layer and a topology layer is established; the output of the semantic layer uses a JSON structured scene graph format, which supports tasks including visual presentation and interaction and serves as input to the path planning loss calculation; the scene graph is described as follows:
{
  "objects": [
    {"id": 1, "name": "building", "bbox": [...], "type": ..., "cost": ...},
    {"id": 2, "name": "bicycle", "bbox": [...], "type": ..., "cost": ...}
  ],
  "relations": [
    {"subject": 1, "predicate": "right of", "object": 2, ...},
    {"subject": 1, "predicate": "left of", "object": 3, ...}
  ]
}
wherein objects are the dynamic entities; type is the category; cost is a comprehensive evaluation of trafficability and loss, used in subsequent navigation calculations; and relations is the final set obtained after predicate prediction and consistency constraints; with this design, the map simultaneously meets visual, knowledge and structural expression requirements and provides high-quality input for subsequent path planning.
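The scene graph of claim 3 can be assembled and serialized as in the Python sketch below. The field names follow the claim; everything else is an assumption made for illustration, including the six-number `bbox` layout, the example type strings and cost values, and the check that every relation refers to a known object id.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SceneObject:
    id: int
    name: str
    bbox: list   # assumed layout: [xmin, ymin, zmin, xmax, ymax, zmax]
    type: str    # category label
    cost: float  # combined trafficability/loss score used by the planner


@dataclass
class Relation:
    subject: int    # id of the subject object
    predicate: str  # e.g. "right of", "left of"
    object: int     # id of the object the predicate points to


def build_scene_graph(objects, relations):
    """Return the scene graph as a plain dict, rejecting dangling relations."""
    known_ids = {obj.id for obj in objects}
    for rel in relations:
        if rel.subject not in known_ids or rel.object not in known_ids:
            raise ValueError(f"relation refers to an unknown object id: {rel}")
    return {
        "objects": [asdict(obj) for obj in objects],
        "relations": [asdict(rel) for rel in relations],
    }


def to_json(graph):
    """Serialize the scene graph for the semantic-layer output."""
    return json.dumps(graph, indent=2)
```

A planner could then read `cost` per object when scoring candidate cells, while a viewer could render the same JSON directly.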
4. The method for constructing a voxel-based navigation agent framework of claim 1, wherein in step 1.3 the maneuver path planning comprises: inputting vehicle parameters; comprehensively evaluating dynamic and static factors including maneuver trafficability, traffic conditions and weather conditions; and automatically generating an optimal action path using the spatial entity tags and the topological relations computed between adjacent cells in the navigation region grid, specifically as follows: an incremental calculation combining global planning and local planning is adopted to adapt to dynamic changes in the environment:
IF (first-time planning OR environment change)
    execute global path planning mode
ELSE IF (local obstacle change OR local path optimization)
    execute local incremental optimization mode
END IF
in the global path planning mode, an improved A* algorithm is adopted, in which a directional constraint is introduced to avoid unnecessary steering and the obstacle and traffic risks of every node in the environment are evaluated; in the local incremental optimization mode, RRT-based local re-planning is adopted, in which local path segments are quickly reconstructed with the current path as a reference and the connection points are smoothed.
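The mode selection and the global planning mode of claim 4 can be sketched in Python as follows. Several points are assumptions rather than claimed details: the directional constraint is modeled as a fixed `turn_penalty` added whenever the heading changes, the per-node obstacle and traffic risk is a non-negative number stored in a 2D grid (with `None` marking an obstacle), and the `"keep"` result covers the case the pseudocode leaves unspecified. The RRT-based local re-planning is not shown.

```python
import heapq

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-connected grid headings


def select_mode(first_time, environment_changed, local_obstacle_changed, needs_local_opt):
    """Mirror the IF / ELSE IF rule of the claim; 'keep' is an assumed fallback."""
    if first_time or environment_changed:
        return "global"
    if local_obstacle_changed or needs_local_opt:
        return "local"
    return "keep"


def astar_with_turn_penalty(grid, start, goal, turn_penalty=0.5):
    """A* over (cell, heading) states so that changes of heading cost extra.

    grid[r][c] is None for an obstacle, otherwise a non-negative risk cost
    that is added when the cell is entered. Returns (path, cost), or
    (None, inf) when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates because every step costs >= 1
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {(start, -1): 0.0}  # heading -1 means "no heading yet"
    parent = {}
    heap = [(heuristic(start), 0.0, start, -1)]
    while heap:
        _, g, cell, heading = heapq.heappop(heap)
        if g > best.get((cell, heading), float("inf")):
            continue  # stale queue entry
        if cell == goal:
            path, state = [cell], (cell, heading)
            while state in parent:
                state = parent[state]
                path.append(state[0])
            return path[::-1], g
        for index, (dr, dc) in enumerate(MOVES):
            nr, nc = cell[0] + dr, cell[1] + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] is None:
                continue
            step = 1.0 + grid[nr][nc]
            if heading != -1 and heading != index:
                step += turn_penalty  # directional constraint: discourage steering
            new_g = g + step
            state = ((nr, nc), index)
            if new_g < best.get(state, float("inf")):
                best[state] = new_g
                parent[state] = (cell, heading)
                heapq.heappush(heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc), index))
    return None, float("inf")
```

Carrying the heading in the search state is what lets the planner prefer a path with one turn over an equally long zigzag; a plain A* over cells alone would treat the two as equal.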

Description

Navigation agent framework construction method based on voxels
Technical Field
The invention relates to information system software technology, and in particular to a method for constructing a voxel-based navigation agent framework.
Background
A complex urban environment comprises not only roads, rooftops and the interiors and exteriors of buildings, but also basements, sewers, subways, tunnels and the like; urban terrain is both dense and diverse. Intelligent navigation in urban environments requires accurate digitization of, and visual interaction and navigation within, the geographic environment, building facilities, entities and so on; urban situation awareness needs to be improved, and ubiquitous sensing data should be used for environment construction, positioning and navigation, and route planning in task areas.
Automatic reconstruction of spatial intelligence using machine learning or deep learning techniques is a critical and challenging problem in computer vision. By capturing static configuration and dynamic changes over time, spatial intelligence should provide a comprehensive representation and understanding of the spatial environment, combining three-dimensional geometry with its temporal evolution. Beyond applications that focus primarily on spatial intelligence primitives (i.e., low-level cues such as depth, camera pose, point clouds and 3D tracking, as well as scene constituent elements and dynamics), particular emphasis is placed on the interactions between scene components and the physical plausibility of the reconstructed environment. It is necessary to construct a holographic base map that transforms the static model into a computable, analyzable and predictable intelligent space, which becomes the intelligent hub for decision making.
At present, conventional navigation and path planning technology can calculate a suitable route from road network data and the start point, end point, avoidance points and travel mode specified by the user; however, road-network navigation depends heavily on roads and cannot exploit the complex urban environment. Autonomous driving technology relies on on-vehicle sensors (visual cameras, lidar and the like) to solve local dynamic planning, while long-distance global planning still requires road-network navigation. In large-scale, complex scenes, navigation technology has difficulty representing the three-dimensional space of the scene, especially in terms of precision and completeness; map representation models designed for navigation trafficability calculation are lacking; and spatial data structures are redundant, difficult to update, and difficult to adapt to dynamic environments.
The invention addresses technical problems in the following respects:
(1) In large scenes the spatial scale is large, the environment is diverse and the semantic complexity is high; moving from a traditional planar map to three-dimensional space significantly increases the consumption of computing and storage resources.
(2) Multi-modal semantic mapping relies on fusing a variety of sensor inputs (such as images, video and depth), so a multi-layer heterogeneous map fusion representation needs to be constructed.
(3) Incremental navigation planning methods that use live input to accommodate dynamic environments are lacking.
Disclosure of Invention
To address the difficulty that navigation technology has in representing the three-dimensional space of a scene, particularly with respect to precision and completeness in large-scale, complex scenes, a method for constructing a voxel-based navigation agent framework is provided.
The navigation provided by the invention does not depend on fixed road network data, can be used in any complex scene, can identify small obstacles, can calculate the optimal global maneuver route according to the maneuverability of the vehicle, and can provide on-the-fly guidance according to live information. The method can solve the following problems:
(1) It resolves the boundary blurring and semantic confusion caused by inconsistency between appearance and semantics, thereby improving users' spatial understanding and interaction experience. The method combines geometric reconstruction with semantic modeling to achieve a multi-level fused three-dimensional map representation.
(2) It develops an efficient multi-modal fusion strategy for map construction that supports reliable and flexible queries, addressing the noise and uncertainty that sensor noise and dynamic environmental changes introduce into the semantic map.
(3) It constructs a voxelized processing flow based on the space occupancy grid, providing a tile dynam