CN-122023699-A - Hierarchical queryable three-dimensional scene graph construction method for large-scale planetary-surface exploration

CN 122023699 A

Abstract

A method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration, belonging to the field of environment perception and mapping for planetary exploration robots. The method addresses the problem that, in existing environment-characterization approaches, the navigation map and the scientific-target perception map are mutually independent. The method generates an initial three-dimensional point cloud of each rock from its rock mask and downsamples it to a sparse voxel representation; constructs, in real time, an elevation map of the local area around the robot from depth point clouds; generates a binary traversability grid map from the real-time perception data; builds an incremental topological map during the robot's motion, in combination with the detected rocks and impact craters, by attempting to connect newly sampled nodes to existing ones; and then constructs an instance layer storing instance information, a topology layer storing the traversable road-network information, a clustering layer that clusters adjacent scientific targets, a target-center layer storing the center position of each scientific target or target aggregation area, and a rover path layer that provides context for path planning and analysis.
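The abstract's first step, back-projecting a 2D rock mask into an initial 3D point cloud, can be sketched with a standard pinhole camera model. This is a minimal illustration rather than the patented implementation; the intrinsics `fx, fy, cx, cy` and the `mask`/`depth` inputs are assumed.

```python
import numpy as np

def backproject_mask(mask, depth, fx, fy, cx, cy):
    """Back-project masked pixels with valid depth into camera-frame 3D points.

    mask  : (H, W) binary rock mask
    depth : (H, W) metric depth image aligned with the mask
    Returns an (N, 3) point cloud.
    """
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)          # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx             # pinhole model back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def rock_center(points):
    """Centroid of the cloud, stored as the rock instance's abstract position."""
    return points.mean(axis=0)
```

The centroid then serves as the rock's abstract center position in the instance layer.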

Inventors

  • Zhou Ruyi
  • Tang Lisijin
  • Ding Liang
  • Gao Haibo
  • Yang Huaiguang
  • Deng Zongquan

Assignees

  • Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (10)

  1. A method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration, characterized by comprising the following steps: S1, rock depth-semantic representation based on a vision-language model: obtaining the 2D rock mask of each rock in a color image, namely its 2D bounding box and pixel-level mask; back-projecting the 2D pixel-level mask into three-dimensional space to generate an initial three-dimensional point cloud of the rock; computing the rock's center position and storing it as abstract information of the rock instance; performing cross-frame association and fusion of the same rock observed from different viewpoints, and updating the rock's geometric and semantic attributes; selecting multi-view image features of each rock instance as image snapshots, feeding them to a large vision-language model to generate a structured natural-language description along predefined geological attribute dimensions, and storing the geological attribute information of each aspect according to a specified answer paradigm to obtain structured geological attributes; S2, real-time impact crater detection and parameter extraction on the planetary surface based on geometric features: acquiring depth point clouds and constructing an elevation map of the local area around the robot in real time; extracting a set of contour lines from the elevation map and determining crater contours from them; screening candidate crater contours by geometric criteria; fitting a circle to each candidate contour to estimate its center coordinates and radius; cluster-merging nested contours belonging to the same crater; using the rock positions identified in S1 to judge whether a rock is embedded in the crater, stored as a semantic attribute of the crater; and associating and fusing each newly detected crater with existing crater instances by center distance and radius, registering as a new instance any detection for which association fails; S3, constructing an incremental topological map for exploration: while the robot moves, continuously sampling new nodes in the traversable area near the robot; after a new node is successfully sampled, attempting to connect it to existing nodes; generating an initial graph structure based on Delaunay triangulation; and organizing the edges of the graph with an R-tree spatial index to support efficient queries, each edge being stored in the R-tree with its spatial bounding box as the index entry; S4, organizing and constructing the hierarchical scene graph: an instance layer, serving as a core layer, stores the instance information of all scientific targets including rocks and craters, the instance information comprising unique IDs, three-dimensional positions, geometric parameters, sparse voxel representations, and structured semantic descriptions; a topology layer, serving as a core layer, stores the traversable road-network information of the environment, namely the global undirected, non-crossing topological map constructed in step S3, represented as a graph structure containing only nodes and edges; a clustering layer clusters adjacent scientific targets according to the spatial distribution of instances to form target aggregation areas, realizing the partitioning and recording of the overall rock distribution; a target-center layer abstractly stores the center position of each scientific target or target aggregation area, facilitating fast retrieval and task scheduling; and a rover path layer records the robot's historical path and provides context for path planning and analysis.
  2. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 1, wherein in step S1 the 2D rock mask of each rock in the image is obtained by applying a detection-and-segmentation model to the RGB image.
  3. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 2, wherein in step S1 the 2D pixel-level rock mask is back-projected into three-dimensional space in combination with a depth information source acquired synchronously with the RGB image.
  4. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 1, wherein performing cross-frame association of the same rock observed from different viewpoints in step S1 comprises: if the two point clouds do not overlap, further evaluating their convex-hull properties, namely: the convex-hull volume of the point cloud of rock A identified in the current frame is V_A, the convex-hull volume of the point cloud of rock B identified in a historical frame is V_B, and the convex-hull volume of the merged point cloud AB is V_{AB}; computing the convex-hull ratio t = (V_A + V_B)/V_{AB}; if t exceeds a preset threshold lying in (0, 1), associating A and B, and otherwise judging the rock identified in the current frame to be a new rock instance.
  5. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 1, wherein in step S2 the elevation map of the local area around the robot is constructed in real time from depth point clouds accumulated from dense depth images or accumulated from LiDAR.
  6. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 1, wherein determining crater contours from the contour lines and screening candidate crater contours by geometric criteria in step S2 comprises: first judging whether a contour exhibits a centrally concave characteristic; then computing shape metrics commonly used for closed curves, such as roundness, rectangularity, and aspect ratio; and finally retaining the candidate contours that satisfy all shape thresholds.
  7. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 1, wherein in step S3, when new nodes are continuously sampled in the traversable area near the robot during its motion, a new node must satisfy: first, it lies at a position free of any obstacle; second, its distance to the nearest existing node is greater than a distance threshold, so as to avoid overly dense sampling; and third, within the robot-centered local area the total number of nodes must not exceed a saturation number, sampling being stopped once the saturation number is reached.
  8. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 7, wherein in step S3, when attempting to connect a new node to an existing node, the connection must satisfy a distance constraint, the path between the two nodes must be traversable over its whole length, and it must not intersect any existing edge.
  9. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to claim 8, wherein in the process of constructing the incremental topological map for exploration in step S3, the topological map is adjusted according to the instance layer and the traversable area, comprising: deleting nodes that fall into obstacle regions together with all edges connected to them; deleting edges that intersect such regions; and resampling nodes in the updated traversable region to maintain the connectivity of the graph.
  10. The method for constructing a hierarchical queryable three-dimensional scene graph for large-scale planetary-surface exploration according to any one of claims 1 to 9, further comprising the construction of a query module, specifically comprising: encoding the rock-retrieval requirement; using a CLIP model to align the encodings of the identified rocks' image features with the encoding of the rock features to be retrieved in a high-dimensional vector space, so that the identified rocks and the sought rock features are matched in that vector space; judging by the encoded vectors, the similarity between an identified rock and the sought rock features being considered high when the dot product of their direction vectors is close to 1, thereby determining the retrieval target; for a question to be queried, introducing a large language model for comprehensive reasoning and decomposing the question into the corresponding geological attributes; and, based on the instance aggregation areas formed by the clustering layer from adjacent scientific targets, directly inputting a description of an instance aggregation at query time and mapping it to the corresponding instance aggregation area.
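The cross-frame association test of claim 4 compares convex-hull volumes of rock point clouds. As a self-contained sketch, the same criterion is shown here in 2D (hull areas instead of volumes, via a pure-Python monotone-chain hull); the threshold `t0` is an assumed tuning parameter, not a value taken from the claims.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; `points` is a list of (x, y) tuples."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Shoelace area of the convex hull (2D stand-in for the hull volume)."""
    h = convex_hull(points)
    return 0.5 * abs(sum(h[i][0] * h[(i + 1) % len(h)][1]
                         - h[(i + 1) % len(h)][0] * h[i][1]
                         for i in range(len(h))))

def should_associate(cloud_a, cloud_b, t0=0.5):
    """Associate rocks A and B when t = (V_A + V_B) / V_AB exceeds threshold t0."""
    t = (hull_area(cloud_a) + hull_area(cloud_b)) / hull_area(cloud_a + cloud_b)
    return t > t0
```

Overlapping or adjacent clouds of the same rock drive t toward (and above) 1, while distant rocks inflate the merged hull and drive t toward 0.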
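The local elevation map of step S2 and claim 5 can be sketched as a simple robot-centered raster in which each cell keeps the maximum observed height (one common convention; the cell size and map extent are assumed parameters):

```python
import numpy as np

def elevation_map(points, cell=0.1, size=2):
    """Rasterise an (x, y, z) point cloud into a size-by-size height grid
    centred on the robot; unobserved cells stay NaN."""
    grid = np.full((size, size), np.nan)
    half = size * cell / 2.0          # map spans [-half, half) in x and y
    for x, y, z in points:
        i = int((x + half) // cell)
        j = int((y + half) // cell)
        if 0 <= i < size and 0 <= j < size:
            grid[i, j] = z if np.isnan(grid[i, j]) else max(grid[i, j], z)
    return grid
```

Contour lines for crater detection would then be extracted from this grid.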
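For the crater parameter extraction of step S2 and the screening of claim 6, circle fitting and a roundness check can be sketched as follows. The algebraic (Kåsa) least-squares fit and the 4πA/P² roundness metric are common choices, used here as illustrative stand-ins for the fitting and screening details the claims leave unspecified.

```python
import numpy as np

def fit_circle(xs, ys):
    """Algebraic (Kasa) least-squares circle fit.

    Solves x^2 + y^2 = 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) in the
    least-squares sense; returns (cx, cy, r).
    """
    A = np.column_stack([2 * xs, 2 * ys, np.ones_like(xs)])
    b = xs ** 2 + ys ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx ** 2 + cy ** 2)

def roundness(area, perimeter):
    """4*pi*A / P^2, in (0, 1]; equals 1 for a perfect circle."""
    return 4.0 * np.pi * area / perimeter ** 2
```

A candidate contour would be retained only if metrics such as this roundness pass their respective thresholds.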
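Claims 7 and 8 constrain how new topology nodes are wired without crossing existing edges. A sketch of the non-crossing edge check, using an orientation-based strict segment intersection test (function names and the traversability shortcut are illustrative assumptions):

```python
import math

def segments_cross(p1, p2, q1, q2):
    """Strict crossing test: True only if the open segments intersect.

    Edges that merely share an endpoint (normal in a topology graph)
    do not count as crossing.
    """
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def can_connect(new_node, node, edges, d_max):
    """Accept an edge if it meets the distance constraint and crosses no
    existing edge; full-path traversability would be checked separately
    against the binary traversability grid."""
    if math.dist(new_node, node) > d_max:
        return False
    return not any(segments_cross(new_node, node, a, b) for a, b in edges)
```

In the patented method, the candidate edges themselves come from Delaunay triangulation, and the existing edges would be fetched from the R-tree by bounding box rather than scanned linearly as here.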
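Claim 10's retrieval step matches CLIP-style embeddings by the dot product of their direction (unit) vectors. A minimal numpy sketch, where the embeddings themselves are assumed to come from a pre-trained CLIP encoder:

```python
import numpy as np

def best_match(query_emb, rock_embs):
    """Return (index, similarity) of the rock whose embedding is most
    similar to the query; after L2 normalisation the dot product of a
    matching text/image pair approaches 1."""
    q = query_emb / np.linalg.norm(query_emb)
    R = rock_embs / np.linalg.norm(rock_embs, axis=1, keepdims=True)
    sims = R @ q
    i = int(np.argmax(sims))
    return i, float(sims[i])
```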

Description

Hierarchical queryable three-dimensional scene graph construction method for large-scale planetary-surface exploration

Technical Field
The invention belongs to the field of environment perception and map building for planetary exploration robots, and relates to a three-dimensional scene map system and method suitable for wheeled, legged, and similar planetary exploration robots.

Background
Large-scale autonomous scientific exploration and resource prospecting on planetary surfaces (e.g., early surveys for lunar base construction) require the robotic system to possess long-term, accurate environmental understanding and modeling capabilities for unstructured, semantically sparse, yet instance-dense surface environments. Such understanding includes not only the identification of individual scientific instances, but also the formation of comprehensive knowledge and records of spatial structure, target attributes, and traversability, to support autonomous navigation and scientific task decisions. The prior art has developed around the core task of environment understanding and map construction for planetary robots, and its development reflects an evolution from geometric modeling, to semantic perception, to task fusion: early work on planetary-surface environment understanding focused on accurate modeling at the geometric level. To ensure safe robot motion, high-precision digital elevation models (DEMs) and orthophotos were built for Mars exploration missions such as "Spirit" and "Opportunity" and for China's "Yutu-2" (Jade Rabbit 2) lunar rover. Such maps provide precise geometric information about the environment and are the foundation of path planning and localization, but they lack semantic understanding of the objects in the environment: they can neither distinguish semantic units such as rocks and sand, nor read out scientific value.
To raise the semantic level of environmental understanding, research began to integrate object recognition and terrain classification techniques. Vision-based detection of rocks and impact craters is key here. Early methods relied on hand-crafted features; deep-learning models later achieved significant improvements in detection and segmentation accuracy. However, such methods typically output isolated detection boxes or masks, so the information remains at the level of "what the object is" and is not deeply fused with the map structure. Meanwhile, crater detection has mainly relied on offline analysis of orbital remote-sensing data and is difficult to adapt to the local, incomplete real-time observations available from a rover's viewpoint. In recent years, although vision-language models have shown open-vocabulary description potential, their accuracy for professional descriptions in the specialized geological domain is insufficient, and they struggle to generate structured attribute information usable for scientific analysis. As task complexity increases, environmental understanding has further evolved toward task-driven, multi-modal, structured characterization. Researchers have attempted to combine semantic perception results with maps to build map forms that serve particular goals. For example, NASA's risk-aware maps distinguish "sharp rocks" from "round rocks" by online terrain classification and fuse the result into a geometric map to plan safe paths. Furthermore, probabilistic models have been introduced into scientific hypothesis maps, combining the prior beliefs of scientists with the robot's observations to optimize exploration paths for acquiring scientific information. These works mark the transition of environmental understanding from "perceiving objects" to "serving tasks".
However, the prior art has the following obvious shortcomings, making it difficult to meet the demands of large-scale, long-duration scientific exploration of planetary surfaces: Semantic understanding is shallow. Existing object-recognition methods (e.g., rock segmentation) lack deep fusion with semantic maps (e.g., risk maps). The recognition results are mostly isolated labels and cannot form an instance-level knowledge base comprising fine-grained morphological attributes, multi-view descriptions, and spatial relations. As a result, environmental understanding stays at the level of "there is a rock", unable to answer scientific questions such as "where is the rock with a given set of geological features". The representation structure is fragmented. Geometric maps, semantic maps, and scientific hypothesis maps are typically constructed and maintained as independent modules, lacking a unified data structure. The traversable-topology information required by navigation planning is separated from the target attribute information that scientific tasks focus on, so that target retrieval and path planning become disconnected, and closed-loop tasks such as "go back and re-observe a specific scientific target" cannot be efficiently executed.