CN-121999131-A - Object-level semantic SLAM construction method and device based on NeRF
Abstract
The invention discloses a NeRF-based object-level semantic SLAM construction method and device. The method comprises the steps of obtaining image data and camera parameter data, and processing the image preprocessing data and the camera calibration parameter data with an object-level semantic SLAM modeling model to obtain camera pose data, object scene modeling data, and scene object semantic segmentation data. The method performs object modeling with NeRF, addressing the poor object-modeling precision of traditional object-level semantic SLAM, and performs object semantic recognition with the large image segmentation model SEEM, addressing the inability of traditional object-level semantic SLAM to recognize objects outside its prior information.
Inventors
- ZHAO XIUGUO
- WANG ZIJIAN
- SU CHEN
- WANG ZHENBAO
- LIU XIN
- XU XINXI
Assignees
- Institute of Health Service Support Technology, Institute of Systems Engineering, Academy of Military Sciences (军事科学院系统工程研究院卫勤保障技术研究所)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-07
Claims (10)
- 1. A NeRF-based object-level semantic SLAM construction method, the method comprising: S1, acquiring image data and camera parameter data; S2, preprocessing the image data and the camera parameter data to obtain image preprocessing data and camera calibration parameter data; and S3, processing the image preprocessing data and the camera calibration parameter data by using an object-level semantic SLAM modeling model to obtain camera pose data, object scene modeling data and scene object semantic segmentation data.
- 2. The NeRF-based object-level semantic SLAM construction method of claim 1, wherein the object-level semantic SLAM modeling model comprises a data input module, a tracking modeling module, an object semantic recognition module and a data output module; the data input module is used for acquiring the image data and the camera parameter data; the tracking modeling module is used for carrying out ray sampling and camera tracking modeling on the image data to obtain target scene image data and camera tracking data; the object semantic recognition module is used for carrying out coding processing on the target scene image data to obtain object mask semantic data and semantic tag data; and the data output module is used for processing the camera tracking data, the object mask semantic data and the semantic tag data to obtain the camera pose data, the object scene modeling data and the scene object semantic segmentation data.
- 3. The NeRF-based object-level semantic SLAM construction method of claim 2, wherein processing the image preprocessing data and the camera calibration parameter data by using an object-level semantic SLAM modeling model to obtain camera pose data, object scene modeling data and scene object semantic segmentation data comprises: S31, tracking and mapping the image preprocessing data and the camera calibration parameter data by using the object-level semantic SLAM modeling model to obtain target scene image data and optimized camera pose data; S32, performing global bundle adjustment (BA) optimization processing on the target scene image data and the optimized camera pose data by using the object-level semantic SLAM modeling model to obtain joint optimized camera pose data and joint optimized scene representation parameters; S33, carrying out object semantic identification processing on the target scene image data by using the object-level semantic SLAM modeling model, based on the joint optimized camera pose data and the joint optimized scene representation parameters, to obtain target object mask semantic data and target semantic tag data; and S34, processing the joint optimized camera pose data, the target object mask semantic data and the target semantic tag data to obtain the camera pose data, the object scene modeling data and the scene object semantic segmentation data.
- 4. The NeRF-based object-level semantic SLAM construction method of claim 3, wherein tracking and mapping the image preprocessing data and the camera calibration parameter data by using the object-level semantic SLAM modeling model to obtain target scene image data and optimized camera pose data comprises: S311, performing camera pose tracking processing on the camera calibration parameter data to obtain the optimized camera pose data; and S312, carrying out joint coordinate and parameter encoding processing on the image preprocessing data to obtain the target scene image data.
- 5. The NeRF-based object-level semantic SLAM construction method of claim 4, wherein performing camera pose tracking processing on the camera calibration parameter data to obtain optimized camera pose data comprises: S3111, processing the camera calibration parameter data to obtain initial camera pose data; S3112, processing the image preprocessing data and the initial camera pose data based on a first loss function to obtain tracking loss data, the tracking loss data comprising truncated signed distance loss data, spatial loss data and feature smoothing loss data; S3113, processing the tracking loss data by using a total loss calculation model to obtain first total loss data; and S3114, determining the optimized camera pose data based on the first total loss data.
- 6. The NeRF-based object-level semantic SLAM construction method of claim 5, wherein the first loss function expressions are: $L_{sdf} = \frac{1}{|R|}\sum_{r \in R}\frac{1}{|S_r^{tr}|}\sum_{p \in S_r^{tr}}\big(\hat{D}(p) - D(p)\big)^2$, $L_{fs} = \frac{1}{|R|}\sum_{r \in R}\frac{1}{|S_r^{fs}|}\sum_{p \in S_r^{fs}}\big(\hat{D}(p) - tr\big)^2$, $L_{smooth} = \frac{1}{|G|}\sum_{p \in G}\big((\Delta_x V_p)^2 + (\Delta_y V_p)^2 + (\Delta_z V_p)^2\big)$, wherein $L_{sdf}$ represents the truncated signed distance loss data; $L_{fs}$ represents the spatial (free-space) loss data; $L_{smooth}$ represents the feature smoothing loss data; $R$ represents the set of rays having an effective depth measurement; $S_r^{tr}$ represents the set of sampling points located in the truncation region on ray $r$; $p$ represents a sampling point; $\hat{D}(p)$ represents the predicted TSDF value of sampling point $p$; $D(p)$ represents the truncated signed distance of sampling point $p$ derived from the measured depth; $tr$ represents the truncation distance; $S_r^{fs}$ represents the set of sampling points located in free space on ray $r$; $G$ represents a randomly selected feature grid region; $\Delta_x V_p$ represents the feature difference of the hash grid along the $x$ axis; $\Delta_y V_p$ represents the feature difference of the hash grid along the $y$ axis; $\Delta_z V_p$ represents the feature difference of the hash grid along the $z$ axis; and $V_p$ represents the hash grid feature value of sampling point $p$. The total loss calculation model expression is: $L_1 = \lambda_{sdf} L_{sdf} + \lambda_{fs} L_{fs} + \lambda_{smooth} L_{smooth}$, wherein $L_1$ represents the first total loss value; $\lambda_{sdf}$ represents the truncated signed distance loss weight; $\lambda_{fs}$ represents the spatial loss weight; and $\lambda_{smooth}$ represents the feature smoothing loss weight.
- 7. The NeRF-based object-level semantic SLAM construction method of claim 4, wherein performing joint coordinate and parameter encoding processing on the image preprocessing data to obtain target scene image data comprises: S3121, acquiring world coordinates; S3122, performing feature extraction on the image preprocessing data to obtain scene geometric feature data and appearance color features, the geometric feature data comprising a geometric feature value and a TSDF value, the feature extraction expressions being $f_\tau\big(\gamma(x), V_\alpha(x)\big) = (h, s)$ and $f_\phi\big(\gamma(x), h\big) = c$, wherein $\gamma(x)$ represents the One-blob encoding result of world coordinate $x$; $V_\alpha(x)$ represents the multi-resolution hash grid feature; $f_\tau$ represents the geometric decoder; $h$ represents the geometric feature value; $s$ represents the TSDF value; $f_\phi$ represents the color decoder; $c$ represents the appearance color feature; $\alpha$ represents the first learning parameter; $\tau$ represents the second learning parameter; and $\phi$ represents the third learning parameter; S3123, performing volume rendering calculation on the TSDF value and the appearance color feature to obtain rendering color data and rendering depth data, the volume rendering expressions being $w_i = \sigma\!\left(\frac{s_i}{tr}\right)\sigma\!\left(-\frac{s_i}{tr}\right)$, $\hat{c} = \frac{1}{\sum_{i=1}^{M} w_i}\sum_{i=1}^{M} w_i c_i$ and $\hat{d} = \frac{1}{\sum_{i=1}^{M} w_i}\sum_{i=1}^{M} w_i d_i$, wherein $\hat{c}$ represents the pixel rendering color corresponding to the ray; $\hat{d}$ represents the pixel rendering depth corresponding to the ray; $w_i$ represents the rendering weight of the $i$-th sampling point; $tr$ represents the truncation distance; $s_i$ represents the TSDF value of the $i$-th sampling point; $M$ represents the total number of sampling points on the ray; $c_i$ represents the color of the $i$-th sampling point; and $d_i$ represents the depth of the $i$-th sampling point; S3124, processing the rendering color data and the rendering depth data based on a second loss function to obtain volume rendering color optimization data and volume rendering depth optimization data, the second loss function expressions being $L_2 = \lambda_c L_c + \lambda_d L_d$, $L_c = \frac{1}{N}\sum_{n=1}^{N}\big(\hat{C}_n - C_n\big)^2$ and $L_d = \frac{1}{|R|}\sum_{r \in R}\big(\hat{d}_r - d_r\big)^2$, wherein $L_2$ represents the second total loss value; $\lambda_c$ represents the color loss weight; $\lambda_d$ represents the depth loss weight; $L_c$ represents the color loss data; $L_d$ represents the depth loss data; $N$ represents the total number of tracking sampling pixels; $\hat{C}_n$ represents the rendering color of the $n$-th pixel; $C_n$ represents the true color of the $n$-th pixel; $\hat{d}_r$ represents the rendering depth of ray $r$; and $d_r$ represents the measured depth of ray $r$; and S3125, performing fusion processing on the volume rendering color optimization data and the volume rendering depth optimization data to obtain the target scene image data.
- 8. A NeRF-based object-level semantic SLAM construction device, characterized by comprising a data acquisition module, a preprocessing module and a SLAM construction module; the data acquisition module is used for acquiring image data and camera parameter data; the preprocessing module is used for preprocessing the image data and the camera parameter data to obtain image preprocessing data and camera calibration parameter data; the SLAM construction module is used for processing the image preprocessing data and the camera calibration parameter data by utilizing an object-level semantic SLAM modeling model to obtain camera pose data, object scene modeling data and scene object semantic segmentation data; and the data acquisition module, the preprocessing module and the SLAM construction module are sequentially connected by data links.
- 9. An object-level semantic SLAM construction apparatus based on NeRF, the apparatus comprising: a memory storing executable program code; and a processor coupled to the memory; wherein the processor invokes the executable program code stored in the memory to perform the NeRF-based object-level semantic SLAM construction method of any one of claims 1-7.
- 10. A computer readable storage medium storing computer instructions which, when invoked, are adapted to perform the object level semantic SLAM construction method based on NeRF as claimed in any one of claims 1-7.
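The first loss function of claims 5 and 6 combines a truncated signed distance term, a free-space term, and a hash-grid feature smoothing term under a weighted total loss. Because the formula images did not survive extraction, the following NumPy sketch shows one common formulation of these three terms (in the style of Co-SLAM-like tracking losses). The normalization of the SDF target, the dense grid stand-in for the hash grid, and the weight values are illustrative assumptions, not the patent's exact definitions.

```python
import numpy as np

def sdf_loss(pred_sdf, depth, t_vals, tr):
    """Truncated signed distance loss: inside the truncation band, the
    predicted (tr-normalized) SDF should match the signed distance
    implied by the measured depth."""
    target = (depth[:, None] - t_vals) / tr          # ground-truth normalized SDF
    mask = np.abs(depth[:, None] - t_vals) < tr      # samples in the truncation region
    if not mask.any():
        return 0.0
    return float(np.mean((pred_sdf[mask] - target[mask]) ** 2))

def free_space_loss(pred_sdf, depth, t_vals, tr):
    """Free-space loss: samples well in front of the surface should
    predict the maximal normalized SDF value of 1."""
    mask = (depth[:, None] - t_vals) > tr            # samples in free space
    if not mask.any():
        return 0.0
    return float(np.mean((pred_sdf[mask] - 1.0) ** 2))

def smoothness_loss(grid):
    """Feature smoothing loss: penalize feature differences of a
    (dense stand-in for a) hash grid along the x, y and z axes."""
    dx = np.diff(grid, axis=0)
    dy = np.diff(grid, axis=1)
    dz = np.diff(grid, axis=2)
    return float((dx ** 2).mean() + (dy ** 2).mean() + (dz ** 2).mean())

def total_tracking_loss(pred_sdf, depth, t_vals, grid, tr,
                        w_sdf=1.0, w_fs=10.0, w_smooth=1e-6):
    """First total loss: weighted sum of the three tracking terms
    (weights are illustrative, not the patent's values)."""
    return (w_sdf * sdf_loss(pred_sdf, depth, t_vals, tr)
            + w_fs * free_space_loss(pred_sdf, depth, t_vals, tr)
            + w_smooth * smoothness_loss(grid))
```

Here `t_vals` holds the per-ray sample depths, so the truncation-region and free-space masks of the claims fall out of simple comparisons against the measured depth.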
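Claim 7's joint coordinate and parameter encoding pairs a One-blob encoding of the query coordinate with a multi-resolution hash grid feature lookup before the geometric and color decoders. The sketch below illustrates the two encodings; the dense nearest-voxel grid is a deliberately simplified stand-in for a real multi-resolution hash grid, and all names are illustrative.

```python
import numpy as np

def one_blob_encode(x, bins=16):
    """One-blob encoding of scalar coordinates in [0, 1]: each coordinate
    is represented by a Gaussian blob evaluated at the bin centers, giving
    the decoders a smooth, localized positional signal."""
    centers = (np.arange(bins) + 0.5) / bins      # bin centers in [0, 1]
    sigma = 1.0 / bins                            # blob width of about one bin
    return np.exp(-0.5 * ((x[..., None] - centers) / sigma) ** 2)

def grid_feature(grid, x):
    """Dense-grid stand-in for the multi-resolution hash grid V_alpha:
    nearest-voxel feature lookup for coordinates x in [0, 1]^3."""
    res = grid.shape[0]
    idx = np.clip((x * res).astype(int), 0, res - 1)
    return grid[idx[..., 0], idx[..., 1], idx[..., 2]]
```

In a full system the concatenation of `one_blob_encode(x)` and `grid_feature(grid, x)` would feed the geometric decoder, whose geometric feature output then feeds the color decoder, as in claim 7's two feature extraction expressions.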
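Claim 7's volume rendering step turns per-sample TSDF values into rendering weights and averages the sample colors and depths; the second loss then compares the rendered color and depth against observations. Below is a NumPy sketch of one standard TSDF-weighted formulation (weight $\sigma(s_i/tr)\,\sigma(-s_i/tr)$, normalized along the ray); the weight form and the loss weights are assumptions, not necessarily the patent's exact expressions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def render_weights(sdf, tr):
    """Per-sample rendering weights from TSDF values along each ray,
    peaked where the SDF crosses zero, normalized over the ray."""
    w = sigmoid(sdf / tr) * sigmoid(-sdf / tr)
    return w / (w.sum(axis=-1, keepdims=True) + 1e-10)

def volume_render(sdf, colors, depths, tr):
    """Render pixel color and depth as weighted sums of per-sample colors
    and depths (shapes: sdf (R, M), colors (R, M, 3), depths (R, M))."""
    w = render_weights(sdf, tr)
    color = (w[..., None] * colors).sum(axis=-2)
    depth = (w * depths).sum(axis=-1)
    return color, depth

def render_loss(pred_color, gt_color, pred_depth, gt_depth,
                w_color=1.0, w_depth=0.1):
    """Second total loss: weighted color and depth L2 terms."""
    l_color = float(np.mean((pred_color - gt_color) ** 2))
    l_depth = float(np.mean((pred_depth - gt_depth) ** 2))
    return w_color * l_color + w_depth * l_depth
```

With a constant zero SDF along a ray the weights are uniform, so the rendered depth is simply the mean sample depth, which is a convenient sanity check.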
Description
Object-level semantic SLAM construction method and device based on NeRF
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a NeRF-based object-level semantic SLAM construction method and device.
Background
In terms of object modeling, conventional object-level semantic SLAM (Object-level Semantic Simultaneous Localization and Mapping) systems use Computer-Aided Design (CAD) or 3D geometric modeling methods to reconstruct and model objects in the environment in three dimensions. However, these conventional modeling methods suffer from long modeling times, low modeling accuracy, and the like, and are difficult to meet the requirements of practical applications.
In terms of object semantic recognition, conventional object-level semantic SLAM algorithms typically rely on target detection algorithms with prior information to obtain information about objects in the environment. The main drawback of this approach is that it cannot accurately identify and model new objects that are not contained in the prior knowledge base, lacks the ability to detect and model unknown objects, and has difficulty coping with the large numbers of new, unknown target objects in future complex environments.
To solve these problems, a NeRF-based object-level semantic SLAM construction method and device are provided: first, object modeling is carried out by utilizing NeRF technology, which solves the problem of poor object modeling precision in traditional object-level semantic SLAM; then, object semantic recognition is carried out by utilizing the large image segmentation model SEEM, which solves the problem that traditional object-level semantic SLAM cannot recognize objects beyond its prior information.
Disclosure of Invention
Aiming at the problems that modeling time is long, the ability to detect and model novel objects is lacking, modeling precision is low, and practical requirements are difficult to meet, the invention provides a NeRF-based object-level semantic SLAM construction method and device. The scene is represented as a multi-resolution hash grid through joint parameter and coordinate encoding to realize high-precision scene reconstruction; global bundle adjustment (BA) and a minimized loss function are used to realize high-precision localization; and the scene representation is encoded into a joint visual-semantic space to generate object semantic information, thereby realizing zero-shot, pixel-level semantic segmentation. In order to solve the technical problem, a first aspect of the embodiment of the present invention discloses a NeRF-based object-level semantic SLAM construction method, which includes: S1, acquiring image data and camera parameter data; S2, preprocessing the image data and the camera parameter data to obtain image preprocessing data and camera calibration parameter data; and S3, processing the image preprocessing data and the camera calibration parameter data by using an object-level semantic SLAM modeling model to obtain camera pose data, object scene modeling data and scene object semantic segmentation data.
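The S1-S3 flow described above can be sketched as a minimal pipeline. All function and field names here are illustrative placeholders, not interfaces defined by the patent, and the preprocessing step is a pass-through stand-in for real undistortion and calibration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class SlamOutput:
    camera_poses: Any           # camera pose data
    object_scene_model: Any     # NeRF-based object scene modeling data
    semantic_segmentation: Any  # scene object semantic segmentation data

def preprocess(images: Any, camera_params: Any) -> Tuple[Any, Any]:
    """S2: placeholder for image preprocessing (e.g. undistortion) and
    camera calibration; returns the data unchanged in this sketch."""
    return images, camera_params

def run_object_level_slam(
        images: Any, camera_params: Any,
        model: Callable[[Any, Any], Tuple[Any, Any, Any]]) -> SlamOutput:
    """S1-S3: take acquired image/camera data, preprocess it, then run
    the object-level semantic SLAM modeling model."""
    pre_images, calib = preprocess(images, camera_params)   # S2
    poses, scene, semantics = model(pre_images, calib)      # S3
    return SlamOutput(poses, scene, semantics)
```

The `model` callable stands in for the four-module modeling model (data input, tracking modeling, object semantic recognition, data output) described in the optional implementation below.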
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the object-level semantic SLAM modeling model includes a data input module, a tracking modeling module, an object semantic recognition module and a data output module; the data input module is used for acquiring the image data and the camera parameter data; the tracking modeling module is used for carrying out ray sampling and camera tracking modeling on the image data to obtain the target scene image data and the camera tracking data; the object semantic recognition module is used for carrying out coding processing on the target scene image data to obtain object mask semantic data and semantic tag data; the data output module is used for processing the camera tracking data, the object mask semantic data and the semantic tag data to obtain the camera pose data, the object scene modeling data and the scene object semantic segmentation data. In the first aspect of the embodiment of the present invention, the processing the image preprocessing data and the camera calibration parameter data by using an object-level semantic SLAM modeling model to obtain camera pose data, object scene modeling data and scene object semantic segmentation data includes: S31, tracking and mapping the image preprocessing data and the camera calibration parameter data by using an object-level semantic SLAM modeling model to obtain target scene image data and optimized camera pose data; S32, performing global bundle adjustment optimization processing on the target scene image data and the optimized camera pose data by using the object-level semantic SLAM modeling model to obtain joint optimized camera pose data and joint optimized scene representation parameters