CN-121982235-A - Method and system for training three-dimensional Gaussian splatting SLAM with an offline data set


Abstract

The application provides a method and system for training three-dimensional Gaussian splatting SLAM with an offline data set. The method comprises the following steps: S1, acquiring image and pose information with a RealSense stereo depth camera, then classifying, naming, and storing the acquired information and outputting it as an offline RGB-D data set containing image information, pose information, and camera intrinsic parameters; S2, during training, extracting data from the offline data set and converting it to obtain an image tensor, a depth map matrix, and a pose tensor; and S3, representing the scene with three-dimensional Gaussian distributions and rendering it by differentiable rasterization, so that the scene acquired offline is reproduced in simultaneous localization and mapping (SLAM). Because the offline data set is extracted with the RealSense camera before the three-dimensional Gaussian splatting SLAM is trained, training remains possible when the camera sensor cannot be connected directly to the training server.

Inventors

ZHANG LIWEI
XIAO YANFENG

Assignees

Fuzhou University (福州大学)

Dates

Publication Date: 20260505
Application Date: 20260121

Claims (10)

1. A method for training three-dimensional Gaussian splatting SLAM with an offline data set, comprising the steps of: S1, constructing an offline RGB-D data set, wherein the offline RGB-D data set comprises multiple frames of RGB images and depth maps continuously acquired by a camera, the camera pose information corresponding to each frame, and the camera intrinsic parameters; S2, during training, extracting data from the offline data set and converting it to obtain an image tensor, a depth map matrix, and a pose tensor; and S3, representing the scene with three-dimensional Gaussian distributions and rendering it by differentiable rasterization, so that the scene acquired offline is reproduced in simultaneous localization and mapping.
2. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 1, wherein the offline RGB-D data set is constructed as follows: acquiring, with a RealSense depth camera, color image frames, depth image frames aligned with the color image frames, and synchronized inertial sensor data, wherein the inertial sensor comprises an accelerometer and a gyroscope; integrating the angular velocity of the gyroscope over time to obtain Euler angles, and calculating a rotation matrix of the current frame relative to the reference frame based on the Euler angles; integrating the acceleration data output by the accelerometer over time to obtain a translation vector of the current frame relative to the reference frame, and combining it with the rotation matrix to construct a 4×4 pose matrix of the current frame; and storing the color image frames, the depth image frames, the visualized depth images, and the pose matrices in separate directories according to their timestamps, and renaming them according to a unified naming rule to form a structured offline RGB-D data set.
3. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 2, wherein the pose matrix is constructed by the following steps: multiplying the angular velocity of the gyroscope $\omega$ by the frame time interval $\Delta t$ to obtain the attitude change $\Delta\theta = \omega\,\Delta t$, where $\omega = (\omega_x, \omega_y, \omega_z)$ are the angular velocity components of the gyroscope along the X, Y, and Z axes, and $\Delta\theta = (\Delta\theta_x, \Delta\theta_y, \Delta\theta_z)$ are the changes in rotation angle about each axis of the gyroscope; using the three-axis rotation-angle changes $\Delta\theta_x$, $\Delta\theta_y$, $\Delta\theta_z$ to construct the rotation matrices $R_x$, $R_y$, $R_z$ about the X, Y, and Z axes respectively, and multiplying them in sequence to obtain the total rotation matrix $R$; multiplying the acceleration vector $a$ obtained from the accelerometer by the square of the time interval $\Delta t^2$ to obtain the translation increment $\Delta p = a\,\Delta t^2$; and left-multiplying the pose matrix of the previous frame $T_{k-1}$ by the rotation matrix $R$ and accumulating the translation increment $\Delta p$ to obtain the pose matrix of the current frame $T_k$.
4. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 2, wherein the renaming according to a unified naming rule is performed as follows: setting a starting number, and renaming each frame file in the specified numbering order; during renaming, renaming the color image file, depth image file, and visualized depth image file of each frame synchronously by frame number, and at the same time modifying the number field of the corresponding frame in the pose data file, so that the pose information arranged line by line in the pose data file corresponds one-to-one with the renamed image file sequence; wherein the specified numbering order includes two modes, ascending and descending frame number, to accommodate temporal reordering under different training requirements.
5. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 1, wherein extracting data from the offline data set comprises: reading the color images, the depth maps, the camera intrinsic parameters, and the corresponding pose information from the offline data set, assembling the camera intrinsic parameters, including focal length, principal point position, image size, and depth scale factor, and packaging them into a camera intrinsics structure for subsequent use.
6. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 5, wherein converting the offline data set into an image tensor, a depth map matrix, and a pose tensor comprises the following steps: loading color image data from the image directory and converting it into tensor format through preprocessing, which comprises correcting the image channel order to RGB, normalizing the values to the interval [0, 1], converting to tensor format, and rearranging the dimensions to [C, H, W] for subsequent computation, where C is the number of channels, H the height, and W the width; loading a depth map from the depth map directory, converting its units according to a preset depth scale factor, and replacing invalid or abnormal values; reading the 4×4 camera pose transformation matrix of the corresponding frame from the pose data file according to the frame index, and converting it into tensor format to represent the position and orientation of the camera in the world coordinate system for use in the subsequent computation graph; and extracting the camera intrinsic parameters to construct the camera intrinsic matrix, and calculating the horizontal field of view and the vertical field of view from the camera intrinsic parameters, for use in the downstream three-dimensional point cloud back-projection or rasterized rendering modules.
7. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 1, wherein the three-dimensional Gaussian distribution representation is constructed as follows: performing exposure correction on the preprocessed image tensor, restoring the reflectance image under real illumination using the exposure parameters, and converting it back to 8-bit RGB format [H, W, C]; initializing the Gaussian kernel parameters based on the simplified point cloud; and packaging all Gaussian kernel parameters into PyTorch objects, recording their source frames and fusion states, and merging them into a global Gaussian point cloud map for subsequent training, rendering, or SLAM pose optimization.
8. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 7, wherein the parameter initialization of the three-dimensional Gaussian points and the construction of the global Gaussian point cloud map are specifically as follows: initializing the Gaussian kernel parameters based on the simplified point cloud, including: converting the point cloud coordinates into trainable parameter tensors; converting the RGB colors into a spherical harmonic coefficient representation and extending it to higher-order spherical harmonics; calculating an initial scale based on the Euclidean distance between the point cloud and the camera; initializing the rotation to the unit quaternion; and initializing the opacity to a preset value and converting it into a learnable parameter through the inverse sigmoid transformation; and inserting the initialized three-dimensional Gaussian points into the SLAM map data structure and recording the association between the three-dimensional Gaussian points and the keyframes, wherein the association information comprises the keyframe index, the number of observing frames, and a flag indicating whether the frame is the initial frame, for Gaussian point management and updating in the subsequent differentiable rendering and optimization stages.
9. The method for training three-dimensional Gaussian splatting SLAM with an offline data set according to claim 1, characterized by comprising the following steps: back-projecting the valid points in each image frame into three-dimensional points according to their depth values, and generating the corresponding three-dimensional Gaussian distributions; setting the viewing parameters, including the camera intrinsic matrix, the view transformation matrix, the image size, and the field of view, calculating the projection scaling factor, and projecting the three-dimensional Gaussian points into screen space to obtain two-dimensional screen coordinates; obtaining an anisotropic covariance matrix from the scale and rotation parameters of each three-dimensional Gaussian, and obtaining the two-dimensional covariance matrix through the Jacobian matrix of the projection function; generating the final color of each pixel by weighted superposition of multiple Gaussians based on the color information of the Gaussian points; and projecting the Gaussian point cloud onto the two-dimensional image through a rasterizer, and outputting the corresponding depth map, opacity map, visibility mask, and screen-space coordinates; wherein the rasterizer parameters are set based on the camera parameters and include the projection matrix, view matrix, camera position, background color, opacity, and scaling factor.
10. A system for training three-dimensional Gaussian splatting SLAM with an offline data set, comprising a processor, a memory, and a computer program stored in the memory, wherein the processor, when executing the computer program, performs the steps of the method for training three-dimensional Gaussian splatting SLAM with an offline data set according to any one of claims 1-9.
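
The numeric steps recited in claims 3, 6, and 8 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The function names (`update_pose`, `field_of_view`, `inverse_sigmoid`) are hypothetical. The Z·Y·X multiplication order and the way the translation increment is accumulated are assumptions, because the claim text does not fix them. The field-of-view formula is the standard pinhole-camera relation rather than one stated in the claims.

```python
import math

def rot_x(angle):
    """Rotation matrix about the X axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(angle):
    """Rotation matrix about the Y axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    """Rotation matrix about the Z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def update_pose(prev_pose, gyro, accel, dt):
    """One integration step: 4x4 previous pose -> 4x4 current pose."""
    # Attitude change = angular velocity * frame interval (claim 3).
    d_theta = [w * dt for w in gyro]
    # ASSUMPTION: the per-axis rotations are combined in Z * Y * X order.
    rot = matmul3(rot_z(d_theta[2]),
                  matmul3(rot_y(d_theta[1]), rot_x(d_theta[0])))
    # Translation increment = acceleration * dt^2, as worded in claim 3.
    d_pos = [a * dt * dt for a in accel]
    # ASSUMPTION: the rotation block is left-multiplied by rot and the
    # translation column is summed with the increment.
    prev_rot = [row[:3] for row in prev_pose[:3]]
    new_rot = matmul3(rot, prev_rot)
    new_pos = [prev_pose[i][3] + d_pos[i] for i in range(3)]
    return [new_rot[i] + [new_pos[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def field_of_view(focal_px, size_px):
    """Pinhole field of view (radians) along one image axis (claim 6)."""
    return 2.0 * math.atan(size_px / (2.0 * focal_px))

def inverse_sigmoid(p):
    """Raw parameter whose sigmoid equals p, for 0 < p < 1 (claim 8)."""
    return math.log(p / (1.0 - p))

# Usage with made-up values: a quarter turn about Z and a unit push along X.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
pose = update_pose(identity, gyro=(0.0, 0.0, math.pi / 2),
                   accel=(1.0, 0.0, 0.0), dt=1.0)
fov_x = field_of_view(focal_px=320.0, size_px=640)
raw_opacity = inverse_sigmoid(0.5)
```

Storing the opacity as its inverse sigmoid lets an optimizer update an unconstrained value while the sigmoid of that value always stays within (0, 1). The sketch follows the claim wording literally for the translation increment and leaves out refinements that the claims do not recite.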

Description

Method and system for training three-dimensional Gaussian splatting SLAM with an offline data set

Technical Field

The invention belongs to the intersection of computer vision, robot localization and mapping, and machine learning, and particularly relates to a method and system for training three-dimensional Gaussian splatting SLAM with an offline data set.

Background

Simultaneous localization and mapping (SLAM) technology is widely applied in fields such as autonomous driving, robot navigation, and virtual reality. In recent years, with the development of neural representation methods, SLAM systems based on three-dimensional Gaussian splatting (3D Gaussian Splatting) have become a research hotspot. These methods model three-dimensional scene points with Gaussian distributions and combine them with differentiable rasterization rendering to achieve high-quality reconstruction of scene geometry and texture, and they perform well in both reconstruction accuracy and rendering efficiency.

Existing three-dimensional Gaussian splatting SLAM methods typically rely on online data acquisition and training, i.e., the camera sensor must be connected in real time to a training server (e.g., a GPU workstation) to continuously transmit images, depth information, and pose data to the training module. This online mode of training while acquiring suits small-scale experimental scenes but has several limitations in practical applications. For example, when the camera is installed on a mobile device, on a field platform, or in an environment without network connectivity, the data cannot be uploaded to the training server in real time, so the SLAM training process is interrupted or cannot be scaled up. In addition, there is currently no systematic offline data set training scheme that supports high-quality training by uploading data to a server in batches after collection is complete.
This ability to decouple data acquisition from training is particularly important in resource-constrained or network-constrained scenarios. Meanwhile, traditional SLAM methods have difficulty managing RGB images, depth images, IMU data, and pose information effectively in offline mode; they lack a unified data structure and an automatic loading mechanism, which hinders subsequent Gaussian mapping and optimization.

Therefore, a three-dimensional Gaussian splatting SLAM method that supports separating offline data acquisition from training is needed, one that can complete three-dimensional map construction and pose estimation from a standardized offline RGB-D data set when the sensor cannot be connected to the training server in real time. Such a method not only improves the adaptability and flexibility of the system but also provides reliable training support for intelligent systems deployed in complex environments. The invention aims to solve these problems in the prior art by designing a method and system for training three-dimensional Gaussian splatting SLAM with an offline data set.

Disclosure of Invention

The invention aims to provide a method and system for training three-dimensional Gaussian splatting SLAM with an offline data set, in which the offline data set is extracted with a RealSense camera before the three-dimensional Gaussian splatting SLAM is trained, so that training can be carried out when the camera sensor cannot be connected directly to the training server.
To achieve the above purpose, the technical scheme of the invention is as follows: a method for training three-dimensional Gaussian splatting SLAM with an offline data set, comprising the steps of: S1, constructing an offline RGB-D data set, wherein the offline RGB-D data set comprises multiple frames of RGB images and depth maps continuously acquired by a camera, the camera pose information corresponding to each frame, and the camera intrinsic parameters; S2, during training, extracting data from the offline data set and converting it to obtain an image tensor, a depth map matrix, and a pose tensor; and S3, representing the scene with three-dimensional Gaussian distributions and rendering it by differentiable rasterization, so that the scene acquired offline is reproduced in simultaneous localization and mapping.

Preferably, the offline RGB-D data set is constructed as follows: acquiring, with a RealSense depth camera, color image frames, depth image frames aligned with the color image frames, and synchronized inertial sensor data, wherein the inertial sensor comprises an accelerometer and a gyroscope; integrating the angular velocity of the gyroscope over time to obtain Euler angles, and calculating a rotation matrix of the current frame relative to the reference frame based on the Euler angles; performing time integration on acceleration data output by an acceleromete