
CN-116772844-B - Navigation method of visual inertial indoor robot based on dynamic environment


Abstract

The invention discloses a navigation method for a visual-inertial indoor robot based on a dynamic environment. A depth camera acquires the robot's image information and the measurements of the inertial measurement unit are pre-integrated; feature points are detected in the images and tracked by optical flow. Mask images are generated by detecting potential dynamic targets in the images, and the mask images are predicted to compensate for segmentation failures and slow segmentation; a mask depth is set and dynamic feature points are filtered out. Information recovered by visual structure-from-motion and inertial pre-integration is used to correct the gyroscope error, estimate the scale, and so on. Nonlinear optimization is then performed on the pre-integration results of the inertial measurement unit and the feature-point information, the poses of predicted image frames are pre-integrated by the inertial measurement unit, and the feature points are triangulated and reconstructed. The image frames of the key frames are aligned with the depth information as point clouds, the point clouds are stitched with the iterative closest point algorithm, and an environment map with navigation information is realized. The invention achieves high localization and mapping accuracy and robust path planning in non-static environments.

Inventors

  • WANG FANG
  • ZHAO JIANCHENG
  • HUANG SHUCHENG

Assignees

  • Jiangsu University of Science and Technology (江苏科技大学)

Dates

Publication Date
2026-05-12
Application Date
2023-05-18

Claims (6)

  1. A navigation method of a visual-inertial indoor robot based on a dynamic environment, characterized by comprising the following steps:
     Step 1: acquiring image information of the indoor robot through a depth camera with an inertial measurement unit mounted on the indoor robot, and completing pre-integration of the inertial measurement unit, feature-point detection in the images, and optical-flow tracking of the feature points;
     Step 2: a semantic segmentation network detects potential dynamic targets in the images and generates mask images; an optical-flow estimation network predicts the mask images to compensate for segmentation failures and slow segmentation of the semantic segmentation network; a mask depth is set on the mask images and dynamic feature points are filtered out;
     Step 3: information obtained through visual structure-from-motion and inertial pre-integration is used to correct the gyroscope error and to estimate the scale, the acceleration bias and the gravitational acceleration;
     Step 4: the pre-integration results of the inertial measurement unit and the feature-point information output by the front end are input and nonlinearly optimized;
     Step 5: the key frames are time-aligned with the image frames containing depth information, point clouds are built directly from the frames containing depth information, point-cloud maps are stitched through the iterative closest point algorithm, and an environment map with navigation information is finally realized by adopting an octree data structure.
  2. The navigation method of a visual-inertial indoor robot based on a dynamic environment according to claim 1, wherein step 1 comprises the following steps (an illustrative feature-detection and tracking sketch follows the claims):
     1.1 The depth camera with an inertial measurement unit mounted on the indoor robot acquires 848×480-resolution image information at a frequency of 15 Hz.
     1.2 The depth camera acquires the inertial measurement unit data at a frequency of 250 Hz.
     1.3 The measurement values of the inertial measurement unit are computed. The unit comprises a triaxial accelerometer and a triaxial gyroscope; taking white noise and random-walk noise into account, its model is expressed as
     $$\hat{\omega}^{b} = \omega^{b} + b^{g} + n^{g}, \qquad \hat{a}^{b} = q_{bw}\left(a^{w} + g^{w}\right) + b^{a} + n^{a} \tag{1}$$
     where $\omega^{b}$ denotes the true angular velocity measured by the inertial measurement unit and $a^{w}$ the measured acceleration; the superscript $g$ denotes the gyroscope, the superscript $a$ the accelerometer, the superscript $w$ the world coordinate system and $b$ the inertial measurement unit coordinate system; $b^{g}$ is the gyroscope angular-velocity bias, $b^{a}$ the acceleration bias, $g^{w}$ the gravitational acceleration, $n^{g}$ and $n^{a}$ white noise, and $q_{bw}$ the rotation transformed from the world coordinate system to the inertial measurement unit coordinate system.
     1.4 The differential equations of the motion model of the inertial measurement unit are established. According to Newton's second law of motion and the quaternion kinematic model, they can be expressed as
     $$\dot{p}^{w}_{b_{t}} = v^{w}_{t}, \qquad \dot{v}^{w}_{t} = a^{w}_{t}, \qquad \dot{q}^{w}_{b_{t}} = q^{w}_{b_{t}} \otimes \begin{bmatrix} 0 \\ \frac{1}{2}\,\omega^{b_{t}} \end{bmatrix} \tag{2}$$
     where $t$ denotes time and $b_{t}$ the inertial measurement unit coordinate system at time $t$; $\dot{p}^{w}_{b_{t}}$ and $\dot{q}^{w}_{b_{t}}$ are the time derivatives of the translation and rotation transforming the inertial measurement unit coordinate system to the world coordinate system; $v^{w}_{t}$ is the velocity of the inertial measurement unit coordinate system transformed into the world coordinate system at time $t$, $a^{w}_{t}$ the corresponding acceleration, and $\omega^{b_{t}}$ the angular velocity of the inertial measurement unit coordinate system at time $t$.
     1.5 The pre-integration model is built. Integrating equation (2) over the interval $[t_{k}, t_{k+1}]$ gives
     $$\begin{aligned} p^{w}_{b_{k+1}} &= p^{w}_{b_{k}} + v^{w}_{b_{k}}\Delta t_{k} + \iint_{t \in [t_{k}, t_{k+1}]} \left( q^{w}_{b_{t}}\, a^{b_{t}} - g^{w} \right) \mathrm{d}t^{2} \\ v^{w}_{b_{k+1}} &= v^{w}_{b_{k}} + \int_{t \in [t_{k}, t_{k+1}]} \left( q^{w}_{b_{t}}\, a^{b_{t}} - g^{w} \right) \mathrm{d}t \\ q^{w}_{b_{k+1}} &= q^{w}_{b_{k}} \otimes \int_{t \in [t_{k}, t_{k+1}]} \frac{1}{2}\, \Omega\!\left( \omega^{b_{t}} \right) q^{b_{k}}_{b_{t}}\, \mathrm{d}t \end{aligned} \tag{3}$$
     where $\Delta t_{k}$ is the time interval, $b_{k}$ the inertial measurement unit coordinate system at time $t_{k}$, $q^{b_{k}}_{b_{t}}$ the rotation of the inertial measurement unit coordinate system from time $t$ to time $t_{k}$, $p^{w}_{b_{k}}$ the translation from the world coordinate system to the inertial measurement unit coordinate system at time $t_{k}$, and $\Omega(\cdot)$ the quaternion update matrix of the angular velocity. To eliminate the quantities tied to the world coordinate system, so that the integral terms relate only to the previous instant, both sides of equation (3) are multiplied by the rotation $q^{b_{k}}_{w}$ from the world coordinate system to the inertial measurement unit coordinate system, which gives
     $$\begin{aligned} q^{b_{k}}_{w}\, p^{w}_{b_{k+1}} &= q^{b_{k}}_{w} \left( p^{w}_{b_{k}} + v^{w}_{b_{k}}\Delta t_{k} - \frac{1}{2} g^{w} \Delta t_{k}^{2} \right) + \alpha^{b_{k}}_{b_{k+1}} \\ q^{b_{k}}_{w}\, v^{w}_{b_{k+1}} &= q^{b_{k}}_{w} \left( v^{w}_{b_{k}} - g^{w}\Delta t_{k} \right) + \beta^{b_{k}}_{b_{k+1}} \\ q^{b_{k}}_{w} \otimes q^{w}_{b_{k+1}} &= \gamma^{b_{k}}_{b_{k+1}} \end{aligned} \tag{4}$$
     and the pre-integration terms of the inertial measurement unit are expressed as
     $$\alpha^{b_{k}}_{b_{k+1}} = \iint_{t \in [t_{k}, t_{k+1}]} q^{b_{k}}_{b_{t}}\, a^{b_{t}}\, \mathrm{d}t^{2}, \qquad \beta^{b_{k}}_{b_{k+1}} = \int_{t \in [t_{k}, t_{k+1}]} q^{b_{k}}_{b_{t}}\, a^{b_{t}}\, \mathrm{d}t, \qquad \gamma^{b_{k}}_{b_{k+1}} = \int_{t \in [t_{k}, t_{k+1}]} \frac{1}{2}\, \Omega\!\left( \omega^{b_{t}} \right) \gamma^{b_{k}}_{b_{t}}\, \mathrm{d}t \tag{5}$$
     1.6 Feature points are detected with the features-from-accelerated-segment test (FAST) by computing the absolute brightness difference between a circle centre and its circumference: a pixel of the image is taken as the centre of a circle with a radius of 3 pixel units carrying 16 pixels; with the brightness of the centre pixel denoted $I_{p}$ and a threshold $T$, the difference between the centre and the 16 circle pixels is computed, and if there are at least 9 contiguous points on the circle whose absolute difference from the centre pixel is larger than the threshold $T$, the centre pixel is taken as a feature point.
     1.7 Feature points are tracked across different images through Lucas-Kanade optical flow. The optical-flow method assumes that the grey value of the same pixel is consistent across images, with the mathematical expression
     $$I(x + \mathrm{d}x,\; y + \mathrm{d}y,\; t + \mathrm{d}t) = I(x, y, t) \tag{6}$$
     After a first-order Taylor approximation and simplification this becomes
     $$I_{x} u + I_{y} v = -I_{t} \tag{7}$$
     where $I_{x}$ and $I_{y}$ denote the gradients of the pixel along the x and y axes, $u$ and $v$ the translational velocities of the pixel along the x and y axes, and $I_{t}$ the partial derivative of the grey value with respect to time. Equation (7) can then be written as
     $$\begin{bmatrix} I_{x} & I_{y} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -I_{t} \tag{8}$$
     Assuming the pixels within a window share the same motion, stacking equation (8) over the window gives
     $$\begin{bmatrix} I_{x}^{(1)} & I_{y}^{(1)} \\ \vdots & \vdots \\ I_{x}^{(k)} & I_{y}^{(k)} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} I_{t}^{(1)} \\ \vdots \\ I_{t}^{(k)} \end{bmatrix} \tag{9}$$
     where $I_{t}^{(i)}$ is the partial derivative with respect to time of the grey value of the i-th pixel in the window and $I_{x}^{(i)}, I_{y}^{(i)}$ are its gradients along the x and y axes; the corresponding least-squares problem for equation (9) is set up and solved through the Ceres library to obtain the pixel coordinates.
  3. The navigation method of a visual-inertial indoor robot based on a dynamic environment according to claim 1, wherein step 2 comprises the following steps (a mask-depth filtering sketch follows the claims):
     2.1 The semantic segmentation network Mask R-CNN is trained on the COCO data set, in which different objects are annotated in different colours, to obtain the semantic segmentation weights, and the optical-flow estimation network PWC-Net is trained on the Sintel data set to obtain the optical-flow weights.
     2.2 The image frames are semantically segmented by the semantic segmentation network Mask R-CNN to obtain semantic mask images.
     2.3 The optical-flow estimation information of each image frame is obtained from the optical-flow estimation network PWC-Net; the coordinates $(u_{t}, v_{t})$ of a pixel in the current image frame can then be expressed through the optical-flow estimate as
     $$\left( u_{t},\, v_{t} \right) = \left( u_{t-1} + \Delta u,\; v_{t-1} + \Delta v \right) \tag{10}$$
     where $\Delta u$ is the displacement of the pixel along the x axis from the previous image frame to the current one, $\Delta v$ the displacement along the y axis, and $(u_{t-1}, v_{t-1})$ the pixel coordinates in the previous frame.
     2.4 The mask depth is set using the semantic mask image. Let $D_{k}$ be the mask depth of the k-th frame mask image, with $D_{k}(u, v)$ the mask depth of each pixel; then the values of $D_{k}(u, v)$ are
     $$D_{k}(u, v) = \begin{cases} d_{\max}, & (u, v) \in M_{s} \\ 0, & (u, v) \in M_{d} \end{cases} \tag{11}$$
     where $M_{s}$ is the static portion of the mask image, $M_{d}$ its dynamic portion, and $d_{\max}$ the maximum depth value of the camera. The depth $d$ corresponding to each detected feature point is then compared with the mask-depth value $D_{k}(u, v)$ of the corresponding pixel on the semantic mask, and the point is judged and filtered by category:
     $$s = \begin{cases} 1, & d \geq D_{k}(u, v) \\ 0, & d < D_{k}(u, v) \end{cases} \tag{12}$$
     where 1 denotes a dynamic feature point and 0 a static feature point.
  4. The navigation method of a visual-inertial indoor robot based on a dynamic environment according to claim 1, wherein step 3 comprises the following steps (a linear-alignment solve sketch follows the claims):
     3.1 The image pose is converted into the inertial measurement unit coordinate system through the pose matrix $(q^{b}_{c}, p^{b}_{c})$:
     $$s\, \bar{p}^{c_{0}}_{b_{k}} = s\, \bar{p}^{c_{0}}_{c_{k}} - R^{c_{0}}_{b_{k}}\, p^{b}_{c} \tag{13}$$
     where $c_{0}$ denotes the first camera coordinate system derived from the image frames, i.e. the reference coordinate system of the visual structure-from-motion; $c_{k}$ the camera coordinate system at time $k$; $b_{k}$ the inertial measurement unit coordinate system at time $k$; $q^{b}_{c}$ the rotation converting the camera coordinate system to the inertial measurement unit coordinate system; $p^{b}_{c}$ the corresponding translation; and $s$ the scale factor.
     3.2 The gyroscope bias is corrected: the pose and velocity information corresponding to the image frames in the sliding window is pre-integrated by the inertial measurement unit, and the gyroscope bias is computed from the objective function
     $$\min_{\delta b^{g}} \sum_{k \in \mathcal{B}} \left\| {q^{c_{0}}_{b_{k+1}}}^{-1} \otimes q^{c_{0}}_{b_{k}} \otimes \gamma^{b_{k}}_{b_{k+1}} \right\|^{2}, \qquad \gamma^{b_{k}}_{b_{k+1}} \approx \hat{\gamma}^{b_{k}}_{b_{k+1}} \otimes \begin{bmatrix} 1 \\ \frac{1}{2} J^{\gamma}_{b^{g}}\, \delta b^{g} \end{bmatrix} \tag{14}$$
     where $\mathcal{B}$ denotes the set of image frames, $\delta b^{g}$ the error of the angular-velocity bias change, $\hat{\gamma}^{b_{k}}_{b_{k+1}}$ the pre-integration true value, and $J^{\gamma}_{b^{g}}$ the first-order Jacobian of the relative rotation $\gamma^{b_{k}}_{b_{k+1}}$ with respect to $b^{g}$.
     3.3 The velocities, the scale and the gravity vector in the coordinate system of the first camera frame are estimated; the optimization variables are
     $$\mathcal{X}_{I} = \left[ v^{b_{0}}_{b_{0}},\; v^{b_{1}}_{b_{1}},\; \ldots,\; v^{b_{n}}_{b_{n}},\; g^{c_{0}},\; s \right]^{T} \tag{15}$$
     where $v^{b_{k}}_{b_{k}}$ denotes the velocity at the acquisition of the k-th frame image and $g^{c_{0}}$ the gravity in the first-frame image coordinate system. The pre-integration between consecutive coordinate systems is expressed as
     $$\begin{aligned} \hat{\alpha}^{b_{k}}_{b_{k+1}} &= R^{b_{k}}_{c_{0}} \left( s \left( \bar{p}^{c_{0}}_{b_{k+1}} - \bar{p}^{c_{0}}_{b_{k}} \right) + \frac{1}{2} g^{c_{0}} \Delta t_{k}^{2} - R^{c_{0}}_{b_{k}}\, v^{b_{k}}_{b_{k}} \Delta t_{k} \right) \\ \hat{\beta}^{b_{k}}_{b_{k+1}} &= R^{b_{k}}_{c_{0}} \left( R^{c_{0}}_{b_{k+1}}\, v^{b_{k+1}}_{b_{k+1}} + g^{c_{0}} \Delta t_{k} - R^{c_{0}}_{b_{k}}\, v^{b_{k}}_{b_{k}} \right) \end{aligned} \tag{16}$$
     where $R^{b_{k}}_{c_{0}}$ is the rotation matrix from the first-frame camera coordinate system to the k-th frame inertial measurement unit coordinate system. Bringing the $\bar{p}^{c_{0}}_{c_{k}}$ and $R^{c_{0}}_{b_{k}}$ recovered by the visual structure-from-motion into equation (16) and switching to the $b_{k}$ coordinate system, this is expressed as
     $$\hat{z}^{b_{k}}_{b_{k+1}} = \begin{bmatrix} \hat{\alpha}^{b_{k}}_{b_{k+1}} - p^{b}_{c} + R^{b_{k}}_{c_{0}} R^{c_{0}}_{b_{k+1}}\, p^{b}_{c} \\ \hat{\beta}^{b_{k}}_{b_{k+1}} \end{bmatrix} = H^{b_{k}}_{b_{k+1}}\, \mathcal{X}_{I} + n^{b_{k}}_{b_{k+1}} \tag{17}$$
     Rewriting equation (17) in the form $Hx = b$ gives
     $$H\, \mathcal{X}_{I} = b \tag{18}$$
     where $b$ is obtained from the sliding-window initialization. A constrained linear equation is established from equation (18), and the velocities, the gravity and the scale factor are obtained by solving the least-squares problem through Cholesky decomposition:
     $$\min_{\mathcal{X}_{I}} \left\| b - H\, \mathcal{X}_{I} \right\|^{2} \tag{19}$$
  5. The navigation method of a visual-inertial indoor robot based on a dynamic environment according to claim 1, wherein step 4 comprises the following steps (a cost-assembly sketch follows the claims):
     4.1 The measurement equation is constructed. Within the sliding window the state is defined as
     $$\mathcal{X} = \left[ x_{0},\, x_{1},\, \ldots,\, x_{n},\, x^{b}_{c},\, \lambda_{0},\, \lambda_{1},\, \ldots,\, \lambda_{m} \right], \qquad x_{k} = \left[ p^{w}_{b_{k}},\, v^{w}_{b_{k}},\, q^{w}_{b_{k}},\, b^{a},\, b^{g} \right] \tag{20}$$
     where $\mathcal{X}$ denotes the state quantity to be estimated in the sliding window, $n$ the total number of key frames, $x_{k}$ the inertial measurement unit state corresponding to the k-th frame in the sliding window, $x^{b}_{c}$ the relative pose between the inertial measurement unit and the camera, $\lambda_{l}$ the inverse depth of a feature point, and $m$ the number of feature points.
     4.2 The residual equation is constructed; it comprises an inertial measurement unit residual term, a visual residual term and a marginalization residual term. The inertial measurement unit residual term is defined as
     $$r_{\mathcal{B}}\!\left( \hat{z}^{b_{k}}_{b_{k+1}}, \mathcal{X} \right) = \begin{bmatrix} \delta \alpha^{b_{k}}_{b_{k+1}} \\ \delta \beta^{b_{k}}_{b_{k+1}} \\ \delta \theta^{b_{k}}_{b_{k+1}} \\ \delta b^{a} \\ \delta b^{g} \end{bmatrix} \tag{21}$$
     where $\hat{z}^{b_{k}}_{b_{k+1}}$ denotes the pre-integration value of the inertial measurement unit, containing the zero biases of the accelerometer and the gyroscope, between two consecutive frames in the sliding window; $\delta\alpha^{b_{k}}_{b_{k+1}}$ is the error of the position change, $\delta\beta^{b_{k}}_{b_{k+1}}$ the error of the velocity change, and $\delta\theta^{b_{k}}_{b_{k+1}}$ the error of the angle change. The visual residual projection equation is
     $$\hat{u}^{c_{j}}_{l} = \pi_{c}\!\left( T^{c}_{b}\, T^{b_{j}}_{w}\, T^{w}_{b_{i}}\, T^{b}_{c}\, \frac{1}{\lambda_{l}}\, \pi_{c}^{-1}\!\left( u^{c_{i}}_{l} \right) \right) \tag{22}$$
     where $u^{c_{i}}_{l}$ denotes the observed feature-point coordinates, $i$ indexes the image frame of the observation, the camera index distinguishes the images in which the point was observed within the sliding window, 1 denoting the left-eye image and 2 the right-eye image, and $\pi_{c}$ is the projection matrix built from the camera intrinsics. The point $P = (x, y, z)^{T}$ is normalized as
     $$\bar{P} = \begin{bmatrix} x/z \\ y/z \end{bmatrix} \tag{23}$$
     where $(x, y, z)$ are coordinates in the camera coordinate system. Substituting equation (23) into equation (22), the visual residual term is expressed as
     $$r_{\mathcal{C}}\!\left( \hat{z}^{c_{j}}_{l}, \mathcal{X} \right) = \hat{u}^{c_{j}}_{l} - u^{c_{j}}_{l} \tag{24}$$
     The marginalization residual term is defined as
     $$r_{m}(\mathcal{X}) = r_{p} - H_{p}\, \mathcal{X} \tag{25}$$
     where $r_{p}$ represents the prior information, i.e. the residual information retained after the sliding window is marginalized, and the error of the pre-integration term is that of equation (21).
     4.3 Nonlinear optimization: the least-squares problem is solved with the Ceres tool, and the state quantities in the sliding window are optimized by minimizing the sum of the residual terms; the total cost function is expressed as
     $$\min_{\mathcal{X}} \left\{ \left\| r_{p} - H_{p}\, \mathcal{X} \right\|^{2} + \sum_{k \in \mathcal{B}} \left\| r_{\mathcal{B}}\!\left( \hat{z}^{b_{k}}_{b_{k+1}}, \mathcal{X} \right) \right\|^{2}_{P^{b_{k}}_{b_{k+1}}} + \sum_{(l, j) \in \mathcal{C}} \rho\!\left( \left\| r_{\mathcal{C}}\!\left( \hat{z}^{c_{j}}_{l}, \mathcal{X} \right) \right\|^{2}_{P^{c_{j}}_{l}} \right) \right\} \tag{26}$$
     where $H_{p}$ is the second-order Hessian matrix of the state quantities of the marginalized prior information, $\rho(\cdot)$ a robust kernel function, $r_{p} - H_{p}\mathcal{X}$ the marginalized residual, $r_{\mathcal{B}}$ the pre-integration residual between two adjacent frames of the sliding window, $r_{\mathcal{C}}$ the visual reprojection residual, $l$ a continuously tracked image feature point, and $j$ an image frame.
  6. The navigation method of a visual-inertial indoor robot based on a dynamic environment according to claim 1, wherein step 5 comprises the following steps (an ICP and occupancy sketch follows the claims):
     5.1 The key frames obtained after three-dimensional reconstruction are time-aligned with the depth image frames obtained directly from the depth camera, which fixes the relative position between the motion trajectory and the constructed map.
     5.2 Using the mask depth, the dynamic targets in the depth map are marked as regions excluded from mapping and the remainder as mappable regions; the point clouds are defined as
     $$P = \left\{ p_{1}, p_{2}, \ldots, p_{n} \right\}, \qquad Q = \left\{ q_{1}, q_{2}, \ldots, q_{n} \right\} \tag{27}$$
     where $P$ denotes the target point-cloud set and $Q$ the source point cloud. The point-cloud registration problem is converted into the minimization problem of finding, for each point of the source point-cloud set $Q$, the closest point in the target point set $P$:
     $$\min_{R,\, t}\; \frac{1}{n} \sum_{i=1}^{n} \left\| p_{i} - \left( R\, q_{i} + t \right) \right\|^{2} \tag{28}$$
     Solving equation (28) requires computing a rotation-translation matrix $(R, t)$ that applies a rotation-translation transformation to the $Q$ of equation (27), yielding the corresponding point set $Q' = \{ q'_{i} \}$ in the target point set; the distance between $P$ and $Q'$ is then computed with the distance formula
     $$d\!\left( P, Q' \right) = \frac{1}{n} \sum_{i=1}^{n} \left\| p_{i} - q'_{i} \right\|^{2} \tag{29}$$
     Finally, the distance between $P$ and $Q'$ is reduced by continuous iteration until $d(P, Q')$ is less than a threshold $\tau$ or the maximum number of iterations is reached; otherwise the computation continues.
     5.3 The octree-structured map is constructed: the occupancy of the grid map is represented by voxels, each voxel being represented by eight smaller voxels and expanded downward continuously. The point cloud is inserted by recursive subdivision: a voxel is subdivided again whenever its point count exceeds the threshold, down to the minimum voxel resolution, until the count no longer exceeds the threshold. Assuming each node uses a probability to express whether it is occupied, with values in the range [0, 1], and taking $y \in \mathbb{R}$ as the log-odds value of the probability $p \in [0, 1]$, the node occupancy probability is
     $$p = \frac{\exp(y)}{1 + \exp(y)} \tag{30}$$
     and the octree map can be updated by solving for the log-odds value.
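Illustrative sketch for step 1 (claim 2): FAST corner detection followed by pyramidal Lucas-Kanade tracking, written with OpenCV. The threshold, window size and pyramid depth are assumed values for demonstration, not settings taken from the patent.

    import cv2
    import numpy as np

    def detect_and_track(prev_gray, cur_gray, fast_threshold=20):
        # FAST-9: a pixel is a corner when 9 contiguous pixels on the
        # radius-3 circle of 16 pixels differ from the centre brightness
        # by more than the threshold (claim 2, step 1.6).
        fast = cv2.FastFeatureDetector_create(threshold=fast_threshold,
                                              nonmaxSuppression=True)
        keypoints = fast.detect(prev_gray, None)
        pts = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)
        # Pyramidal Lucas-Kanade solves the windowed least-squares system
        # of equations (6)-(9) for each feature (claim 2, step 1.7).
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts,
                                                  None, winSize=(21, 21),
                                                  maxLevel=3)
        ok = status.ravel() == 1
        return pts[ok], nxt[ok]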
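Minimal sketch of the mask-depth test of claim 3 (equations (11)-(12)). The array shapes, the helper name filter_dynamic_points and the value of d_max are assumptions for illustration; only the rule itself (dynamic mask pixels get depth 0, static pixels d_max, and a point whose depth reaches the mask depth is dynamic) follows the claim.

    import numpy as np

    def filter_dynamic_points(points_uv, point_depths, dynamic_mask, d_max=10.0):
        # Mask depth per pixel: 0 on the dynamic part of the mask image,
        # d_max (maximum camera depth) on the static part -- equation (11).
        mask_depth = np.where(dynamic_mask, 0.0, d_max)
        u = points_uv[:, 0].astype(int)
        v = points_uv[:, 1].astype(int)
        # Equation (12): 1 = dynamic (depth >= mask depth), 0 = static.
        is_dynamic = point_depths >= mask_depth[v, u]
        return points_uv[~is_dynamic], points_uv[is_dynamic]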
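Sketch of the linear visual-inertial alignment solve of claim 4 (equations (18)-(19)): the stacked constraints form H x = b, and the velocities, the gravity and the scale are recovered with a Cholesky-based least-squares solve. H and b are stand-ins assumed to be assembled elsewhere.

    import numpy as np

    def solve_alignment(H, b):
        # Normal equations of the least-squares problem min ||H x - b||^2.
        A = H.T @ H
        rhs = H.T @ b
        L = np.linalg.cholesky(A)      # A = L L^T (Cholesky decomposition)
        y = np.linalg.solve(L, rhs)    # forward substitution
        x = np.linalg.solve(L.T, y)    # back substitution
        return x                       # stacked [v_0 ... v_n, g^{c0}, s]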
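Structural sketch of the sliding-window cost of claim 5 (equation (26)): a marginalization prior, IMU pre-integration terms, and robustified visual reprojection terms. The residual callables and the Huber kernel are placeholders; the patent solves this least-squares problem with Ceres.

    import numpy as np

    def huber(r2, delta=1.0):
        # Robust kernel rho(.) that down-weights visual outliers.
        r = np.sqrt(r2)
        return r2 if r <= delta else 2.0 * delta * r - delta ** 2

    def total_cost(state, prior_residual, imu_residuals, visual_residuals):
        cost = np.sum(prior_residual(state) ** 2)  # marginalized prior
        cost += sum(np.sum(r(state) ** 2) for r in imu_residuals)
        cost += sum(huber(np.sum(r(state) ** 2)) for r in visual_residuals)
        return cost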
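Toy sketch of claim 6: one nearest-neighbour ICP step (equations (27)-(29)) solved in closed form with the SVD (Kabsch) method, plus the log-odds occupancy mapping of equation (30). The brute-force matching and the function names are illustrative assumptions.

    import numpy as np

    def icp_step(source, target):
        # Match every source point to its closest target point, then solve
        # min ||p_i - (R q_i + t)||^2 in closed form via SVD.
        d = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
        matched = target[np.argmin(d, axis=1)]
        mu_s, mu_t = source.mean(axis=0), matched.mean(axis=0)
        W = (matched - mu_t).T @ (source - mu_s)
        U, _, Vt = np.linalg.svd(W)
        R = U @ Vt
        if np.linalg.det(R) < 0:       # enforce a proper rotation
            U[:, -1] *= -1
            R = U @ Vt
        t = mu_t - R @ mu_s
        return R, t                    # iterate until d(P, Q') < threshold

    def occupancy_probability(log_odds):
        # Equation (30): p = exp(y) / (1 + exp(y)) maps log-odds to [0, 1].
        return np.exp(log_odds) / (1.0 + np.exp(log_odds))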

Description

Navigation method of visual inertial indoor robot based on dynamic environment

Technical Field

The invention belongs to the technical fields of computer vision, communication, and simultaneous localization and mapping, and particularly relates to a navigation method for a visual-inertial indoor robot based on a dynamic environment.

Background

Most current SLAM mapping depends on a static environment, yet non-static scenes are frequently encountered in practice, which poses a great challenge for robot navigation. With continued research, researchers began to introduce deep-learning and geometric methods. For example, Zhong Yuangong et al. (Computer Engineering and Design, vol. 43, no. 3, p. 11) propose a solution based on epipolar-geometry constraints that exploits the fact that dynamic feature points move within the polar plane, so that their projections lie far from the epipolar line, which is used for filtering. However, this method is not very accurate and applies only to low-dynamic environments; it fails when a large number of dynamic targets are present. Patent application CN112308921A discloses a method for eliminating dynamic feature points by combining semantic information with epipolar geometry: a semantic segmentation network removes the feature points of potentially dynamic objects in highly dynamic environments, and geometric constraints remove dynamic feature points in low-dynamic environments. Owing to segmentation failures and slow segmentation, however, that method cannot achieve real-time performance in highly dynamic environments.

Disclosure of the Invention

The invention aims to overcome the defects of the prior art and provides a navigation method for a visual-inertial indoor robot based on a dynamic environment, so as to solve the problems of poor localization and mapping accuracy and poor path-planning robustness in non-static environments. To solve these technical problems, the invention adopts the following technical scheme.
A navigation method of a visual-inertial indoor robot based on a dynamic environment comprises the following steps:
Step 1: acquiring image information of the indoor robot through a depth camera with an inertial measurement unit mounted on the indoor robot, and completing pre-integration of the inertial measurement unit, feature-point detection in the images, and optical-flow tracking of the feature points.
Step 2: a semantic segmentation network detects potential dynamic targets in the images and generates mask images; an optical-flow estimation network predicts the mask images to compensate for segmentation failures and slow segmentation of the semantic segmentation network; a mask depth is set on the mask images and dynamic feature points are filtered out.
Step 3: information obtained through visual structure-from-motion and inertial pre-integration is used to correct the gyroscope error and to estimate the scale, the acceleration bias and the gravitational acceleration.
Step 4: the pre-integration results of the inertial measurement unit and the feature-point information output by the front end are input and nonlinearly optimized.
Step 5: the key frames are time-aligned with the image frames containing depth information, point clouds are built directly from the frames containing depth information, point-cloud maps are stitched through the iterative closest point algorithm, and an environment map with navigation information is finally realized by adopting an octree data structure.
Specifically, step 1 comprises the following steps (a measurement-model sketch follows this section):
1.1 A depth camera with an inertial measurement unit mounted on the indoor robot acquires 848×480-resolution image information at a frequency of 15 Hz.
1.2 The depth camera acquires the inertial measurement unit data at a frequency of 250 Hz.
1.3 The measurement values of the inertial measurement unit are computed. The unit comprises a triaxial accelerometer and a triaxial gyroscope; taking white noise and random-walk noise into account, its model is expressed as
$$\hat{\omega}^{b} = \omega^{b} + b^{g} + n^{g}, \qquad \hat{a}^{b} = q_{bw}\left( a^{w} + g^{w} \right) + b^{a} + n^{a} \tag{1}$$
where $\omega^{b}$ denotes the true angular velocity measured by the inertial measurement unit and $a^{w}$ the measured acceleration; the superscript $g$ denotes the gyroscope, the superscript $a$ the accelerometer, the superscript $w$ the world coordinate system and $b$ the inertial measurement unit coordinate system; $b^{g}$ and $b^{a}$ denote the biases, $g^{w}$ the gravitational acceleration, and $n^{g}, n^{a}$ white noise; $q_{bw}$ represents the rotation transformed from the world coordinate system to the inertial measurement unit coordinate system.
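A minimal sketch of the measurement model of equation (1), assuming made-up noise densities: the measured angular velocity and specific force are the true quantities corrupted by a bias and white noise.

    import numpy as np

    def simulate_imu(omega_true, acc_world, R_bw, g_w, b_g, b_a,
                     sigma_g=1e-3, sigma_a=1e-2):
        # Gyroscope: omega_hat = omega + b_g + n_g
        omega_hat = omega_true + b_g + np.random.normal(0.0, sigma_g, 3)
        # Accelerometer: a_hat = R_bw (a_w + g_w) + b_a + n_a, i.e. the
        # specific force expressed in the body (IMU) frame.
        acc_hat = (R_bw @ (acc_world + g_w) + b_a
                   + np.random.normal(0.0, sigma_a, 3))
        return omega_hat, acc_hat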