CN-121582336-B - Robot algorithm, storage medium and equipment oriented to depth constraint under dynamic scene

CN121582336BCN 121582336 BCN121582336 BCN 121582336BCN-121582336-B

Abstract

The invention belongs to the technical field of synchronous positioning and map creation, and particularly discloses a depth constraint robot algorithm for a dynamic scene, which comprises the following steps of S1, collecting image information, and completely dividing a dynamic object by using an image dividing network; and S3, using a time domain bidirectional pixel point cloud background restoration model, utilizing static information in adjacent front and back p frames to finish restoration of a background static key frame of a position where the dynamic object is located, constructing a complete point cloud map and calculating pose. The method solves the technical problems that in the prior art, under a dynamic scene, potential dynamic objects are difficult to accurately divide, limitation and deviation exist in dynamic object judgment, the dynamic objects cannot be accurately removed, and further the pose of the user cannot be accurately estimated.

Inventors

CHEN MENGYUAN
XU LIANG
FANG YUANJIE

Assignees

安徽工程大学

Dates

Publication Date: 20260508
Application Date: 20251113

Claims (8)

1. A robot algorithm oriented to depth constraint under a dynamic scene is characterized by comprising the following steps: step S1, collecting image information, and completely dividing a dynamic object by using an image dividing network; S2, analyzing the motion state of the object by a depth constraint method, and eliminating the precisely identified dynamic region; s3, using a time domain bidirectional pixel point cloud background restoration model, utilizing static information in adjacent front and rear p frames to finish restoration of a background static key frame of a position where a dynamic object is located, constructing a complete point cloud map, and calculating pose; in step S2, using a camera projection transformation mapping operator to obtain pixel points, three-dimensional space point coordinates and depth to obtain the space position of the analysis object after mapping transformation The mapping function is used for calculating the coordinates of the projection image of the current frame, calculating the depth error between a series of continuous frames, constructing a motion score formula to describe the influence of the depth error and epipolar line error, comparing the motion score with the corresponding threshold value of the static point, and identifying the motion state of the space point; The calculation formula of the motion score formula is as follows: , Wherein, the In order to be a motion score, Error of the epipolar line Corresponding to the epipolar line threshold Comparing; Will depth error Corresponding to the depth error threshold Comparing when the epipolar line error Or depth error When the corresponding threshold value is exceeded, the threshold value of the static point is When (when) > When the space point is in a motion state, the space point is indicated to be in a static state; The mapping function is: , wherein, And For the space vector parameters and the direction vector parameters, Is a high-dimensional projective transformation; Represented as an inverse of the high-dimensional projective transformation, For the transformation between two frames, The depth map describes the depth of projection points by taking average depth in a 2x2 grid with projection coordinates as the center; In step S3, a time domain bidirectional pixel point cloud background restoration model is used, a current frame to be restored is taken as a time starting point, a time weighting mode is used, background restoration of a pixel grid model is carried out according to a p frame before the time domain and a p frame after the time domain, a dynamic object area is mapped into the pixel grid, color and depth information of an image are projected onto the current frame according to a characteristic point matching relation, areas shielded by the dynamic object among different frames are matched with different weights, and a restored current frame is obtained The calculation formula is as follows: , Wherein, the Is the current frame after the repair and, Is the current frame that is not yet repaired, Representing the current frame temporally preceding p frames, Representing the p-frame temporally following the current frame, And To correlate weights, the closer to the current frame the higher the image weight.
2. The method for dynamically constrained depth robot algorithm of claim 1, wherein the depth error is The calculation formula of (2) is as follows: , wherein, For solving the function for the mean value based on the corresponding mapping function The coordinates of the projected image of the current frame are calculated, Represented as reference frames A depth map corresponding to the pixel point under the coordinates, And projecting a depth map corresponding to the pixel point under the image coordinates for the current frame.
3. The robot algorithm for depth constraint under a dynamic scene according to claim 1 is characterized in that a residual structure is introduced into a backbone network of an image segmentation network to enhance performance of the backbone network to form a residual-feature pyramid network, feature images with high-dimensional information features in the backbone network are fused into a two-dimensional collaborative attention module, pixel weights of dynamic object areas are self-adaptive, feature responses of the dynamic object areas are enhanced while image feature information is enriched, feature images with low-dimensional information features in the backbone network are fused into a multi-dimensional hierarchical enhancement feature fusion module to perform feature enhancement processing on feature images with low-dimensional information features to strengthen weak semantic information in the feature images, and outputs of the two-dimensional collaborative attention module and the multi-dimensional hierarchical enhancement feature fusion module are combined to generate an image segmentation result.
4. A depth constraint-oriented robot algorithm in a dynamic scene is characterized in that in a multi-scale hierarchical enhancement feature fusion module, a plurality of layers of encoders and decoders with symmetrical structures are used, an image is firstly divided into feature images with various sizes by the encoders, the obtained result is processed by a Block layer, the Block layer is formed by connecting two parts in series, one part is a multi-head space dimension-reduction attention module, the other part is a hybrid feedforward neural network, two branches are designed for each size image, namely one branch outputs to the next layer to perform image fusion for downsampling and reducing resolution, the other branch performs feature fusion for deepening restoration of high-level features and edge information, then feature images generated by two branches of different layers are combined on channel convolution with 1x1 dimension sequentially, the obtained feature images are sampled on a layer-by-layer basis and processed by the decoder, and then the restored image is subjected to linear transformation through linear projection to generate probability outputs of corresponding categories.
5. The robot algorithm for depth constraint in dynamic scene according to claim 4, wherein the algorithm of the multi-head space dimension-reducing attention module is as follows: , The algorithm of the hybrid feedforward neural network is shown as follows: , Wherein, the The normalization function is represented as a function of the normalization, The function represents downsampling, Q, K, V represents the input query, key and value, For the feature dimension of a single attention head, Negative perturbation being a fixed value; The input characteristic diagram is represented by a graph of the input characteristics, The characteristic diagram is subjected to layer normalization processing, the Attention represents the processing of a multi-head space dimension reduction Attention module, Representing two fully connected layers, geLU being the activation function, In order to activate the function, Representing the output result of the Block layer, and outputting the output result to the Block layer The method comprises the steps of performing up-sampling processing, performing fusion processing after overlapping with the characteristic image output by the front layer encoder, and outputting a calculation result to a decoder after the up-sampling processing.
6. A depth constraint-oriented robot algorithm under a dynamic scene is characterized in that after a feature map is input into a two-dimensional collaborative attention module, global attention average pooling and maximum pooling operation are carried out on the feature map, obtained corresponding features are subjected to relation among reinforced channels of a multi-layer perception layer, weights of all channels are redistributed to obtain a channel attention map with weight distribution, the channel attention map is input into a channel maintenance convolution kernel to obtain horizontal features and vertical features of three different scales, spatial fusion features are obtained through element-by-element addition fusion, finally dimensional correlation of the spatial fusion features is adjusted through channel mixing to obtain output, and the output of the two-dimensional collaborative attention module and probability output of the channel attention map and multi-scale layered enhancement feature fusion module are subjected to element-by-element accumulation fusion, and segmentation results of an image segmentation network are output after multi-layer up-sampling.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of a depth constraint oriented robotic algorithm in a dynamic scenario as claimed in any one of claims 1-6.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, characterized in that the processor implements the steps of a depth constraint oriented robotic algorithm in a dynamic scenario as claimed in any one of claims 1-6 when executing the computer program.

Description

Robot algorithm, storage medium and equipment oriented to depth constraint under dynamic scene Technical Field The invention belongs to the technical field of synchronous positioning and map creation (Simultaneous Location AND MAPPING, SLAM), and particularly relates to a depth constraint robot algorithm, a storage medium and equipment for a dynamic scene. Background The synchronous positioning and map construction (Simultaneous Location AND MAPPING, SLAM) technology is a core support technology of autonomous navigation and environment interaction of a robot, and can estimate the pose of the robot and construct a local map under the condition of no environment prior information, so that necessary support is provided for downstream perception and decision. The visual SLAM combines the deep learning, multi-view geometry and point and line feature extraction methods, and realizes environment perception and structure characterization. The visual SLAM can improve the capability of capturing environment semantic data with the assistance of a low-cost camera and has excellent scene adaptability due to the rapid development of computer vision, so that the visual SLAM has wide development potential and important economic application value. However, in actual scene application, in the face of problems of inaccurate segmentation, poor segmentation precision, incapability of segmentation and the like faced by rapid movement of a camera or a dynamic object, a potential dynamic object segmentation link, although the existing segmentation network can accurately segment the object and keep good accuracy, the processing of semantic information of the boundary of the dynamic object is still insufficient, and the segmentation accuracy cannot be ensured. In the dynamic object judging link, the inherent limitation exists only by virtue of a multi-view geometric judging mode, and when a motion vector is collinear with a camera track or axial motion exists, the judgment of a dynamic object key frame is easy to deviate, so that the dynamic object in the key frame is inaccurate to reject. In addition, the prior art has defects in repairing static backgrounds of images of the current frame after removing dynamic areas, the images cannot be projected onto the current frame according to the color and depth information of the images, the sizes of areas shielded by dynamic objects between different frames are different, and the optimal weights cannot be matched, so that the static backgrounds of the positions where the dynamic objects are located are poor in repairing effect, the dense point cloud construction of static environment characteristics is affected, and the pose estimation accuracy is affected. Disclosure of Invention The invention aims to provide a depth constraint robot algorithm oriented to a dynamic scene, which is used for solving the technical problems that in the prior art, potential dynamic objects are difficult to accurately divide under the dynamic scene, limitation and deviation exist in judgment of the dynamic objects, the dynamic objects cannot be accurately removed, and further the pose of the robot cannot be accurately estimated. The robot algorithm for depth constraint under the dynamic scene comprises the following steps. And S1, collecting image information, and completely segmenting the dynamic object by using an image segmentation network. And S2, analyzing the motion state of the object by a depth constraint method, and eliminating the precisely identified dynamic region. And S3, using a time domain bidirectional pixel point cloud background restoration model, utilizing static information in adjacent front and rear p frames to finish restoration of a background static key frame of a position where a dynamic object is located, constructing a complete point cloud map, and calculating the pose. In step S2, using a camera projection transformation mapping operator to obtain pixel points, three-dimensional space point coordinates and depth to obtain the space position of the analysis object after mapping transformationThe method comprises the steps of mapping a function, calculating the coordinates of a projection image of a current frame, calculating the depth errors among a series of continuous frames, constructing a motion score formula to describe the influences of the depth errors and epipolar errors, comparing the motion score with a corresponding threshold value of a static point, identifying the motion state of a space point, and eliminating a precisely identified dynamic region after the dynamic judgment of a feature point is completed. Preferably, the formula of the motion score is as follows: , Wherein, the In order to be a motion score,Error of the epipolar lineCorresponding to the epipolar line thresholdComparing; Will depth error Corresponding to the depth error thresholdComparing when the epipolar line errorOr depth errorWhen the corresponding threshold value is exceeded, the threshold value of