
CN-121982679-A - Intelligent active sensing and behavior interaction method in an unknown occluded scene

CN121982679A

Abstract

The invention belongs to the technical field of robot autonomous operation and reinforcement learning, and discloses an intelligent active sensing and behavior interaction method for unknown occluded scenes. The method tightly couples the viewpoint planning strategy of an active perception agent with the pushing strategy of a behavior interaction agent, and combines a 2.5D occupancy height map, a push-point screening mechanism fusing geometric features with task heuristic weights, and a dense reward function to solve the problems of insufficient perception information, low action efficiency and unstable policy training in unknown occluded environments. The method markedly outperforms traditional approaches in push success rate, environmental entropy reduction efficiency and policy generalization; it achieves efficient environment exploration with only minimally invasive pushes, lays a solid foundation for downstream tasks such as grasping and tidying, and is suitable for autonomous robot operation in cluttered occluded scenes such as bookshelves and storage cabinets.
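The perception-interaction loop described above can be sketched as a plain skeleton. Everything below is illustrative: the callables stand in for the trained viewpoint-planning and pushing agents and the robot interfaces, none of which are specified at code level in the patent.

```python
# Illustrative skeleton of the closed perception-interaction loop.
# All callables (collect_initial, plan_and_observe, screen_push_points,
# execute_push, termination_met) are hypothetical placeholders for the
# agents and robot interfaces described in the patent.

def explore_scene(collect_initial, plan_and_observe, screen_push_points,
                  execute_push, termination_met, max_rounds=10):
    """Alternate active perception (step 2) and pushing (step 4)
    until the termination condition of step 5 is met."""
    height_map = collect_initial()                  # step 1: preset viewpoints
    rounds = 0
    for _ in range(max_rounds):
        height_map = plan_and_observe(height_map)   # step 2: expand visibility
        candidates = screen_push_points(height_map) # step 3: candidate pushes
        height_map = execute_push(height_map, candidates)  # step 4: best push
        rounds += 1
        if termination_met(height_map):             # step 5: re-sense, check
            break
    return height_map, rounds
```

A toy run with integer stand-ins for the height map shows the loop terminating once the termination predicate is satisfied.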

Inventors

  • JIA TONG
  • GUAN HAOTIAN
  • ZHOU FEI
  • LIU YIZHE
  • MA BOWEN
  • CHEN HAITAO
  • YU TIANSHUI

Assignees

  • Northeastern University (东北大学)
  • China Unicom (Liaoning) Industrial Internet Co., Ltd. (联通(辽宁)产业互联网有限公司)

Dates

Publication Date
2026-05-05
Application Date
2026-02-05

Claims (7)

  1. An intelligent active sensing and behavior interaction method in an unknown occluded scene, characterized in that a closed-loop reinforcement learning architecture is constructed, combining a 2.5D occupancy height map, a push-point screening mechanism fusing geometric features with task heuristic weights, and a dense reward function, so that the unknown occluded environment is efficiently explored in a minimally invasive way and unlocking and structural optimization of the scene are realized, the method comprising the following steps: Step 1, a robot carries an RGB-D camera to a plurality of preset viewpoints to collect scene data and obtains an initial 2.5D occupancy height map, which serves as the core data for learning the viewpoint planning strategy of the active perception agent and the pushing strategy of the behavior interaction agent; Step 2, based on the initial 2.5D occupancy height map constructed in step 1, the viewpoint planning strategy of the active perception agent is trained by the reinforcement learning TQC algorithm; the strategy generates the next viewpoint position in space, the robot adjusts its observation viewpoint to obtain new scene data, and the new data are continuously and incrementally fused into the 2.5D occupancy height map until the maximum visible range is reached; Step 3, the 2.5D occupancy height map constructed by the active perception agent in step 2 is further screened by the push-point screening mechanism fusing geometric features with task heuristic weights, a pushing direction that effectively acts on an object without destabilizing it is determined, and a batch of candidate pushing schemes is obtained; Step 4, the robot learns the pushing strategy of the behavior interaction agent through the reinforcement learning TQC algorithm; the strategy carries out a non-grasping pushing operation from the optimal push point selected among all candidate pushing schemes of step 3 and pushes the target object to unlock the occluded area, so that as many areas as possible are explored with minimally invasive actions; and Step 5, the active perception agent is started again to obtain the scene change before and after pushing, and whether the current environment meets the task termination condition is judged; if not, the method returns to step 2, restarts the viewpoint planning strategy and enters the next round of the loop; if so, the current task ends and the environment exploration result is output.
  2. The intelligent active sensing and behavior interaction method in an unknown occluded scene according to claim 1, wherein the 2.5D occupancy height map is established as follows: a scene depth image and a color image at the current viewpoint are collected by the RGB-D depth camera carried by the robot; pixel coordinates (u, v) of the depth image are converted into three-dimensional camera coordinates (Xc, Yc, Zc) through the camera intrinsic matrix and then into three-dimensional world coordinates (Xw, Yw, Zw) through a coordinate transformation, generating a scene point cloud that is clipped according to the set boundary; the Xw-Yw plane of the world coordinate system is divided into two-dimensional pixel grids of fixed resolution, each grid corresponding to a plane coordinate (x, y); the number of three-dimensional points falling in each grid is counted to obtain the height occupancy data; if the number of points is greater than or equal to a preset threshold and the height value h is greater than or equal to a height threshold τh, the grid is judged to be occupied by an object and assigned an occupancy probability p=0.9; if the number of points is 0 or h < τh, it is judged that no object exists and p=0.1 is assigned; if the number of points is between 0 and the preset threshold, p=0.5 is assigned; and in multi-view fusion updating, after the robot adjusts the viewpoint, the height occupancy data under the new view are acquired, the height value h of the same pixel grid is updated by a weighted average, and the occupancy probability p is updated by a Bayesian fusion rule, finally forming a dynamically updated, multi-view fused 2.5D occupancy height map.
  3. The intelligent active sensing and behavior interaction method in an unknown occluded scene of claim 2, wherein τh = 0.07 m.
  4. The intelligent active sensing and behavior interaction method in an unknown occluded scene according to claim 1, wherein the push-point screening mechanism fusing geometric features with task heuristic weights specifically comprises: performing grid ray projection based on the 2.5D occupancy height map, taking all points with visual contact as initial candidate push points, and scoring, sorting and screening the push points according to a geometric feature dimension and a task heuristic weight dimension.
  5. The intelligent active sensing and behavior interaction method in an unknown occluded scene according to claim 4, characterized in that the geometric feature dimension extracts the height feature and the volume feature of an object from the 2.5D occupancy height map, the height feature being the height value h corresponding to each candidate push point, with valid targets screened by h ≥ τh, and the volume feature being the volume V of the object to which the candidate point belongs, calculated as the number of connected pixel grids multiplied by the grid area multiplied by the average height; the task heuristic weight dimension combines the position feature of the object in the 2.5D occupancy height map with the environment exploration requirement, the position feature being the distance between the plane coordinates (x, y) of the candidate point and the camera, and the environment exploration requirement being the distance between the candidate point and the unknown region.
  6. The intelligent active sensing and behavior interaction method in an unknown occluded scene according to claim 1, wherein, taking the initial 2.5D occupancy height map as the data base, the viewpoint planning strategy of the active perception agent and the pushing strategy of the behavior interaction agent are each trained through the reinforcement learning TQC algorithm.
  7. The intelligent active sensing and behavior interaction method in an unknown occluded scene of claim 6, wherein the reward function, action space and observation space of the reinforcement learning TQC algorithm are designed as follows. Reward function: the viewpoint planning reward consists of two parts, a motion cost term for the change between the previous and the current viewpoint, and the percentage entropy reduction of the 2.5D occupancy height map between the two viewpoints. The pushing policy reward consists of four parts: when the entropy reduction exceeds a threshold, the push is considered to have obtained significant perceptual gain, the current push action is recorded as successful, and a bonus is given, where δ=15 is the reward gain for the first significant entropy reduction, encouraging the policy to actively explore information-dense areas; the change between two adjacent entropy reductions is introduced, and if it is less than or equal to 10%, indicating an insufficient improvement in behavior, a penalty is given; for an information gain below 5% or a push operation causing a collision, a penalty is applied; and, to account for execution cost, a comprehensive operation cost term over the number of pushes k and the displacement distance d is introduced; the complete reward function combines these terms. Action space: the action space of the viewpoint planning strategy is a five-dimensional continuous space over the 6D Cartesian pose of the camera on the end effector, excluding the roll angle. Observation space: the viewpoint planning stage defines a 43-dimensional continuous vector space, in which a variational autoencoder encodes the 2.5D occupancy height map into a 32-dimensional latent space, 5 dimensions give the last target position, one dimension each gives the information gain, the motion cost and the collision detection flag, and 3 dimensions give the center point of the unknown region in the 2.5D occupancy height map; the pushing strategy stage uses a 37-dimensional observation space for push-point planning, consisting of the 32-dimensional latent space, the information gain, the collision detection flag and the 3-dimensional center point of the unknown region, with candidate push points added to guide strategy learning.
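Claims 2 and 3 describe the 2.5D occupancy height map concretely enough to sketch. The grid size, workspace extent and minimum point count below are assumed values; the threshold τh = 0.07 m and the probabilities 0.9/0.5/0.1 come from the claims, and the Bayesian fusion is written as a standard odds-product rule, which the claim names but does not spell out.

```python
import numpy as np

# Sketch of the 2.5D occupancy height map of claims 2-3.  grid_size,
# extent and min_pts are assumptions; tau_h=0.07 and p in {0.9, 0.5, 0.1}
# follow the claims.  Depth-to-point-cloud conversion is assumed done.

def build_height_map(points, grid_size=0.01, extent=0.5,
                     tau_h=0.07, min_pts=5):
    """points: (N, 3) world-frame point cloud clipped to the workspace."""
    n = int(extent / grid_size)
    height = np.zeros((n, n))
    count = np.zeros((n, n), dtype=int)
    for xw, yw, zw in points:
        x, y = int(xw / grid_size), int(yw / grid_size)
        if 0 <= x < n and 0 <= y < n:
            count[x, y] += 1
            height[x, y] = max(height[x, y], zw)
    # per-cell occupancy probability as assigned in claim 2
    prob = np.full((n, n), 0.5)                          # ambiguous cells
    prob[(count >= min_pts) & (height >= tau_h)] = 0.9   # occupied
    prob[(count == 0) | (height < tau_h)] = 0.1          # free / below tau_h
    return height, prob

def bayes_fuse(p_old, p_new):
    """Odds-product Bayesian fusion of two occupancy estimates (claim 2)."""
    odds = (p_old / (1 - p_old)) * (p_new / (1 - p_new))
    return odds / (1 + odds)
```

Fusing two confident occupied readings (0.9, 0.9) drives the probability higher, which is the intended multi-view reinforcement behavior.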

Description

Intelligent Active Sensing and Behavior Interaction Method in an Unknown Occluded Scene

Technical Field

The invention relates to the technical field of robot autonomous operation and reinforcement learning, in particular to an intelligent active sensing and behavior interaction method in unknown occluded scenes, applicable to autonomous operation in cluttered occluded scenes such as bookshelf arrangement and locker tidying.

Background

In practical scenes such as bookshelves and lockers, object layouts are cluttered and severe spatial occlusion exists, so autonomous robot operation must solve the core problems of incomplete sensing information and low action execution efficiency. Traditional viewpoint planning methods focus only on optimizing environment observation and do not cooperate with manipulation actions; grasp-based manipulation strategies cause large disturbance to the scene and have low success rates in occluded environments; non-grasping pushing methods are flexible, but the prior art has the following defects: perception and behavior are split, i.e. viewpoint planning and pushing strategies are designed independently and lack closed-loop collaborative optimization, so perception information cannot effectively support action decisions; push-point selection is blind, since traditional sampling methods rely on a single geometric feature, making it difficult to distinguish effective operation areas in complex scenes and leaving push positions ambiguous; reinforcement learning training is unstable, because sparse reward signals easily lead to low exploration efficiency and slow policy convergence, making it difficult to adapt to the dynamic changes of unknown occluded environments; and generalization capability is insufficient, since existing methods mostly assume fully observed scenes with regular structure and adapt poorly to real environments with multi-object occlusion and noisy perception. Therefore, a method that deeply fuses active sensing with pushing strategies, with a scientific sampling mechanism, a reasonable reward design and simple training, is needed to improve the autonomous operation capability of robots in unknown occluded scenes.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides an intelligent active sensing and behavior interaction method in unknown occluded scenes, which solves the problems of insufficient sensing information, low pushing efficiency and unstable training by constructing a perception-decision-behavior closed-loop reinforcement learning framework, and realizes efficient and robust non-grasping pushing operation. To achieve the above purpose, the invention adopts the following technical scheme. The method constructs a closed-loop reinforcement learning framework and combines a 2.5D occupancy height map, a push-point screening mechanism fusing geometric features with task heuristic weights, and a dense reward function to efficiently explore the unknown occluded scene in a minimally invasive way, realizing scene unlocking and reasonable structural optimization, and comprises the following steps. Step 1: the robot carries an RGB-D camera to a plurality of preset viewpoints to collect scene data and obtains a 2.5D occupancy height map, which is the core part of the reinforcement learning observation space and serves as core data for learning the viewpoint planning strategy of the active perception agent and the pushing strategy of the behavior interaction agent. Step 2: based on the initial 2.5D occupancy height map constructed in step 1, the viewpoint planning strategy of the active perception agent is trained through the reinforcement learning TQC algorithm; the strategy generates the next viewpoint position in space, the robot adjusts its observation viewpoint to obtain new scene data, and the new data are continuously and incrementally fused into the 2.5D occupancy height map until the maximum visible range is reached. Step 3: the 2.5D occupancy height map constructed by the active perception agent in step 2 is further screened by the push-point screening mechanism fusing geometric features with task heuristic weights, a pushing direction that effectively acts on an object without destabilizing it is determined, and a batch of candidate pushing schemes is obtained. Step 4: the robot learns the pushing strategy of the behavior interaction agent through the reinforcement learning TQC algorithm; the strategy carries out a non-grasping pushing operation from the optimal push point selected among the candidate pushing schemes of step 3.
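The push-point screening of step 3 (claims 4 and 5) fuses height, volume, camera distance and distance to the unknown region. The linear combination and the weights below are assumptions; the patent states only which features are combined, not how they are weighted.

```python
# Hypothetical scoring of candidate push points per claims 4-5.
# tau_h = 0.07 follows claim 3; the weights w and the linear score
# are illustrative assumptions.

def score_push_point(h, volume, dist_to_camera, dist_to_unknown,
                     tau_h=0.07, w=(1.0, 1.0, 0.5, 2.0)):
    """Return a scalar score, or None if the point is not a valid target."""
    if h < tau_h:                          # claim 5: screen by h >= tau_h
        return None
    w_h, w_v, w_d, w_u = w
    # geometric features favour tall, large objects; task heuristics favour
    # points near the camera and near the unexplored region
    return (w_h * h + w_v * volume
            - w_d * dist_to_camera - w_u * dist_to_unknown)

def rank_candidates(candidates):
    """candidates: iterable of (h, volume, dist_to_camera, dist_to_unknown)."""
    scored = [(score_push_point(*c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None]
    return [c for s, c in sorted(scored, key=lambda t: -t[0])]
```

Candidates below τh are dropped before scoring, matching the screening order of claim 5; the remaining points are returned best-first for the pushing strategy of step 4 to choose from.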