
CN-121977582-A - Deep reinforcement learning path planning method based on local environment driving

CN121977582A

Abstract

The invention discloses a deep reinforcement learning path planning method driven by the local environment, belongs to the technical field of mobile robots, and is used for autonomous navigation and obstacle avoidance of mobile robots in complex environments. The method first constructs an obstacle contour model from the obstacle point information in the local environment, then builds a local environment graph structure between the obstacles and the robot as an undirected graph, extracts spatial features and deep feature information of the local environment with a graph attention network, then obtains an optimal local target point through a local-target driving mechanism, and finally fuses these into deep reinforcement learning to train an optimal path planning strategy. By guiding with local target points and learning key features of the local environment with the graph attention network, the invention improves the robot's local environment perception, strengthens its path planning capability in complex environments, significantly shortens the average path length while raising the navigation success rate, and improves trajectory smoothness.
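The target-selection strategy summarised in the abstract (detailed further in claims 5 and 6) can be sketched as follows; the function name, point tuples, and cost definition are illustrative assumptions rather than the patent's notation:

```python
import math

def select_local_target(robot, goal, candidates, goal_blocked):
    """Pick the current navigation target. If the global goal is within range
    and unobstructed, head straight for it; otherwise choose the valid local
    candidate minimizing the cost distance robot->candidate + candidate->goal,
    mirroring the minimum-cost-distance principle described in the patent."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    if not goal_blocked:          # global goal visible: take it directly
        return goal
    return min(candidates, key=lambda c: dist(robot, c) + dist(c, goal))
```

For example, when the goal is blocked, a candidate lying roughly on the way to the goal is preferred over one that forces a detour.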

Inventors

  • YU YONGJIN
  • ZHANG YIFAN
  • YANG HE
  • LIU QUAN
  • GAO CHANG
  • ZHANG GUO
  • LI CHAO
  • ZHAO YUE
  • JIANG YI
  • TIAN MIN

Assignees

  • Shandong University of Science and Technology (山东科技大学)
  • State Grid Shandong Electric Power Company, Ningyang County Power Supply Company (国网山东省电力公司宁阳县供电公司)

Dates

Publication Date
2026-05-05
Application Date
2026-04-07

Claims (9)

  1. A deep reinforcement learning path planning method driven by the local environment, characterized by comprising the following steps: Step 1, constructing an obstacle contour model according to obstacle point information in the local environment of a robot; Step 2, constructing a local environment graph structure between the obstacles and the robot as an undirected graph based on the obstacle contour model, and extracting spatial features and deep feature information of the local environment with a graph attention network; Step 3, based on the deep feature information, obtaining an optimal local target point through a local-target driving mechanism; Step 4, constructing a reward function, fusing the optimal local target point and the local environment features into a deep reinforcement learning network, and training to generate an optimal path planning strategy.
  2. The method according to claim 1, wherein step 1 specifically includes: Step 1.1, obtaining discrete obstacle points and constructing an incremental local map: from the lidar ranging data and the robot's current position and heading, computing the position coordinates of obstacle points in all directions to obtain discrete obstacle points, and superimposing the obstacle point coordinates detected at historical moments to gradually construct a global map; Step 1.2, image preprocessing: converting the incremental local map into a binary image in which pixel value 0 denotes free space and pixel value 1 denotes an obstacle region; Step 1.3, contour tracing and obstacle separation: raster-scanning the preprocessed binary image and taking a transition point between an obstacle and free space as the starting point of contour tracing; examining the value of each pixel in the eight-neighborhood of the current point, selecting a pixel with value 1, adding it to the current contour, and continuing the trace from it; when no tracing points remain, the current contour is complete, and other pixels with value 1 are selected as new starting points until all contours are extracted; different obstacle contours are distinguished by pixel connectivity, all obstacles are separated into individual obstacles, and the point set of each obstacle contour is obtained; Step 1.4, convex hull computation and key point extraction: for the point set of each obstacle contour, a rectangular coordinate system is built with the point of smallest ordinate as origin and the polar angle of each point relative to the origin is computed, the points are sorted by increasing polar angle, an empty stack is initialized, and each point is processed in sorted order, checking the turn direction formed by the top two points of the stack and the current point: if the turn is counter-clockwise the current point is pushed; if it is clockwise the top of the stack is popped and the turn check is repeated against the new top two points until a counter-clockwise turn is obtained, whereupon the current point is pushed; after all points are traversed, the points remaining in the stack are the convex hull vertices of the obstacle contour, and the obstacle contour model is constructed from these key points.
  3. The method according to claim 2, wherein step 2 specifically comprises: Step 2.1, defining robot node features and obstacle node features; Step 2.2, constructing a heterogeneous graph structure: the node feature matrix of the heterogeneous graph is formed by concatenating the robot node features and the obstacle node features, and edges exist only between the robot node and each obstacle node; Step 2.3, obtaining from the topology of the nodes and edges a graph structure reflecting the relation between the robot and the obstacles, and unifying the dimensions of the heterogeneous node features with a multi-layer perceptron to obtain a node feature matrix of uniform dimension; Step 2.4, inputting the node feature matrix into the graph attention network for processing and outputting node features that encode deep environment perception.
  4. The method according to claim 3, wherein in step 2.4 the processing of the graph attention network comprises: Step 2.4.1, computing the attention coefficient between each node and its adjacent nodes; Step 2.4.2, aggregating the feature information of the adjacent nodes by weighting with the attention coefficients.
  5. The method according to claim 3, wherein step 3 specifically comprises: Step 3.1, dilating the obstacle contours to generate a set of local target candidate points; Step 3.2, removing invalid and unreachable candidate points to obtain a set of valid candidate points; Step 3.3, computing the cost distance from the current position through each valid candidate point to the global target point, and selecting the optimal local target point from the valid candidate set by the minimum-cost-distance principle; Step 3.4, determining the final target point according to the target point selection condition: when the global target point lies within the detection range and no obstacle blocks the line between it and the robot, the global target point is selected directly as the current target; otherwise the computed optimal local target point is selected.
  6. The method according to claim 5, wherein in step 3.4, when the global target point lies within the robot's detection range and no obstacle lies between the robot and the global target point, no local target point is set and the robot travels directly to the global target point.
  7. The method according to claim 5, wherein in step 4 a state space s is defined as: s = [F, P] (12); where F is the environment feature extracted by the graph attention network and P is the displacement vector of the target point relative to the robot; and an action space a is defined as: a = [v, ω] (13); where v is the linear velocity and ω is the angular velocity.
  8. The method according to claim 7, wherein the reward function constructed in step 4 comprises: a goal-approach reward r_goal, granted according to the change in distance as the robot approaches the global target point or the local target point; a safe obstacle-avoidance reward r_obs, which gives a penalty value when the robot collides with an obstacle and a reward value when the global target point is reached; and a speed reward r_v, which gives a penalty value when the robot's speed falls below 1/4 of the maximum speed; the final reward function is: R = λ1·r_goal + λ2·r_obs + λ3·r_v (17); where λ1, λ2 and λ3 are scaling factors that adjust the contribution of each partial reward.
  9. The method according to claim 1, wherein in step 4 the deep reinforcement learning network adopts an Actor-Critic architecture, and the path planning strategy is trained and generated by combining the local-target driving mechanism with the environment features perceived by the graph attention network.
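Claims 3 and 4 describe a graph attention network over a star-shaped robot-obstacle graph. The following single-layer forward pass in NumPy follows the standard GAT formulation (LeakyReLU scoring, masked softmax, weighted aggregation); all shapes, names, and the star topology in the usage below are assumptions for illustration, not the patent's exact design:

```python
import numpy as np

def gat_layer(h, W, a, adj, neg_slope=0.2):
    """Single graph-attention layer. h: (N, F) node features, W: (F, F2)
    shared projection, a: (2*F2,) attention vector, adj: (N, N) 0/1
    adjacency. Returns (attention matrix, aggregated node features)."""
    z = h @ W                                       # project node features
    N = z.shape[0]
    e = np.empty((N, N))
    for i in range(N):                              # e_ij = a^T [z_i || z_j]
        for j in range(N):
            e[i, j] = np.concatenate([z[i], z[j]]) @ a
    e = np.where(e > 0, e, neg_slope * e)           # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                  # mask non-edges
    att = np.exp(e - e.max(axis=1, keepdims=True))  # row-wise softmax
    att = att / att.sum(axis=1, keepdims=True)
    return att, att @ z                             # weighted aggregation
```

In a star graph with the robot at index 0 connected to each obstacle node (plus self-loops), the masked softmax drives attention over non-edges to (near) zero, so each node aggregates only from its actual neighbours.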

Description

Deep reinforcement learning path planning method based on local environment driving

Technical Field

The invention belongs to the technical field of mobile robots, and particularly relates to a deep reinforcement learning path planning method driven by the local environment.

Background

Path planning is an important technology in mobile robot research, aimed at finding an optimal path for a mobile robot from a starting point to a target point while avoiding obstacles in the environment. Deep reinforcement learning (DRL) algorithms combine the perception ability of deep learning with the decision-making ability of reinforcement learning and show strong potential for path planning. The Deep Deterministic Policy Gradient (DDPG) algorithm combines value-based and policy-based methods in an Actor-Critic architecture: it learns an action policy and its value simultaneously, outputs actions directly in a continuous action space, and can generate smooth, accurate action sequences without discretization, markedly improving reinforcement learning efficiency. It can handle high-dimensional state perception and continuous action decisions, adapt dynamically to environmental changes, learn a good path planning strategy directly from raw data, and adjust path planning decisions adaptively in real time. However, existing environment perception systems over-abstract the representation of obstacles: they extract only shallow spatial features and lack the ability to analyse the topological structure of complex environments in depth, so the cognitive model built in the robot's high-dimensional dynamic environment suffers from missing information.
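The DDPG-style continuous control described above can be illustrated with a tiny deterministic actor that maps a state vector to a bounded action [v, ω]; the layer sizes, squashing functions, and velocity bounds are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

class Actor:
    """Minimal DDPG-style deterministic actor: state -> [v, omega], with the
    linear velocity v squashed into [0, v_max] and the angular velocity omega
    into [-w_max, w_max], so no discretization of the action space is needed."""
    def __init__(self, state_dim, hidden=32, v_max=0.5, w_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2))
        self.b2 = np.zeros(2)
        self.v_max, self.w_max = v_max, w_max

    def act(self, state):
        z = np.tanh(state @ self.W1 + self.b1)          # hidden layer
        raw = z @ self.W2 + self.b2                     # unbounded outputs
        v = self.v_max * (np.tanh(raw[0]) + 1.0) / 2.0  # speed in [0, v_max]
        w = self.w_max * np.tanh(raw[1])                # turn in [-w_max, w_max]
        return np.array([v, w])
```

Because the output is continuous and bounded by construction, the actor can emit smooth velocity commands directly, which is the property the background section attributes to DDPG.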
This representational defect directly weakens the decision quality of the path planning system: the algorithm responds to sudden obstacles with delay and re-plans paths with insufficient flexibility. As environment complexity grows, the shortcomings of traditional methods in feature extraction and spatio-temporal correlation modelling are further amplified, so path planning algorithms cannot meet the dual requirements of real-time obstacle avoidance and optimal path search in diverse scenarios.

Disclosure of Invention

To solve these problems, the invention provides a deep reinforcement learning path planning method driven by the local environment, which learns and enhances key features of the local environment through local target point guidance and graph attention network perception, improves the robot's local environment perception, and strengthens its path planning capability in complex environments. To achieve the above purpose, the invention adopts the following technical scheme.
A deep reinforcement learning path planning method driven by the local environment comprises the following steps: Step 1, constructing an obstacle contour model according to obstacle point information in the local environment of a robot; Step 2, constructing a local environment graph structure between the obstacles and the robot as an undirected graph based on the obstacle contour model, and extracting spatial features and deep feature information of the local environment with a graph attention network; Step 3, based on the deep feature information, obtaining an optimal local target point through a local-target driving mechanism; Step 4, fusing the optimal local target point and the local environment features into a deep reinforcement learning network and training to generate an optimal path planning strategy. Preferably, step 1 specifically includes: Step 1.1, obtaining discrete obstacle points and constructing an incremental local map: from the lidar ranging data and the robot's current position and heading, computing the position coordinates of obstacle points in all directions to obtain discrete obstacle points, and superimposing the obstacle point coordinates detected at historical moments to gradually construct a global map; Step 1.2, image preprocessing: converting the incremental local map into a binary image in which pixel value 0 denotes free space and pixel value 1 denotes an obstacle region; Step 1.3, contour tracing and obstacle separation: raster-scanning the preprocessed binary image, taking a transition point between an obstacle and free space as the starting point of contour tracing, examining the value of each pixel in the eight-neighborhood of the starting point, selecting a pixel with value 1 to add to the current contour and taking it as the next tracking
poi
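The convex hull step that follows contour tracing (step 1.4 of claim 2) is the classic Graham scan; a compact sketch under the claim's conventions — lowest-ordinate origin, polar-angle sort, and a stack that pops on non-counter-clockwise turns:

```python
import math

def graham_scan(points):
    """Graham scan as in step 1.4: origin = point with smallest ordinate,
    remaining points sorted by polar angle about it (ties broken by distance),
    then a stack sweep that pops while the turn is not counter-clockwise.
    Returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points), key=lambda p: (p[1], p[0]))
    if len(pts) < 3:
        return pts
    origin, rest = pts[0], pts[1:]
    rest.sort(key=lambda p: (math.atan2(p[1] - origin[1], p[0] - origin[0]),
                             (p[0] - origin[0]) ** 2 + (p[1] - origin[1]) ** 2))
    def cross(o, a, b):  # z-component of (a-o) x (b-o); > 0 means a CCW turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    stack = [origin]
    for p in rest:
        while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
            stack.pop()  # clockwise or collinear: discard the stack top
        stack.append(p)
    return stack
```

Interior contour points are discarded by the stack test, so only the convex hull vertices survive as the key points of the obstacle contour model.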