CN-115903771-B - Method for searching and navigating nearest point of inspection robot based on reinforcement learning


Abstract

The invention provides a reinforcement-learning-based method for searching for the nearest point and navigating an inspection robot, comprising: step 1, setting two routes A and B to be inspected by the inspection robot; step 2, obtaining the initial position coordinates of the inspection robot and starting the lidar; step 3, obtaining the current yaw angle of the inspection robot, letting the robot travel for m seconds, obtaining its position coordinates at that moment, and unifying the coordinate systems; step 4, selecting the nearest path point as the initial tracking point; step 5, connecting the initial point with the next path point to obtain an angle relative to the IMU coordinate system, and controlling the point-to-point movement of the inspection robot; and step 6, when the inspection of route A is completed, starting the inspection of route B, and using a reinforcement learning algorithm to select the nearest path point on route B as the initial inspection point. The method achieves higher efficiency and precision and greatly reduces the computational complexity of the prior art.

Inventors

MA XINYE
ZENG QINGXI
SONG YUXIN
WANG RONGCHEN
HU YIXUAN

Assignees

Nanjing University of Aeronautics and Astronautics (南京航空航天大学)

Dates

Publication Date: 20260505
Application Date: 20220919

Claims (3)

1. A method for searching for and navigating to the nearest point of an inspection robot based on reinforcement learning, characterized by comprising the following steps:
Step 1, setting two routes A and B to be inspected by the inspection robot;
Step 2, obtaining the initial position coordinates $(x_1, y_1)$ of the inspection robot through a vehicle-mounted GNSS-RTK, and starting the lidar;
Step 3, acquiring the current yaw angle of the inspection robot with the vehicle-mounted inertial measurement unit (IMU), driving the inspection robot straight for m seconds, acquiring its position coordinates $(x_2, y_2)$ at that moment, and calculating the angle difference between the RTK and IMU coordinate systems as $\arctan\big((y_2 - y_1)/(x_2 - x_1)\big)$, thereby unifying the coordinate systems;
Step 4, selecting the nearest path point as the initial tracking point by using a reinforcement learning algorithm;
Step 4 comprises:
Step 4-1, establishing and initializing an empty database D with capacity N to store the data and trajectories produced during optimization;
Step 4-2, setting the number of training runs to H;
Step 4-3, establishing and initializing a neural network Q, which approximates the action-value function of the currently selected nearest path point, and randomly generating its parameters $w_1$;
Step 4-4, establishing and initializing another neural network T with the same structure as Q, which provides the target action value used as the basis for optimizing the current network Q, and setting its parameters to the same values as Q, $w_2 = w_1$;
Step 4-5, obtaining the state value of the inspection robot at the current moment by using the vehicle-mounted RTK, the state value being $S_t$, where $S_t$ represents the current position;
Step 4-6, discretizing the nearest-path-point approximation process into M time points at equal time intervals;
Step 4-7, returning to step 4-5 to continue the training process until the training task has been completed H times;
Step 4-6 comprises:
Step 4-6-1, acquiring point cloud data of the surrounding environment by using the lidar, treating returns whose measured distance is less than or equal to $N_1$ meters as obstacle point cloud, clustering the obstacle point cloud, and computing from the clustering result the boundary of the region through which the inspection robot can safely pass;
Step 4-6-2, obtaining from the obstacle range boundary a left yaw angle and a right yaw angle relative to the body position; calculating, for each path coordinate point, the yaw angle between the current position $S_t$ and the k-th coordinate point of the path; comparing each of these yaw angles with the left yaw angle and the right yaw angle respectively, and placing the path points that satisfy the yaw-angle constraint into the constraint set E; if the lidar detects no obstacle, […];
Step 4-6-3, using a greedy strategy on the constraint set E to search for and generate the currently selected nearest path point $a_t$, where $t = 1, \ldots, M$;
Step 4-6-4, using a PID algorithm, connecting the current position $S_t$ with the nearest path point $a_t$ to obtain the angle of the connecting line relative to the IMU coordinate system, taking this angle as the control quantity of the PID, and controlling the point-to-point movement of the inspection robot for one time interval; receiving the reward $r_t$, which is the reciprocal of the Euclidean distance between the current position and the selected nearest path point, $r_t = 1 / \lVert S_t - a_t \rVert_2$; and receiving the new position state $S_{t+1}$ through the vehicle-mounted RTK;
Step 4-6-5, storing the history data $(S_t, a_t, r_t, S_{t+1})$ in the database D;
Step 4-6-6, randomly drawing a sample $d_j = (S_j, a_j, r_j, S_{j+1})$ from the database D, where $j = 1, \ldots, M$ and $j \neq t$;
Step 4-6-7, inputting the sample $d_j$ into the neural network Q to generate the current value $Q(S_j, a_j; w_1)$;
Step 4-6-8, simultaneously inputting the sample $d_j$ into the neural network T to generate the target value $y_j$, which supervises the optimization of the neural network Q;
Step 4-6-9, calculating the error $L$ between $y_j$ and $Q(S_j, a_j; w_1)$, and solving for its gradient $\nabla_{w_1} L$;
Step 4-6-10, updating the parameters $w_1$ by gradient descent, $w_1 \leftarrow w_1 - \alpha \nabla_{w_1} L$, where $\alpha$ is the learning rate;
Step 4-6-11, updating the parameters of the neural network T every C steps by setting $w_2 = w_1$, where C is a positive real number;
Step 5, connecting the initial point with the next path point by using a PID algorithm to obtain an angle relative to the IMU coordinate system, and controlling the point-to-point movement of the inspection robot by taking this angle as the control quantity of the PID;
Step 6, when the inspection of route A is completed, starting the inspection of route B, and selecting the nearest path point on route B as the initial inspection point by using a reinforcement learning algorithm.
2. The method according to claim 1, wherein in step 1 the inspection robot is remotely driven along route A and route B to be inspected, and the vehicle-mounted RTK records the longitude and latitude $(L_1, A_1), (L_2, A_2), (L_3, A_3), \ldots, (L_i, A_i)$ of the inspection robot along route A and the longitude and latitude $(C_1, D_1), (C_2, D_2), (C_3, D_3), \ldots, (C_j, D_j)$ of the inspection robot along route B, wherein $L_i$ and $A_i$ denote respectively the longitude and latitude of the i-th point on route A, and $C_j$ and $D_j$ denote respectively the longitude and latitude of the j-th point on route B.
3. The method of claim 2, wherein in step 1 the longitudes and latitudes $(L_1, A_1), (L_2, A_2), (L_3, A_3), \ldots, (L_i, A_i)$ are converted into coordinates $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_i, y_i)$ in a planar rectangular coordinate system by using the Gauss-Krüger projection, wherein $(x_i, y_i)$ are the coordinates of $(L_i, A_i)$ in the planar rectangular coordinate system; the longitudes and latitudes $(C_1, D_1), (C_2, D_2), (C_3, D_3), \ldots, (C_j, D_j)$ are converted into coordinates $(g_1, h_1), (g_2, h_2), (g_3, h_3), \ldots, (g_j, h_j)$ in the planar rectangular coordinate system by using the Gauss-Krüger projection, wherein $(g_j, h_j)$ are the coordinates of $(C_j, D_j)$ in the planar rectangular coordinate system; the coordinates are then stored in a set.
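
The constrained greedy selection and the reward of steps 4-6-2 to 4-6-4 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the claims: planar coordinates, a relative yaw measured counterclockwise from the robot's heading, a constraint set made of the path points whose relative yaw lies inside a passable range, and no restriction at all when no obstacle is detected. The names select_waypoint, yaw_min and yaw_max are hypothetical.

```python
import math


def bearing(origin, target):
    """Direction from origin to target in radians (atan2 convention)."""
    return math.atan2(target[1] - origin[1], target[0] - origin[0])


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def select_waypoint(position, heading, waypoints, yaw_min=None, yaw_max=None):
    """Greedy choice of the nearest admissible path point.

    A path point is admissible when its yaw relative to the heading lies in
    [yaw_min, yaw_max]. Passing None for both bounds (assumed to mean that no
    obstacle was detected) admits every path point.
    Returns (index, reward); (None, 0.0) if the constraint set is empty.
    """
    best_k, best_d = None, math.inf
    for k, wp in enumerate(waypoints):
        rel = wrap(bearing(position, wp) - heading)
        if yaw_min is not None and not (yaw_min <= rel <= yaw_max):
            continue  # outside the passable yaw range: not in the constraint set
        d = math.dist(position, wp)
        if d < best_d:
            best_k, best_d = k, d
    if best_k is None:
        return None, 0.0
    # Reward: reciprocal of the Euclidean distance to the selected path point.
    reward = 1.0 / best_d if best_d > 0 else math.inf
    return best_k, reward
```

With the robot at the origin facing along the x axis and path points (0, 1), (2, 0) and (3, 1), the unconstrained call picks the first point, while a passable range of [-0.5, 0.5] rad excludes it and picks (2, 0) instead.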
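
The update loop of steps 4-6-6 to 4-6-11 follows the usual pattern of a value network trained against a periodically synchronized target network. The sketch below is only an illustration under stated assumptions: a linear approximator stands in for the neural networks Q and T, the target is taken in the standard deep Q-learning form y = r + gamma * max Q_T (the claims do not give the formula for the target value), the error is the halved squared difference, and all function names and default values are hypothetical.

```python
import copy
import random


def features(state):
    """Feature vector for a planar position (x, y); the bias term is an assumption."""
    x, y = state
    return [x, y, 1.0]


def q_value(w, state, action):
    """Linear action value Q(s, a; w) = w[a] . phi(s)."""
    return sum(wi * fi for wi, fi in zip(w[action], features(state)))


def dqn_step(w1, w2, sample, alpha=0.01, gamma=0.9):
    """One gradient-descent update of w1 from a replayed sample (s, a, r, s_next)."""
    s, a, r, s_next = sample
    # Target value from the frozen parameters w2 (assumed standard DQN target).
    y = r + gamma * max(q_value(w2, s_next, b) for b in range(len(w2)))
    error = y - q_value(w1, s, a)
    # L = 0.5 * error**2, so dL/dw1[a] = -error * phi(s).
    grad = [-error * f for f in features(s)]
    w1[a] = [wi - alpha * g for wi, g in zip(w1[a], grad)]
    return 0.5 * error ** 2


def train(replay, n_actions, steps=200, sync_every=10, seed=0):
    """Sample from the replay database and update w1; copy w1 into w2 every C steps."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(n_actions)]
    w2 = copy.deepcopy(w1)  # T starts with the same parameters as Q
    for step in range(1, steps + 1):
        dqn_step(w1, w2, rng.choice(replay))
        if step % sync_every == 0:
            w2 = copy.deepcopy(w1)
    return w1, w2
```

Keeping w2 fixed between synchronizations means the target does not move on every update of w1, which is the reason for maintaining the second network T.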
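
The point-to-point control of step 5 can be sketched as a PID loop on the heading error, that is, the angle between the robot's yaw and the line joining its position to the next path point. The kinematic unicycle model, the gains, the speed, the time step and the names PID, heading_error and drive_to are illustrative assumptions, not values taken from the patent.

```python
import math


class PID:
    """Textbook PID controller acting on a scalar error."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative contribution on the first call, when there is no history.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def heading_error(position, waypoint, yaw):
    """Signed angle from the current yaw to the line joining position and waypoint."""
    target = math.atan2(waypoint[1] - position[1], waypoint[0] - position[0])
    return math.atan2(math.sin(target - yaw), math.cos(target - yaw))


def drive_to(position, yaw, waypoint, speed=0.5, dt=0.1, tol=0.05, max_steps=2000):
    """Simulate a unicycle robot steering toward one waypoint; returns the final pose."""
    pid = PID(kp=2.0, ki=0.0, kd=0.1)
    x, y = position
    for _ in range(max_steps):
        if math.hypot(waypoint[0] - x, waypoint[1] - y) < tol:
            break
        omega = pid.update(heading_error((x, y), waypoint, yaw), dt)  # turn-rate command
        yaw += omega * dt
        x += speed * math.cos(yaw) * dt
        y += speed * math.sin(yaw) * dt
    return (x, y), yaw
```

Wrapping the error to [-pi, pi] makes the robot turn through the smaller of the two possible angles, which matters when the yaw and the target direction sit on opposite sides of the plus or minus pi boundary.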

Description

Method for searching and navigating nearest point of inspection robot based on reinforcement learning

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a reinforcement-learning-based method for searching for the nearest point and navigating an inspection robot.

Background

With the accelerating pace of industrialization, robots are gradually replacing manual work, and more and more robots are used for safety, security, and similar functions to maintain social stability and public safety. Intelligent inspection robots have a very wide range of application scenarios, including data centers, parks, chemical enterprises, and transformer substations.

In inspection navigation based on RTK signals, the route to be inspected is not fixed. In the next inspection task, the initial position of the vehicle is not necessarily on a path point of the route, so the path point closest to the vehicle's current position must be found and used as the starting point for tracking. If an obstacle is encountered while changing the tracking route, the obstacle must be avoided and the nearest point beyond the obstacle selected adaptively.

This scenario involves the problem of finding the neighbors of a given geographic location. At present, nearest-site search methods based on geographic location mainly comprise the full-network traversal method and the grid division method. The full-network traversal method computes the distance between the input site and every site in the network; the site with the minimum distance is the nearest site, and the number of distance computations equals the number of sites in the network.
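
A minimal sketch of the full-network traversal just described, assuming the sites are planar (x, y) tuples; nearest_site is a hypothetical name used only for illustration.

```python
import math


def nearest_site(query, sites):
    """Return (index, distance) of the site closest to query by scanning all sites."""
    best_index, best_dist = -1, math.inf
    for i, site in enumerate(sites):
        d = math.dist(query, site)  # exactly one distance computation per site
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist
```

The loop body runs once per site, so the cost grows in direct proportion to the number of sites.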
The full-network traversal method must compute the distance between the input point and every site in the network; if the number of sites is N, the algorithm has linear complexity O(N), which is inefficient.

The grid division method divides the whole network into a number of grid cells, each containing several sites. It first determines the cell to which the input point belongs and its adjacent cells, and then applies the full-network traversal method within those cells to obtain the nearest site, so the number of distance computations equals the number of sites in the input point's cell and its adjacent cells. The performance of the grid division method depends on the cell size, so it is not universally applicable, and it remains inefficient because every site within the cell and its adjacent cells must still be traversed. Because the number of distance computations equals the number of sites in those cells, the complexity of the algorithm is still linear, O(N).

In view of this, how to provide a nearest-point search method based on geographic location that reduces the computational complexity of the prior art, improves working efficiency, saves operation time, and is universally applicable is a technical problem to be solved.

Reinforcement learning describes and solves the problem of an agent learning a strategy, through interaction with an environment, to maximize its return or achieve a specific goal. If a behavior strategy of the agent leads to a positive reward (reinforcement signal) from the environment, the agent's tendency to produce that behavior strategy later is strengthened.
The goal of the agent is to find, in each discrete state, the optimal strategy that maximizes the expected sum of discounted rewards. Combining reinforcement learning with the inspection robot enables the robot to find the nearest path point in a shorter time, which improves inspection efficiency.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a reinforcement-learning-based method for searching for the nearest point and navigating an inspection robot, comprising the following steps:
Step 1, setting two routes A and B to be inspected by the inspection robot;
Step 2, obtaining the initial position coordinates $(x_1, y_1)$ of the inspection robot through a vehicle-mounted GNSS-RTK (GNSS is the Global Navigation Satellite System; RTK is Real-Time Kinematic, a real-time dynamic positioning technique based on carrier-phase observations), and starting the lidar;
Step 3, acquiring the current yaw angle of the inspection robot by a vehicle-mounted IMU (inertial measurement unit), wherein the inspect