CN-116774712-B - Real-time dynamic obstacle avoidance method in underactuated AUV three-dimensional environment

Abstract

The invention belongs to the field of obstacle avoidance for underwater robots and relates in particular to a real-time dynamic obstacle avoidance method for an underactuated AUV in a three-dimensional environment. The method comprises the following steps: constructing an AUV maneuverability model and calculating the relative position and attitude between the AUV and an obstacle; establishing an obstacle risk assessment model, judging the risk levels of different obstacles, and screening out the obstacle that poses the greatest threat to the AUV; building the network architecture of a dynamic obstacle avoidance system based on the deep deterministic policy gradient (DDPG) algorithm; designing the input and output of the dynamic obstacle avoidance system to map states to actions, and feeding the resulting actions into the AUV maneuverability model to move the AUV; setting the reinforcement learning reward function of the dynamic obstacle avoidance system; and building a virtual simulation environment according to the actual scene, training the dynamic obstacle avoidance system, saving the trained model, and deploying the dynamic obstacle avoidance system on the AUV to achieve real-time dynamic obstacle avoidance for the underactuated AUV in a real ocean environment.

Inventors

  • YU XIN
  • YANG MINGYU
  • ZHAO BING
  • WANG XIANGBIN

Assignees

  • Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)

Dates

Publication Date
20260512
Application Date
20230530

Claims (9)

  1. A real-time dynamic obstacle avoidance method in an underactuated AUV three-dimensional environment, characterized by comprising the following steps:
     S1, constructing an AUV maneuverability model;
     S2, calculating the relative position and attitude between the AUV and an obstacle from the obstacle information detected by sonar;
     S3, establishing an obstacle risk assessment model from the relative position and attitude between the AUV and the obstacle obtained in step S2, judging the risk levels of different obstacles, and screening out the obstacle that poses the greatest threat to the AUV; step S3 specifically comprises:
     3-1) classifying obstacles as static or dynamic by judging whether the coordinates of the obstacle change, the dynamic/static feature of an obstacle being expressed as: […];
     3-2) constructing a risk assessment model to judge the risk level of each obstacle according to the distance between the AUV and the target point, the relative pitch angle and the relative heading angle, the risk level of a given obstacle being expressed as: […]; where the proportionality coefficients are all greater than zero, and the remaining quantities are the safety threshold for the distance between the AUV and the obstacle surface, the distance between the AUV and the obstacle, the radius of the obstacle, a custom function, and the relative pitch angle and relative heading angle between the AUV and the obstacle concerned; the above is simplified into: […]; where one term is defined as the general term and the other as the conditional term;
     3-3) when the sonar detects several obstacles near the AUV, obtaining the risk level of each obstacle, comparing the risk levels, and screening out the obstacle with the largest risk level, namely: […]; the obstacle so selected is the one that currently poses the greatest threat to the AUV;
     S4, building the network architecture of a dynamic obstacle avoidance system based on the deep deterministic policy gradient (DDPG) algorithm;
     S5, designing the input and output of the dynamic obstacle avoidance system, passing state information to the network architecture of the dynamic obstacle avoidance system to obtain the action output of the system, and constructing an end-to-end dynamic obstacle avoidance system from state input to action output, thereby realizing the mapping from state to action;
     S6, setting the reinforcement learning reward function of the dynamic obstacle avoidance system; and
     S7, building a virtual simulation environment according to the actual scene, training the dynamic obstacle avoidance system, saving the trained model, and deploying the dynamic obstacle avoidance system on the AUV to achieve real-time dynamic obstacle avoidance for the underactuated AUV in a real ocean environment.
  2. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein step S1 specifically comprises: constructing a maneuverability model of the AUV in which roll is neglected, so that the AUV has five forms of motion in three-dimensional space, namely surge, sway, heave, pitch and yaw; the position and attitude of the AUV are: $\eta = [x, y, z, \theta, \psi]^{T}$; where $[x, y, z]$ denotes the position coordinates in the geodetic coordinate system, and $\theta$ and $\psi$ denote the pitch angle and the heading angle respectively; the velocity of the AUV is represented by $\nu = [u, v, w, q, r]^{T}$; where $u$, $v$ and $w$ denote the longitudinal, transverse and vertical velocities of the AUV respectively, and $q$ and $r$ denote the pitch angular velocity and the yaw angular velocity respectively; the nonlinear kinematic and dynamic equations of the AUV are expressed as: $\dot{\eta} = J(\eta)\nu$; $M\dot{\nu} + C(\nu)\nu + D(\nu)\nu + g(\eta) = \tau$; where $J(\eta)$ is the five-degree-of-freedom coordinate transformation matrix of the AUV's spatial motion, $M$ denotes the inertia matrix, $C(\nu)$ the Coriolis force matrix, $D(\nu)$ the damping matrix, $g(\eta)$ the restoring force matrix, and $\tau$ the control forces and moments; for an underactuated AUV, the matrix $\tau$ is expressed as: $\tau = [\tau_u, 0, 0, \tau_q, \tau_r]^{T}$; where $\tau_u$, $\tau_q$ and $\tau_r$ denote the longitudinal thrust, pitch moment and yaw moment of the AUV respectively.
  3. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein step S2 comprises the following steps:
     2-1) acquiring the position of an obstacle in the geodetic coordinate system using the forward-looking sonar and side-scan sonar mounted on the AUV;
     2-2) calculating the relative position between the AUV and the obstacle from the position coordinates of the AUV and the position coordinates of the obstacle, namely: […]; where the result is the position vector pointing from the AUV to the obstacle, given by its coordinates, and the AUV state comprises the position coordinates in the geodetic coordinate system together with the pitch angle and the heading angle;
     2-3) obtaining the distance between the AUV and the obstacle as the norm of the position vector pointing from the AUV to the obstacle: […];
     2-4) projecting the position vector pointing from the AUV to the obstacle onto the AUV's own coordinate system by means of the coordinate transformation matrix: […];
     2-5) obtaining the relative pose between the AUV and the obstacle from this projection, the relative pose comprising the relative pitch angle and the relative heading angle: […]; […];
     2-6) based on the position coordinates of the target point, obtaining the distance, relative pitch angle and relative heading angle between the AUV and the target point by following steps 2-1) to 2-5).
  4. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein the risk levels of different obstacles are determined specifically as follows:
     a. when the AUV is able to detect an obstacle, the general term always takes a value; it depends on the distance between the AUV and the obstacle, the relative pitch angle and the relative heading angle, and the risk level decreases as the distance, the relative pitch angle and the relative heading angle increase; the risk level is maximal when the following condition is met: […];
     b. in the obstacle risk assessment model, the conditional term is expressed as: […]; one part of the conditional term reflects the dynamic/static feature of the obstacle: a dynamic obstacle is judged to pose a greater threat than a static one, so the risk level of a dynamic obstacle is increased; this part is 0 when the obstacle is static and takes the value […] when the obstacle is dynamic;
     c. in the other part of the conditional term, the custom function is expressed as: […]; when […], the term takes the value […]; when […], the term takes the value: […]; that is, when the distance between the AUV and the obstacle exceeds the sum of the set safety threshold and the obstacle radius, the term is constant and does not affect the risk level of each obstacle; once the distance falls below the sum of the threshold and the obstacle radius, the value of the term increases as the distance decreases; when […], with the AUV regarded as a particle and the obstacle regarded as a sphere, the distance between the AUV and the obstacle equals the radius of the obstacle, meaning that the AUV has collided with the obstacle, and the risk level is maximal.
  5. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein the deep deterministic policy gradient (DDPG) algorithm is an algorithm based on the Actor-Critic (AC) framework; the AC framework comprises a policy network Actor and a value network Critic, each of which has an estimation network and a target network.
  6. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein step S4 specifically comprises:
     4-1) the input of the policy network Actor is the state of the AUV and its output is the action; the estimation network in the policy network Actor is expressed as: $a_t = \mu(s_t \mid \theta^{\mu})$, where $\mu$ denotes the policy, $s_t$ and $a_t$ denote the state and the action at the current moment respectively, and $\theta^{\mu}$ denotes the parameters of the estimation network in the Actor; for the target network in the policy network Actor, the input is the state at the next moment $s_{t+1}$ and the output is the action at the next moment $a_{t+1}$, expressed as: $a_{t+1} = \mu'(s_{t+1} \mid \theta^{\mu'})$, where $\theta^{\mu'}$ denotes the target network parameters; OU noise is introduced into the estimation network in the policy network Actor to increase the randomness of the action, namely: $a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$; where $\mathcal{N}_t$ denotes the OU noise;
     4-2) the value network Critic fits a value function $Q(s, a)$ to evaluate the action performed by the AUV, where the input of the value estimation network is $s_t$ and $a_t$ and its output is $Q(s_t, a_t \mid \theta^{Q})$, and the input of the value target network is $s_{t+1}$ and $a_{t+1}$ and its output is $Q'(s_{t+1}, a_{t+1} \mid \theta^{Q'})$;
     4-3) a memory bank is provided in the network structure; during each training step the interaction data between the AUV and the environment $(s_t, a_t, r_t, s_{t+1})$ are stored in the memory bank, and during updating $N$ samples are drawn at random from the memory bank to update the network parameters $\theta^{\mu}$ and $\theta^{Q}$;
     4-4) the policy network is updated based on the value function $Q(s, a)$ fitted by the value network Critic, using gradient ascent to maximize the $Q$ value output by the Critic; the gradient is expressed as: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$; where $\nabla$ denotes the gradient operator;
     4-5) the value network Critic updates its own parameters by computing the temporal-difference error, then the mean squared error, and using gradient descent to minimize the objective function $L$, which is expressed as: $L = \frac{1}{N} \sum_{i} \left( r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}) - Q(s_i, a_i \mid \theta^{Q}) \right)^2$; where $\gamma$ denotes the discount factor;
     4-6) the target network parameters $\theta^{\mu'}$ and $\theta^{Q'}$ in the policy network Actor and the value network Critic are updated by soft update: $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\theta^{\mu'}$; $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\theta^{Q'}$; where $\tau$ denotes the moving-average coefficient, which determines the magnitude of the updates to $\theta^{\mu'}$ and $\theta^{Q'}$;
     4-7) the establishment of the policy network Actor and value network Critic framework in the AUV dynamic obstacle avoidance system is complete.
  7. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein designing the input and output of the dynamic obstacle avoidance system and constructing an end-to-end model from state input to action output, so as to realize the mapping between the two, specifically comprises:
     5-1) the input of the dynamic obstacle avoidance system comprises the position and attitude information of the AUV and the velocity information of the AUV; the position and attitude information consists of the position coordinates in the geodetic coordinate system, the pitch angle and the heading angle; the velocity information consists of the longitudinal, transverse and vertical velocities of the AUV, the pitch angular velocity and the yaw angular velocity;
     5-2) after the most threatening obstacle has been found by the obstacle risk assessment model, the dynamic/static feature of that obstacle, its distance from the AUV, the relative pitch angle, the relative heading angle, the obstacle radius and the risk level of the obstacle are all input into the dynamic obstacle avoidance system; the distance, relative pitch angle and relative heading angle between the AUV and the target point are likewise input; the total input of the dynamic obstacle avoidance system is: […];
     5-3) the total input is normalized as a whole, and the input state at the previous moment is then stacked with the input state at the current moment and fed into the dynamic obstacle avoidance system as a whole;
     5-4) the action values output by the dynamic obstacle avoidance system are constrained to the range […] and mathematically transformed into actions matched to the AUV model;
     5-5) the mapping from state to action is realized by the dynamic obstacle avoidance system, namely: […];
     5-6) the AUV completes the obstacle avoidance task according to the actions output by the dynamic obstacle avoidance system.
  8. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein the reinforcement learning reward function is set according to the input, output and design of the dynamic obstacle avoidance system, specifically as follows:
     6-1) the AUV needs to reach the designated target position while executing the task, so a terminal reward term is set, and the AUV obtains a reward after reaching the designated position, expressed as: […]; when the AUV collides with an obstacle, it is given a penalty, expressed as: […];
     6-2) based on the distance, relative pitch angle and relative heading angle between the AUV and the target point, the reward terms are set as: […]; where one term drives the AUV to reduce its distance to the target point and the other guides the AUV to adjust its attitude so that it navigates toward the target position;
     6-3) according to the obstacle risk assessment model, the obstacle avoidance task is expressed in terms of the threat that obstacles pose to the AUV, and the computed risk value is transformed as: […]; where the two risk values denote the risk level of the most threatening obstacle at the previous moment and at the current moment respectively, as obtained from the obstacle risk assessment model; the proportionality coefficients are all greater than zero; and the safety threshold is the threshold on the distance between the AUV and the obstacle surface; if the maximum obstacle risk level at the current moment is smaller than that at the previous moment, the AUV is given a positive reward, otherwise it is given a negative penalty;
     6-4) finally, the reinforcement learning reward function is expressed as: […].
  9. The method for real-time dynamic obstacle avoidance in an underactuated AUV three-dimensional environment according to claim 1, wherein the step S6 specifically comprises:
     (1) starting training and initializing the network;
     (2) loop condition: judging whether the current training episode number is smaller than the maximum number of episodes; if so, executing step (3); otherwise, ending training and executing step (9);
     (3) initializing the AUV position, attitude, velocity and output force/moment, the target point position, and the obstacle positions and motion states;
     (4) through the dynamic obstacle avoidance system, the AUV selects an action based on the current input state, the environment returns a reward value, and the AUV transitions to the next state;
     (5) assigning the next state to the current state in preparation for the next cycle;
     (6) judging whether the network update step of the dynamic obstacle avoidance system has been reached; if so, executing step (7); otherwise, skipping directly to step (8);
     (7) drawing samples from the memory bank and updating the network of the dynamic obstacle avoidance system;
     (8) loop judgment: judging whether an episode termination condition has been reached (reaching the target point, colliding with an obstacle, or the current step count reaching the maximum number of steps per episode); if so, starting the next training episode and returning to step (2); otherwise, starting the next time step and returning to step (4);
     (9) after training, saving the trained network model of the dynamic obstacle avoidance system and judging whether training was successful by checking task completion and the trend of the reward curve; if the AUV successfully reaches the target point while avoiding the various static and dynamic obstacles, the AUV is able to avoid obstacles dynamically and in real time in a real marine environment.
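
As an illustration of the training procedure in claim 9, the following Python sketch mirrors the control flow of steps (1) to (9), together with the memory bank and soft update described in claim 6. It is a minimal sketch under stated assumptions, not the patented implementation: `ToyEnv` is a hypothetical one-dimensional stand-in for the simulation environment, the action is drawn at random in place of the Actor output, two-element lists stand in for network weights, and the names `ReplayMemory`, `soft_update` and `train`, as well as all numeric defaults, are illustrative.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size memory bank of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, n):
        # Draw up to n transitions uniformly at random, without replacement.
        return random.sample(list(self.buffer), min(n, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def soft_update(target, online, tau):
    """Soft update: target <- tau * online + (1 - tau) * target, element-wise."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


class ToyEnv:
    """Hypothetical 1-D environment: the state is the distance to the goal."""

    def reset(self):
        self.d = 10.0
        return self.d

    def step(self, action):
        self.d = max(0.0, self.d - action)
        reached = self.d == 0.0
        reward = 10.0 if reached else -0.1
        return self.d, reward, reached


def train(max_episodes=3, max_steps=50, update_every=5, batch_size=4,
          tau=0.01, seed=0):
    random.seed(seed)
    env, memory = ToyEnv(), ReplayMemory(capacity=1000)
    online, target = [1.0, 1.0], [0.0, 0.0]       # (1) placeholder parameters
    updates, total_steps = 0, 0
    for _ in range(max_episodes):                  # (2) episode loop condition
        state = env.reset()                        # (3) initialize the scene
        for _ in range(max_steps):
            action = random.uniform(0.5, 1.5)      # (4) placeholder for Actor + noise
            next_state, reward, done = env.step(action)
            memory.store((state, action, reward, next_state))
            state = next_state                     # (5) next state becomes current
            total_steps += 1
            if total_steps % update_every == 0:    # (6) update step reached?
                batch = memory.sample(batch_size)  # (7) sample the memory bank
                # A real DDPG agent would update Actor and Critic from `batch` here.
                target = soft_update(target, online, tau)
                updates += 1
            if done:                               # (8) episode termination
                break
    return target, updates, len(memory)            # (9) returned instead of saved
```

The sketch keeps the network update as a single placeholder line so that the loop structure, the update interval and the termination checks stay visible; in practice the sampled batch would drive the Critic and Actor updates before the target networks are soft-updated.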

Description

Real-time dynamic obstacle avoidance method in underactuated AUV three-dimensional environment

Technical Field

The invention belongs to the field of obstacle avoidance for underwater robots, and in particular relates to a real-time dynamic obstacle avoidance method for an underactuated AUV in a three-dimensional environment.

Background

The ocean is rich in resources, and in recent years countries around the world have stepped up its development and utilization. However, because the marine environment is complex, changeable and harsh, humans and most land-based equipment cannot operate underwater. Autonomous underwater vehicles (AUVs) are widely used in underwater tasks because of their high maneuverability, autonomy and safety. In an underwater environment, AUV obstacle avoidance depends not only on the motion characteristics of the AUV, namely its kinematic and dynamic constraints, but also on environmental factors. Apart from known terrain and static obstacles, the complexity and unpredictability of the underwater environment mean that the AUV cannot obtain complete environmental information before a task and may encounter unknown obstacles while navigating. Among these, dynamic obstacles are especially dangerous: their motion state is difficult to predict and their kinetic energy is large, so a collision poses a serious threat to the safety of the AUV. The AUV must therefore have real-time obstacle avoidance capability to ensure its own safety and improve task efficiency. A real-time dynamic obstacle avoidance system is thus needed so that the AUV can cope with complex and changeable underwater environments, avoid the static and dynamic obstacles encountered while navigating, and complete its tasks smoothly.
Most existing methods are applied only to two-dimensional planes, or merely decouple the three-dimensional space and in practice still treat it as a set of two-dimensional planes, so the obstacle avoidance problem in a three-dimensional environment is not really addressed and obstacle avoidance efficiency suffers. Moreover, the obstacle scenes considered by existing methods are relatively simple: most contain only static obstacles whose information is acquired in advance, which limits obstacle avoidance capability. Where dynamic obstacles are considered, they are given only simple motion states, so the trained model succeeds by chance and does not generalize. In addition, when several obstacles exist near the AUV, the state dimension becomes too large and the model's exploration capability too low for the system to produce an optimal policy.

Disclosure of Invention

The invention aims to provide a real-time dynamic obstacle avoidance method for an underactuated underwater robot in a three-dimensional environment, which enables the AUV to avoid static and dynamic obstacles in real time while operating in a three-dimensional underwater environment. When several obstacles exist near the AUV, the risk level of each obstacle is calculated through a risk assessment model so that the obstacle posing the greatest threat to the AUV can be screened out, which greatly improves obstacle avoidance capability. An end-to-end model from state input to action output is constructed using a deep reinforcement learning algorithm, so that the AUV can make decisions based on the information acquired by its sensors and achieve autonomous obstacle avoidance, thereby overcoming the shortcomings of the prior art for underwater robots.
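
The screening idea described above can be sketched in Python. The patent's actual risk formula is not reproduced here; `toy_risk` is a hypothetical stand-in that only follows the qualitative behavior stated in the claims (risk falls as the distance and relative angles grow, a dynamic obstacle scores higher than a static one, and a proximity term becomes active once the distance drops below the safety threshold plus the obstacle radius). The `Obstacle` fields, the parameters `d_safe` and `k_dyn`, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    name: str
    distance: float      # distance from the AUV to the obstacle
    rel_pitch: float     # relative pitch angle, radians
    rel_heading: float   # relative heading angle, radians
    radius: float        # obstacle modeled as a sphere
    is_dynamic: bool     # dynamic/static feature


def toy_risk(o, d_safe=5.0, k_dyn=1.0):
    """Hypothetical risk score; NOT the formula from the patent."""
    # General term: shrinks as distance and relative angles increase.
    general = (1.0 / (1.0 + o.distance)
               + 1.0 / (1.0 + abs(o.rel_pitch))
               + 1.0 / (1.0 + abs(o.rel_heading)))
    # Conditional term, part 1: extra risk for dynamic obstacles, 0 if static.
    dynamic_bonus = k_dyn if o.is_dynamic else 0.0
    # Conditional term, part 2: constant (here 0) outside the safety zone,
    # growing as the AUV gets closer once inside it.
    limit = d_safe + o.radius
    proximity = 0.0 if o.distance > limit else (limit - o.distance) / limit
    return general + dynamic_bonus + proximity


def most_threatening(obstacles):
    """Return the obstacle with the largest risk score."""
    return max(obstacles, key=toy_risk)


far_static = Obstacle("far_static", 50.0, 0.5, 1.0, 2.0, False)
near_static = Obstacle("near_static", 6.0, 0.1, 0.1, 2.0, False)
near_dynamic = Obstacle("near_dynamic", 6.0, 0.1, 0.1, 2.0, True)
selected = most_threatening([far_static, near_static, near_dynamic])
```

Passing only the selected obstacle to the learning system keeps the state dimension fixed no matter how many obstacles the sonar reports, which is the motivation the text gives for screening.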
The technical solution adopted by the invention to achieve this purpose is a real-time dynamic obstacle avoidance method in an underactuated AUV three-dimensional environment, comprising the following steps: S1, constructing an AUV maneuverability model; S2, calculating the relative position and attitude between the AUV and an obstacle from the obstacle information detected by sonar; S3, establishing an obstacle risk assessment model from the relative position and attitude between the AUV and the obstacle obtained in step S2, judging the risk levels of different obstacles, and screening out the obstacle that poses the greatest threat to the AUV; S4, building the network architecture of a dynamic obstacle avoidance system based on the deep deterministic policy gradient (DDPG) algorithm; S5, designing the input and output of the dynamic obstacle avoidance system, passing state information to the network architecture of the dynamic obstacle avoidance system to obtain the action output of the system, and constructing an end-to-end dynamic obstacle avoidance system from state input to action output, thereby realizing the mapping from state to action; S6, setting the reinforcement learning reward function of the dynamic obstacle avoidance