CN-122018558-A - Vehicle system distributed control method, system and device based on reinforcement learning
Abstract
The invention belongs to the field of distributed system control and relates in particular to a reinforcement-learning-based distributed control method, system, and device for vehicle systems. The method combines a second-order nonlinear dynamics model with a signed directed graph to describe the cooperative and competitive dynamics among multiple vehicles. Accounting for the unknown nonlinear dynamics and external disturbances present in the vehicles, the whole system is modeled as a second-order strict-feedback nonlinear system. A preset-time performance function is introduced to decouple the convergence time from the initial conditions and to optimize transient performance. A three-part neural network architecture is constructed, and a control law coordinated with the preset-time parameters is designed, guaranteeing practical preset-time convergence of the multi-vehicle system under unknown dynamics and external disturbances. The invention can set the convergence time precisely, markedly reduces the computational complexity of the control algorithm, and improves the transient performance and robustness of the vehicle system.
Inventors
- QI WENHAI
- BAI ZIQI
Assignees
- 曲阜师范大学 (Qufu Normal University)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-08
Claims (10)
- 1. A reinforcement-learning-based vehicle system distributed control method, characterized by comprising the following steps: S1, constructing a signed directed graph based on a multi-vehicle tracking model formed by a plurality of vehicles and a leader vehicle, wherein a positive connection weight in the signed directed graph indicates a cooperative relationship between the two corresponding vehicles, and a negative connection weight indicates a competitive relationship between them; S2, taking the sum of the deviations between each vehicle and the leader vehicle's trajectory as the tracking error, and constructing a time performance function based on the expected convergence time, the initial error bound, and the steady-state error bound; S3, obtaining a virtual control law based on the transformation error by combining a dynamic compensation network, a control generation network, and a disturbance upper bound; obtaining the vehicle speed based on the second-order feedback nonlinear model and subtracting the virtual speed control law from it to obtain the speed error; and updating the dynamic compensation network, the disturbance upper bound, the performance evaluation network, and the control generation network; S4, converting the actual control input into linear acceleration and angular velocity commands directly executed by the vehicle, driving each vehicle to track the leader's state, or its opposite, according to the cooperation or competition relationship within the expected convergence time, thereby completing the distributed control of the vehicle system.
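The signed-graph tracking error of step S1/S2 can be sketched as follows. This is an illustrative reading of the claim, not the patent's exact formula: the topology, gains, and variable names are invented for the example, and the standard signed-consensus error form (magnitude coupling via |a_jk|, sign flip via sgn(a_jk), pinning term toward the leader or its opposite) is assumed.

```python
# Hypothetical 3-follower example; the topology and gains are illustrative,
# not taken from the patent.
# A[j][k] > 0: cooperation between followers j and k; A[j][k] < 0: competition.
A = [[0.0, 1.0, -1.0],
     [1.0, 0.0, -1.0],
     [-1.0, -1.0, 0.0]]
b = [1.0, 0.0, 0.0]  # pinning gains: only follower 0 hears the leader directly

def sgn(v):
    """Sign function: -1, 0, or +1."""
    return (v > 0) - (v < 0)

def tracking_errors(x, x_leader, sigma):
    """Signed-graph tracking error for each follower (assumed standard form).

    sigma[j] = +1 if follower j cooperates with the leader (tracks x_leader),
               -1 if it competes (tracks -x_leader), per structural balance.
    """
    n = len(x)
    e = [0.0] * n
    for j in range(n):
        for k in range(n):
            # |a_jk| couples the magnitudes; sgn(a_jk) flips a competitor's state
            e[j] += abs(A[j][k]) * (x[j] - sgn(A[j][k]) * x[k])
        e[j] += b[j] * (x[j] - sigma[j] * x_leader)
    return e

print(tracking_errors([0.5, -0.2, 0.3], 1.0, [1, 1, -1]))
```

Driving all e[j] to zero simultaneously forces cooperators onto the leader's trajectory and competitors onto its opposite, which is the bipartite tracking behavior claim 1 describes.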
- 2. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein the time performance function in S2 is constructed from the expected convergence time, the initial error bound, and the steady-state error bound, specifically: [formula omitted in source], wherein the time performance function decays from the initial bound B0 to the steady-state error bound, a shape parameter governs the decay, the expected convergence time is the preset terminal instant, and t is the time variable.
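The patent's exact performance function is not reproduced in the source text, but a common preset-time envelope with the properties claim 2 lists (decays from an initial bound B0 to a steady-state bound exactly at a preset time T, independent of the initial error) looks like the following sketch. The functional form and all parameter values here are assumptions for illustration.

```python
def perf_bound(t, B0=2.0, Binf=0.05, T=5.0, kappa=2.0):
    """One common preset-time performance envelope (an assumed form; the
    patent's own formula is not reproduced in the source text).

    Decays monotonically from the initial bound B0 at t = 0 to the
    steady-state bound Binf at the preset time T, and stays at Binf
    afterwards, regardless of the initial tracking error.
    """
    if t >= T:
        return Binf
    # kappa shapes how fast the envelope tightens on its way to Binf
    return (B0 - Binf) * (1.0 - t / T) ** kappa + Binf
```

Because the envelope reaches Binf at t = T by construction, any error kept inside it converges within the preset time, which is exactly the "convergence time independent of initial conditions" property the abstract emphasizes.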
- 3. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein the virtual control law in S3 is obtained from the transformation error by combining the dynamic compensation network, the control generation network, and the disturbance upper bound, specifically: [formula omitted in source], wherein the quantities involved are: the virtual control law; a time-varying scaling factor related to the time performance function; the in-degree of the node; the traction (pinning) gain from the leader to follower j; sgn, the sign function; a design parameter with value range (0, 1); tanh(·), the hyperbolic tangent function; adjustable parameters; the unconstrained tracking error; a parameter related to the expected convergence time; the disturbance upper bound; the transpose of the dynamic compensation network weights; the first basis function vector; the transpose of the control generation network weights; and the second basis function vector.
- 4. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein in step S3 the actual control input is obtained based on the speed error and the updated dynamic compensation network, disturbance upper bound, performance evaluation network, and control generation network, specifically: [formula omitted in source], wherein the quantities involved are: the actual control input; the first control gain parameter; tanh(·), the hyperbolic tangent function; adjustable parameters; the unconstrained tracking error; the second control gain parameter; sgn, the sign function; a design parameter with value range (0, 1); the updated disturbance upper bound; the transpose of the updated dynamic compensation network weights; the third basis function vector; the transpose of the updated control generation network weights; and the fourth basis function vector.
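The control laws of claims 3 and 4 share a recognizable structure even though their exact expressions are omitted in the source. The sketch below is an assumed composition of the ingredients the claims enumerate (linear gain, fractional-power sign term, neural-network compensation, tanh-smoothed disturbance rejection); every parameter name and value is illustrative, not from the patent.

```python
import math

def sgn(v):
    """Sign function: -1, 0, or +1."""
    return (v > 0) - (v < 0)

def control_input(z2, W_c, phi_c, W_a, phi_a, D_hat,
                  k1=2.0, k2=1.0, gamma=0.5, eps=0.1):
    """Sketch of the structure shared by the control laws of claims 3-4
    (an assumed composition; the patent's exact expressions are omitted in
    the source, and all names here are illustrative). It combines:

    - linear feedback -k1*z2 on the unconstrained speed error z2,
    - a fractional-power term -k2*|z2|**gamma*sgn(z2) with gamma in (0, 1),
      the kind of term used to enforce preset/fixed-time convergence,
    - neural-network compensation W_c^T phi_c + W_a^T phi_a for the
      unknown nonlinear dynamics,
    - a disturbance-rejection term D_hat*tanh(z2/eps), where tanh smooths
      the discontinuous sign function to avoid actuator chattering.
    """
    nn_comp = sum(w * p for w, p in zip(W_c, phi_c))
    nn_ctrl = sum(w * p for w, p in zip(W_a, phi_a))
    frac = k2 * abs(z2) ** gamma * sgn(z2)
    robust = D_hat * math.tanh(z2 / eps)
    return -k1 * z2 - frac - nn_comp - nn_ctrl - robust
```

The tanh term is the design choice the claims hint at with "design parameter in (0, 1)" and "hyperbolic tangent function": it trades an arbitrarily small steady-state residual for a continuous control signal.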
- 5. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein the updating of the dynamic compensation network, the disturbance upper bound, the performance evaluation network, and the control generation network in S3 is specifically: [formulas omitted in source], wherein the quantities involved are: the derivative of the updated disturbance upper bound; the unconstrained tracking error; sgn, the sign function; a constant parameter; a design parameter with value range (0, 1); the expected convergence time; m, a constant parameter; the (1−ξ)-power term of the disturbance upper-bound estimate; a scaling factor; the (1+ξ)-power term of the disturbance upper-bound estimate; the dynamic compensation network weight update rate; the basis function vector; the (1−ξ)-power and (1+ξ)-power terms of the dynamic compensation network weights; the weight vector of the performance evaluation network at the next sampling instant; k, the discrete sampling index; proj, the projection operator; the weight vector of the performance evaluation network at the current sampling instant; the sampling period; the control generation network weight vectors at the next and current sampling instants; the gradient terms based on the temporal-difference error; and the learning rate.
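Claim 5 names the reinforcement-learning ingredients (performance evaluation/critic weights, control generation/actor weights, a projection operator, a temporal-difference gradient term, a sampling period) without their formulas surviving the extraction. The sketch below shows one standard discrete TD-style critic/actor update with a box projection, as an assumed illustration only; the patent's actual update laws, basis functions, and projection set are not reproduced here.

```python
import math

def proj(w, bound=10.0):
    """Projection operator: scale the weight vector back into a norm ball so
    the adaptive law stays bounded (a simple choice; the patent's proj
    operator is not specified in the extracted text)."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm <= bound:
        return w
    return [v * bound / norm for v in w]

def critic_actor_step(W_c, W_a, phi, r, phi_next, Ts, lr_c, lr_a):
    """One assumed TD-style update of the critic (performance evaluation)
    and actor (control generation) weights; all symbols are illustrative.

    delta is the temporal-difference error: reward plus next value estimate
    minus current value estimate.
    """
    V = sum(w * p for w, p in zip(W_c, phi))
    V_next = sum(w * p for w, p in zip(W_c, phi_next))
    delta = r + V_next - V  # TD error over one sampling period
    W_c = proj([w + lr_c * Ts * delta * p for w, p in zip(W_c, phi)])
    # the actor is driven by the same TD-error gradient term
    W_a = [w + lr_a * Ts * delta * p for w, p in zip(W_a, phi)]
    return W_c, W_a
```

The projection is what lets stability proofs treat the critic weights as uniformly bounded regardless of the TD error's transient behavior.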
- 6. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein in S4 the actual control input is converted into the linear acceleration and angular velocity commands directly executed by the vehicle, specifically: [formulas omitted in source], wherein the quantities involved are: the actual control input; the linear accelerations in the x and y directions of the transformed Cartesian coordinate system; the linear acceleration; the angular velocity; the linear velocity; and the heading angle of the j-th vehicle.
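For a unicycle-type vehicle, the conversion claim 6 describes is the standard inversion of the kinematics; the sketch below shows that standard form (the patent's own formulas are omitted in the source, and the guard against small speeds is an added illustrative detail, assuming forward motion v > 0).

```python
import math

def to_vehicle_commands(ax, ay, theta, v, v_min=1e-3):
    """Convert planar acceleration commands (ax, ay) in the transformed
    Cartesian frame into the linear acceleration a and angular velocity w
    a unicycle-type vehicle executes directly.

    From x' = v*cos(theta), y' = v*sin(theta):
        x'' = a*cos(theta) - v*w*sin(theta)
        y'' = a*sin(theta) + v*w*cos(theta)
    which inverts to the expressions below. The inversion is singular at
    v = 0, so v is clamped to v_min (assuming forward motion).
    """
    a = ax * math.cos(theta) + ay * math.sin(theta)
    w = (ay * math.cos(theta) - ax * math.sin(theta)) / max(v, v_min)
    return a, w
```

The singularity at v = 0 is why such schemes typically assume a nonzero linear velocity, consistent with the claim listing the linear velocity among the quantities needed for the conversion.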
- 7. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein the signed directed graph in S1 is constructed as follows: the communication topology of the vehicle system is described by a structurally balanced signed directed graph whose adjacency matrix has non-zero real elements; a positive element represents cooperation between the two corresponding vehicles, and a negative element represents competition between them.
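Structural balance, which claim 7 assumes, means the vehicles split into two camps with positive edges inside each camp and negative edges between camps. The standard two-coloring test below checks this for a given adjacency matrix; it is included for illustration (the patent states the assumption but gives no test procedure).

```python
from collections import deque

def structural_balance_partition(A):
    """Check structural balance of a signed graph given adjacency matrix A
    (standard bipartition test by sign-consistent two-coloring).

    Returns the camp label (+1 / -1) of each node if the graph is
    structurally balanced, else None. Within-camp edges must be positive,
    cross-camp edges negative.
    """
    n = len(A)
    camp = [0] * n  # 0 = unvisited
    for start in range(n):
        if camp[start]:
            continue
        camp[start] = 1
        q = deque([start])
        while q:
            j = q.popleft()
            for k in range(n):
                if A[j][k] == 0:
                    continue
                # positive edge: same camp; negative edge: opposite camp
                want = camp[j] if A[j][k] > 0 else -camp[j]
                if camp[k] == 0:
                    camp[k] = want
                    q.append(k)
                elif camp[k] != want:
                    return None  # inconsistent sign pattern: not balanced
    return camp
```

The resulting camp labels play the role of the gauge signs that decide whether each follower tracks the leader's state or its opposite.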
- 8. The reinforcement-learning-based vehicle system distributed control method according to claim 1, wherein in S2 the tracking error is converted into the transformation error through the time performance function, specifically: [formula omitted in source], wherein the quantities involved are: the tracking error; the time performance function; and the unconstrained transformation error.
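The exact error transformation of claim 8 is not reproduced in the source; a standard prescribed-performance transformation with the stated property (a constrained tracking error mapped to an unconstrained variable) is sketched below as an assumed example.

```python
import math

def transform_error(e, beta):
    """Map a tracking error e constrained by |e| < beta(t) to an
    unconstrained variable z (one standard prescribed-performance
    transformation; the patent's exact map is not reproduced in the
    source).

    z = atanh(e / beta) grows without bound as e approaches the envelope,
    so any controller that keeps z bounded automatically keeps e inside
    the performance bound at all times.
    """
    xi = e / beta
    if not -1.0 < xi < 1.0:
        raise ValueError("error outside performance envelope")
    return math.atanh(xi)
```

This inversion of the constraint is what lets the subsequent control design work with ordinary unconstrained stabilization arguments while still enforcing the preset-time envelope.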
- 9. A reinforcement-learning-based vehicle system distributed control system for implementing the reinforcement-learning-based vehicle system distributed control method according to any one of claims 1 to 8, characterized by comprising: a graph construction module, for constructing a signed directed graph based on a multi-vehicle tracking model formed by a plurality of vehicles and a leader vehicle, wherein a positive connection weight in the signed directed graph indicates a cooperative relationship between the two corresponding vehicles and a negative connection weight indicates a competitive relationship between them; an error conversion module, for taking the cooperation or competition relationships between each vehicle and its neighbors, together with the sum of the deviations between each vehicle and the leader vehicle's trajectory, as the tracking error, constructing a time performance function based on the expected convergence time, the initial error bound, and the steady-state error bound, and converting the tracking error into the transformation error through the time performance function; a control design module, for obtaining a virtual control law based on the transformation error by combining a dynamic compensation network, a control generation network, and a disturbance upper bound, obtaining the vehicle speed based on the second-order feedback nonlinear model, subtracting the virtual speed control law to obtain the speed error, and updating the dynamic compensation network, the disturbance upper bound, the performance evaluation network, and the control generation network; and a command execution module, for converting the actual control input into linear acceleration and angular velocity commands directly executed by the vehicle, driving each vehicle to track the leader's state, or its opposite, according to the cooperation or competition relationship within the expected convergence time, thereby completing the distributed control of the vehicle system.
- 10. A reinforcement-learning-based vehicle system distributed control apparatus, comprising a processor and a memory, wherein the processor implements the reinforcement-learning-based vehicle system distributed control method of any one of claims 1 to 8 when executing a computer program stored in the memory.
Description
Vehicle system distributed control method, system and device based on reinforcement learning

Technical Field

The invention belongs to the field of distributed system control and relates in particular to a reinforcement-learning-based distributed control method, system, and device for vehicle systems.

Background

Through inter-vehicle perception, communication, and cooperative control, intelligent vehicles have great application potential in fields such as autonomous transportation systems, intelligent port logistics, material handling in smart manufacturing workshops, and military patrol and strike. At the core of these advanced applications are efficient low-level cooperative control strategies, among which consensus control and platoon control are the most basic and dominant objectives of cooperative vehicle motion. In the prior art, vehicle control systems face several challenges. First, vehicle dynamics are strongly nonlinear and tend to be affected by unmodeled dynamics and external environmental disturbances. Second, communication resources are usually limited, and the network environment may contain antagonistic interactions, so cooperative and competitive relationships often coexist among vehicles. Moreover, most existing research results only guarantee asymptotic stability or finite-time stability of the system. Under asymptotic stability, the time for the system state to converge to the equilibrium point theoretically tends to infinity; under finite-time stability, although the convergence time is finite, it may become very long if the initial error of the system is large. This makes it impractical to preset an accurate settling time in practical mission planning.

Disclosure of the Invention

The invention aims to provide a reinforcement-learning-based vehicle system distributed control method, system, and device.
A reinforcement-learning-based vehicle system distributed control method comprises the following steps. S1, constructing a signed directed graph based on a multi-vehicle tracking model formed by a plurality of vehicles and a leader vehicle, wherein a positive connection weight in the signed directed graph indicates a cooperative relationship between the two corresponding vehicles and a negative connection weight indicates a competitive relationship between them, and constructing a second-order feedback nonlinear model for each vehicle through coordinate transformation. S2, taking the cooperation or competition relationships between each vehicle and its neighbors, together with the sum of the deviations between each vehicle and the leader vehicle's trajectory, as the tracking error; constructing a time performance function based on the expected convergence time, the initial error bound, and the steady-state error bound; and converting the tracking error into the transformation error through the time performance function. S3, obtaining a virtual control law based on the transformation error by combining a dynamic compensation network, a control generation network, and a disturbance upper bound; obtaining the vehicle speed based on the second-order feedback nonlinear model and subtracting the virtual speed control law to obtain the speed error; updating the dynamic compensation network, the disturbance upper bound, the performance evaluation network, and the control generation network; and obtaining the actual control input based on the speed error and the updated dynamic compensation network, disturbance upper bound, performance evaluation network, and control generation network.
S4, converting the actual control input into linear acceleration and angular velocity commands directly executed by the vehicle, driving each vehicle to track the leader's state, or its opposite, according to the cooperation or competition relationship within the expected convergence time, thereby completing the distributed control of the vehicle system. In S2, the time performance function is constructed from the expected convergence time, the initial error bound, and the steady-state error bound, specifically: [formula omitted in source], wherein the time performance function decays from the initial bound B0 to the steady-state error bound, a shape parameter governs the decay, the expected convergence time is the preset terminal instant, and t is the time variable. In S3, the virtual control law is obtained from the transformation error by combining the dynamic compensation network, the control generation network, and the disturbance upper bound, specifically: [formula omitted in source], wherein the quantities involved are as defined in claim 3, beginning with the virtual control law and a time-varying scaling factor related to the time performance function.