
CN-121995737-A - Multi-agent system formation control method and device based on fuzzy reinforcement learning

CN121995737A

Abstract

The invention provides a multi-agent system formation control method and device based on fuzzy reinforcement learning, in the technical field of multi-agent system formation control. The method comprises: establishing a communication topology of the multi-agent system based on graph theory and determining the neighbor set of each agent from the communication topology, wherein the neighbor set comprises the agent's neighbor agents; determining formation errors from the position state information of each agent, the position state information of its neighbor agents, and the desired trajectory of the leader agent; determining a first fuzzy basis function and a second fuzzy basis function from the formation errors; and approximately solving the HJB equation using the first fuzzy basis function, the second fuzzy basis function, and an identifier-actor-critic reinforcement learning algorithm to obtain an optimal controller for the multi-agent system. The method overcomes the difficulty of solving the HJB equation analytically and improves the convergence accuracy and rate of the solution process.

Inventors

  • LI SHAOBAO
  • XU SHIJI
  • ZHANG YUGUANG
  • LUO XIAOYUAN

Assignees

  • Yanshan University (燕山大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-02

Claims (10)

  1. A multi-agent system formation control method based on fuzzy reinforcement learning, characterized by comprising: establishing a communication topology of a multi-agent system based on graph theory, and determining a neighbor set of each agent from the communication topology, wherein the neighbor set comprises all neighbor agents of that agent; determining formation errors from the position state information of each agent, the position state information of each agent's neighbor agents, and the desired trajectory of the leader agent; introducing an optimal control strategy for the formation error, obtaining, by solving an HJB equation, a first optimal value function expression and an optimal virtual controller expression that minimize a first value function, and determining a first fuzzy basis function from the formation error so as to approximate the system uncertainty term and the estimation error compensation term after gradient decomposition of the first optimal value function expression; determining a virtual error from the velocity state information of each agent and an approximation of the optimal virtual controller, introducing an optimal control strategy for the virtual error, obtaining, by solving an HJB equation, a second optimal value function expression and an optimal controller expression that minimize a second value function, determining a second fuzzy basis function from the formation error to approximate the system uncertainty term and the estimation error compensation term after gradient decomposition of the second optimal value function expression, optimizing the gradient-decomposed and approximated form of the second optimal value function expression with the second fuzzy basis function and an identifier-actor-critic reinforcement learning algorithm, obtaining an approximation of the optimal controller expression from the optimization result, determining a disturbance observer from the approximation of the optimal virtual controller, and obtaining an optimal controller of the multi-agent system from the approximation of the optimal controller expression and the disturbance observer.
  2. The multi-agent system formation control method based on fuzzy reinforcement learning of claim 1, wherein the first fuzzy basis function includes a first identifier fuzzy basis function and a first actor-critic fuzzy basis function; and determining the first fuzzy basis function from the formation error comprises: calculating the norm of the formation error; determining a switching coefficient from the norm and a preset threshold; determining the first identifier fuzzy basis function from the switching coefficient, a first preset identifier fuzzy basis function and a second preset identifier fuzzy basis function; and determining the first actor-critic fuzzy basis function from the switching coefficient, a first preset actor-critic fuzzy basis function and a second preset actor-critic fuzzy basis function; wherein the switching coefficient is determined by comparing the norm with the preset threshold, each preset identifier and actor-critic fuzzy basis function is characterized by its own basis function center and basis function width, the identifier basis functions take the position state information of each agent in the multi-agent system as input, and the actor-critic basis functions take the first and second state variables as input.
  3. The multi-agent system formation control method based on fuzzy reinforcement learning of claim 1, wherein the gradient of the first optimal value function expression is decomposed and approximated in terms of: two constants greater than 0; the formation error; a function designed to avoid the occurrence of singularities; positive design constants; the first optimal identifier parameter matrix; the first optimal critic parameter matrix; the first identifier fuzzy basis function among the first fuzzy basis functions; the first actor-critic fuzzy basis function among the first fuzzy basis functions; and an approximation error bounded by two positive constants.
  4. The multi-agent system formation control method based on fuzzy reinforcement learning of claim 3, wherein optimizing the gradient-decomposed and approximated form of the first optimal value function expression with the first fuzzy basis function and the identifier-actor-critic reinforcement learning algorithm, and obtaining an approximation of the optimal virtual controller expression from the optimization result, comprises: converting the optimal virtual controller expression according to the gradient-decomposed and approximated form of the first optimal value function expression; determining the expression of an identifier-actor fuzzy logic system module from the gradient-decomposed and approximated form of the first optimal value function expression and the converted optimal virtual controller expression, in which the approximation of the optimal virtual controller expression is parameterized by an identifier parameter vector and an actor parameter vector; determining the expression of an identifier-critic fuzzy logic system module, which yields an estimate of the gradient of the first optimal value function expression parameterized by a critic parameter vector; updating the parameter vector of the identifier fuzzy logic system module according to an update law involving the learning rate of the identifier fuzzy logic system module and a positive design constant that limits the update amplitude of the weights and avoids excessive updates; updating the parameter vector of the critic fuzzy logic system module according to an update law involving the learning rate of the critic fuzzy logic system module (a constant greater than 0), a training rate, and an identity matrix of appropriate rank; updating the parameter vector of the actor fuzzy logic system module according to an update law involving the learning rate of the actor fuzzy logic system module (a constant greater than 0); and optimizing according to the parameter vector update laws, determining the first optimal identifier parameter matrix and the first optimal critic parameter matrix, and substituting them into the converted optimal virtual controller expression to obtain an approximation of the optimal virtual controller expression.
  5. The multi-agent system formation control method based on fuzzy reinforcement learning of claim 1, wherein determining a virtual error from the velocity state information of each agent and an approximation of the optimal virtual controller comprises: determining the virtual error from the velocity state information of each agent and the approximation of the optimal virtual controller.
  6. The multi-agent system formation control method based on fuzzy reinforcement learning of claim 1, wherein the second fuzzy basis function includes a second identifier fuzzy basis function and a second actor-critic fuzzy basis function; and determining the second fuzzy basis function from the formation error comprises: calculating the norm of the formation error; determining a switching coefficient from the norm and a preset threshold; determining the second identifier fuzzy basis function from the switching coefficient, a third preset identifier fuzzy basis function and a fourth preset identifier fuzzy basis function; and determining the second actor-critic fuzzy basis function from the switching coefficient, a third preset actor-critic fuzzy basis function and a fourth preset actor-critic fuzzy basis function; wherein each preset identifier and actor-critic fuzzy basis function is characterized by its own basis function center and basis function width.
  7. The multi-agent system formation control method based on fuzzy reinforcement learning of claim 1, wherein determining a disturbance observer from the approximation of the optimal virtual controller includes: determining the disturbance observer from the approximation of the optimal virtual controller, wherein the observer gains control the convergence speed and stability of the estimator and are constants greater than 0, and the remaining design constants take the values 1, 0.7 and 0.01 respectively; and obtaining an optimal controller of the multi-agent system from the approximation of the optimal controller expression and the disturbance observer comprises: obtaining the optimal controller of the multi-agent system by combining the approximation of the optimal controller expression with the output of the disturbance observer.
  8. A multi-agent system formation control device based on fuzzy reinforcement learning, characterized by comprising: a modeling module, configured to establish a communication topology of the multi-agent system based on graph theory and to determine a neighbor set of each agent from the communication topology, wherein the neighbor set comprises all neighbor agents of that agent; a processing module, configured to determine formation errors from the position state information of each agent, the position state information of each agent's neighbor agents, and the desired trajectory of the leader agent; a first control module, configured to introduce an optimal control strategy for the formation error, obtain, by solving an HJB equation, a first optimal value function expression and an optimal virtual controller expression that minimize a first value function, and determine a first fuzzy basis function from the formation error so as to approximate the system uncertainty term and the estimation error compensation term after gradient decomposition of the first optimal value function expression; and a second control module, configured to determine a virtual error from the velocity state information of each agent and an approximation of the optimal virtual controller, introduce an optimal control strategy for the virtual error, obtain, by solving an HJB equation, a second optimal value function expression and an optimal controller expression that minimize a second value function, determine a second fuzzy basis function from the formation error to approximate the system uncertainty term and the estimation error compensation term after gradient decomposition of the second optimal value function expression, optimize the gradient-decomposed form of the second optimal value function expression with the second fuzzy basis function and an identifier-actor-critic reinforcement learning algorithm, obtain an approximation of the optimal controller expression from the optimization result, and obtain the optimal controller of the multi-agent system from the approximation of the optimal controller expression and the disturbance observer.
  9. An electronic device, comprising a memory storing a computer program and a processor, wherein the processor implements the method of any of claims 1 to 6 when executing the computer program.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any of claims 1 to 6.
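To make the graph-theoretic step of claim 1 concrete, the sketch below builds a consensus-style leader-follower formation error from a communication topology. The adjacency matrix `A`, pinning gains `b`, and desired offsets `d` are illustrative assumptions, not the patent's exact formulation.

```python
# Minimal sketch (assumed formulation): formation error of agent i is its
# disagreement with neighbours plus, if pinned, its disagreement with the
# leader's desired trajectory.
import numpy as np

def formation_errors(p, p_leader, d, A, b):
    """p: (n, m) follower positions; p_leader: (m,) leader position;
    d: (n, m) desired offsets from the leader; A: (n, n) adjacency matrix
    of the communication topology; b: (n,) pinning gains (b[i] > 0 iff
    agent i receives the leader's state)."""
    n = p.shape[0]
    e = np.zeros_like(p)
    for i in range(n):
        for j in range(n):
            # disagreement with neighbour j's desired relative position
            e[i] += A[i, j] * ((p[i] - d[i]) - (p[j] - d[j]))
        # disagreement with the leader, if connected
        e[i] += b[i] * ((p[i] - d[i]) - p_leader)
    return e

# Two followers in the plane, already in formation -> zero error.
p_leader = np.array([0.0, 0.0])
d = np.array([[1.0, 0.0], [-1.0, 0.0]])
p = p_leader + d
A = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([1.0, 0.0])
e = formation_errors(p, p_leader, d, A, b)
print(np.allclose(e, 0.0))  # True when the formation is achieved
```

The error vanishes exactly when every follower holds its desired offset, which is what the controller in the later claims drives toward.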
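Claims 2 and 6 switch between two preset sets of fuzzy basis functions depending on whether the norm of the formation error exceeds a threshold. The exact formulas appear only as images in the source, so the Gaussian form, the hard 0/1 switch, and all names below are assumptions used to illustrate the mechanism.

```python
# Hedged sketch of the norm-triggered basis switching in claims 2 and 6.
import numpy as np

def gaussian_basis(x, centers, width):
    """Vector of Gaussian fuzzy basis functions with shared width."""
    return np.exp(-np.sum((x - centers) ** 2, axis=1) / width ** 2)

def switched_basis(error, centers_a, centers_b, width_a, width_b, threshold):
    """Use basis set A when the formation error is small, set B otherwise."""
    k = 0.0 if np.linalg.norm(error) <= threshold else 1.0  # switching coeff.
    phi_a = gaussian_basis(error, centers_a, width_a)
    phi_b = gaussian_basis(error, centers_b, width_b)
    return (1.0 - k) * phi_a + k * phi_b

centers_a = np.array([[0.0, 0.0], [0.5, 0.5]])  # "first preset" centers
centers_b = np.array([[2.0, 2.0], [3.0, 3.0]])  # "second preset" centers
small = switched_basis(np.array([0.1, 0.1]), centers_a, centers_b, 1.0, 1.0, 1.0)
large = switched_basis(np.array([5.0, 5.0]), centers_a, centers_b, 1.0, 1.0, 1.0)
print(small.shape, large.shape)  # (2,) (2,)
```

Switching lets the approximator use basis centers matched to the current operating region, which is one plausible reading of how the claimed construction aids convergence.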
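Claim 4 describes separate update laws for the identifier, critic, and actor parameter vectors, each with its own learning rate and a leakage-style design constant. The true laws are given only as image formulas; the generic gradient steps below are a stand-in that shows the three-module update pattern, not the patented laws.

```python
# Generic identifier-actor-critic update pattern (assumed stand-in for the
# image-only update laws of claim 4).
import numpy as np

def update(theta, grad, lr, damping=0.0):
    """One step: gradient descent plus a small leakage term (the 'positive
    design constant' that bounds the weight update amplitude)."""
    return theta - lr * grad - lr * damping * theta

theta_id = np.ones(3)      # identifier parameter vector
theta_critic = np.ones(3)  # critic parameter vector
theta_actor = np.ones(3)   # actor parameter vector

phi = np.array([0.2, 0.1, 0.4])  # shared fuzzy-basis activation
bellman_err = 0.5                # critic (Bellman) residual

theta_id = update(theta_id, bellman_err * phi, lr=0.1, damping=0.01)
theta_critic = update(theta_critic, bellman_err * phi, lr=0.05)
# actor is pulled toward the critic's estimate of the optimal weights
theta_actor = update(theta_actor, (theta_actor - theta_critic) * phi, lr=0.02)
print(theta_critic)
```

Each module keeps its own learning rate, so the critic can be tuned to converge faster than the actor, which is the usual rationale for actor-critic schemes of this shape.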
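Claim 7 composes the final control law from the learned approximate optimal controller and a disturbance observer. The first-order observer dynamics and the subtraction-style composition below are assumptions illustrating the idea; the patent's observer is defined by image-only formulas.

```python
# Hedged sketch: a first-order disturbance observer feeding the final
# controller, as one plausible reading of claim 7.

def observer_step(d_hat, v_dot_meas, u, gain, dt):
    """Drive the disturbance estimate toward the residual between the
    measured acceleration and the commanded input."""
    residual = v_dot_meas - u - d_hat
    return d_hat + dt * gain * residual

def optimal_controller(u_approx, d_hat):
    """Final law: approximate optimal control corrected by the estimate."""
    return u_approx - d_hat

d_hat = 0.0
true_disturbance = 0.3  # constant disturbance to estimate
u = 1.0
for _ in range(200):
    v_dot_meas = u + true_disturbance  # simulated measurement
    d_hat = observer_step(d_hat, v_dot_meas, u, gain=5.0, dt=0.01)
u_total = optimal_controller(u, d_hat)
print(round(d_hat, 3))  # 0.3
```

The observer gain plays the role the claim assigns to the constants "controlling the convergence speed and stability of the estimator": a larger gain tracks the disturbance faster at the cost of noise sensitivity.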

Description

Multi-agent system formation control method and device based on fuzzy reinforcement learning

Technical Field

The invention relates to the technical field of multi-agent system formation control, and in particular to a multi-agent system formation control method and device based on fuzzy reinforcement learning.

Background

Over the past decade, multi-robot systems, a typical class of multi-agent systems, have exhibited greater fault tolerance and robustness than single-robot systems owing to their redundancy, and can cooperatively accomplish tasks that no single robot can complete independently. Robot formation has therefore become an important research direction as a control strategy for achieving collaborative tasks in multi-robot systems. The leader-follower method is a distributed control method that can be flexibly applied to formation control of multi-robot systems and is highly practical. By applying optimal control theory to multi-robot formation control, a balance between control performance and resource consumption can be achieved by minimizing a cost function. Conventional optimal control is typically achieved by solving the Hamilton-Jacobi-Bellman (HJB) equation, but obtaining the analytic solution is extremely complex due to nonlinear terms in the equation. To overcome the difficulty of applying optimal control to formation control, studies have considered introducing reinforcement learning and fuzzy logic systems into the solution process; however, the convergence accuracy and rate of the solving process still need further improvement.

Disclosure of Invention

The embodiments of the invention provide a multi-agent system formation control method and device based on fuzzy reinforcement learning, to address the problem that the convergence accuracy and rate of the solving process need improvement.
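For reference, the HJB equation mentioned above takes, for a generic input-affine system, the standard textbook form below; this is the general equation, not the patent's specific first or second value-function formulation:

```latex
% Standard HJB equation for \dot{x} = f(x) + g(x)u with running cost
% Q(x) + u^{\top} R u and optimal value function V^{*}:
0 = \min_{u}\Big[ Q(x) + u^{\top} R u
      + \nabla V^{*}(x)^{\top}\big(f(x) + g(x)u\big) \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
```

The nonlinearity of the drift term f(x) inside the gradient product is what makes the analytic solution intractable and motivates the fuzzy approximation of the value-function gradient in the claims.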
In a first aspect, an embodiment of the present invention provides a multi-agent system formation control method based on fuzzy reinforcement learning, including: establishing a communication topology of a multi-agent system based on graph theory, and determining a neighbor set of each agent from the communication topology, wherein the neighbor set comprises all neighbor agents of that agent; determining formation errors from the position state information of each agent, the position state information of each agent's neighbor agents, and the desired trajectory of the leader agent; introducing an optimal control strategy for the formation error, obtaining, by solving an HJB equation, a first optimal value function expression and an optimal virtual controller expression that minimize a first value function, and determining a first fuzzy basis function from the formation error so as to approximate the system uncertainty term and the estimation error compensation term after gradient decomposition of the first optimal value function expression; determining a virtual error from the velocity state information of each agent and an approximation of the optimal virtual controller, introducing an optimal control strategy for the virtual error, obtaining, by solving an HJB equation, a second optimal value function expression and an optimal controller expression that minimize a second value function, determining a second fuzzy basis function from the formation error to approximate the system uncertainty term and the estimation error compensation term after gradient decomposition of the second optimal value function expression, optimizing the gradient-decomposed and approximated form of the second optimal value function expression with the second fuzzy basis function and an identifier-actor-critic reinforcement learning algorithm, obtaining an approximation of the optimal controller expression from the optimization result, determining a disturbance observer from the approximation of the optimal virtual controller, and obtaining an optimal controller of the multi-agent system from the approximation of the optimal controller expression and the disturbance observer. In one possible implementation, the first fuzzy basis function includes a first identifier fuzzy basis function and a first actor-critic fuzzy basis function; determining the first fuzzy basis function from the formation error comprises: calculating the norm of the formation error; determining a switching coefficient from the norm and a preset threshold; determining the first identifier fuzzy basis function from the switching coefficient, a first preset identifier fuzzy basis function and a second preset identifier fuzzy basis function; and determining the first actor-critic fuzzy basis function from the switching coefficient, a first preset actor-critic fuzzy basis function and