CN-122018565-A - Four-rotor unmanned aerial vehicle fault-tolerant control method based on SAC method

CN 122018565 A

Abstract

An embodiment of the disclosure provides a fault-tolerant control method for a quadrotor unmanned aerial vehicle based on the SAC method, which aims to reduce the high latency of active fault-tolerant control of quadrotor UAVs and to improve the tracking and control performance of passive fault-tolerant control. The method comprises: establishing a six-degree-of-freedom kinematic and dynamic mathematical model of the quadrotor UAV; designing a cascade control structure based on a PID controller, a PA controller, and a policy network; and adapting the selector-controller network architecture to the interface of the SAC algorithm. A simulation platform is built in Python on the PyTorch library and the Stable-Baselines3 framework, and the policy network is trained on this platform. The method enables the quadrotor to rapidly adjust its position and attitude under a complete single-rotor failure, achieves high tracking and control performance, and has notable theoretical value and engineering application prospects.

Inventors

  • ZHU JING
  • XU YANXIN
  • TANG XIAOHAN

Assignees

  • Nanjing University of Aeronautics and Astronautics (南京航空航天大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-14

Claims (6)

  1. A fault-tolerant control method for a quadrotor unmanned aerial vehicle based on the SAC method, characterized by comprising the following steps: step 1, establishing a six-degree-of-freedom kinematic and dynamic mathematical model of the quadrotor UAV; step 2, designing a high-level/low-level cascade control structure based on a PID controller, a primary axis controller, and a policy network; step 3, designing the policy network loss function, the supervised learning rules, and behavior cloning based on expert experience, according to the reinforcement learning SAC algorithm; step 4, developing a quadrotor simulation platform based on the PyTorch library and the Stable-Baselines3 framework in a Python environment, training the quadrotor with the SAC algorithm to reach a specified position and to maintain hover under a complete single-rotor failure, and, after training, verifying the reliability and robustness of the policy on the platform.
  2. The quadrotor UAV fault-tolerant control method based on the SAC method of claim 1, wherein in step 1, considering the complexity of the quadrotor system that the fault-tolerant control strategy must handle, an initial six-degree-of-freedom quadrotor model is established as shown in formula (1.1), wherein p and v represent the position and velocity vectors of the quadrotor in the inertial frame, ω = [ω_x ω_y ω_z]^T represents the angular velocity vector in the body frame, q = [q_w q_x q_y q_z]^T represents the unit attitude quaternion, m and J represent the mass and the moment-of-inertia matrix of the quadrotor, ⊗ denotes quaternion multiplication (by which vectors are rotated via quaternions), and τ_ext denotes the unmodeled additional moment caused by aerodynamic disturbances and model errors. The total thrust and torque provided by the rotors are given by formula (1.2), wherein F and τ represent the thrust vector and the moments of the four rotors, T represents the total thrust, and G is the control effectiveness matrix, where r_{x,i} and r_{y,i} represent the x- and y-axis components of the rotor position vector r_i in the body frame and κ_t represents the torque coefficient quantifying the proportional relationship between the thrust and the drag torque produced by a single rotor. Because a rotor cannot instantaneously output the commanded thrust, the invention models rotor thrust with a first-order approximation, where σ is the rotor time constant and k_i is the rotor failure coefficient: k_i = 0 indicates that rotor i has completely failed (the stalled rotor provides no lift), and k_i = 1 indicates that the rotor is healthy.
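The rotor-to-wrench mapping and the first-order rotor lag of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the X-configuration geometry, arm length, torque coefficient `kappa_t`, time constant `sigma`, and sign conventions are illustrative assumptions, since the patent's formulas (1.1) and (1.2) are not reproduced in the text.

```python
import math

def effectiveness_matrix(arm=0.17, kappa_t=0.016):
    """Control effectiveness matrix G mapping rotor thrusts [f1..f4] to
    [T, tau_x, tau_y, tau_z] in the spirit of formula (1.2).  Layout,
    arm length, and kappa_t are illustrative values, not from the patent."""
    s = arm / math.sqrt(2.0)                      # rotor offset along x and y
    rx = [s, -s, -s, s]                           # x components of r_i
    ry = [-s, -s, s, s]                           # y components of r_i
    spin = [1, -1, 1, -1]                         # rotor spin directions
    return [
        [1.0] * 4,                                # T     = sum_i f_i
        [ry[i] for i in range(4)],                # tau_x = sum_i  r_{y,i} f_i
        [-rx[i] for i in range(4)],               # tau_y = -sum_i r_{x,i} f_i
        [kappa_t * spin[i] for i in range(4)],    # tau_z from drag torque
    ]

def mix(G, f):
    """Apply G to the rotor thrust vector f."""
    return [sum(G[r][c] * f[c] for c in range(4)) for r in range(4)]

def rotor_step(f, u_cmd, k, sigma=0.033, dt=0.002):
    """One Euler step of the first-order rotor lag with failure coefficient k:
    f_dot = (k * u_cmd - f) / sigma.  k = 0: complete failure, k = 1: healthy."""
    return f + dt * (k * u_cmd - f) / sigma
```

With equal thrust on all four rotors the torques cancel and only total thrust remains; with `k = 0` the thrust decays to zero regardless of the command, which is the complete single-rotor failure case the method targets.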
  3. The method of claim 1, wherein the control architecture design of step 2 includes a high-level controller and a low-level controller. The high-level controller consists of a PID controller and a Primary Axis (PA) controller connected in cascade. The PID controller controls the position of the quadrotor: it takes the desired position and the current position and velocity as inputs and outputs the desired acceleration, formula (2.1). The PA controller addresses the underactuation of the system after a failure and is responsible for attitude control. It outputs the desired roll rate p_des and pitch rate q_des of the quadrotor and the desired acceleration along the z_B direction, as shown in formula (2.2), in which a unit vector gives the total thrust direction in the body frame and v_out denotes the virtual output, computed from the reference primary axis. The low-level controller consists mainly of the policy network; upon receiving the high-level control command, it outputs rotor commands to the actuators, changing the thrust produced by the rotors. The policy network adopts a selector-controller architecture composed of four controller networks in parallel and one selector network, each controller network corresponding to one fault condition. Each sub-network is an MLP with three hidden layers of 64 nodes each.
The observation space consists of the position p, velocity v, body acceleration a_B, attitude quaternion q, angular velocity ω, angular acceleration Δω_f, the command of the previous time step u_prev, the desired angular velocity ω_des, and the desired acceleration along the z_B direction. To match the interface of the SAC algorithm, each controller network outputs a Gaussian distribution over control commands, and the selector outputs controller weights processed by a softmax function: Σ_i ω_i = 1, ω_i > 0 (2.4), where ω_i is the weight of the corresponding controller. Based on the selector weights and the controller outputs, the final control policy is expressed as a Gaussian mixture model (GMM) combining the output distribution of each controller into the policy output distribution. During training, the reparameterization trick is used to sample from the distribution: a standard normal sample is scaled by the standard deviation and shifted by the mean to produce the policy output. In the test phase, a deterministic policy is used instead, taking the mean of the distribution as the policy output.
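The selector-controller mixture described above can be sketched as follows. This is a simplified scalar illustration under stated assumptions, not the patented network: the heads' means and standard deviations would come from MLPs in the actual method, and here the training-mode sample draws one mixture component by weight and applies reparameterization-style noise to it.

```python
import math
import random

def softmax(z):
    """Selector output (eq. (2.4)): weights are positive and sum to 1."""
    m = max(z)                                   # subtract max for stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gmm_policy(logits, means, stds, deterministic=False, rng=random):
    """Sketch of the selector-controller mixture policy: four controller
    heads each output a Gaussian (mean, std) over a rotor command, and the
    selector weights combine them into a Gaussian mixture.  Deterministic
    (test-phase) mode returns the mixture mean; training mode samples."""
    w = softmax(logits)
    if deterministic:
        return sum(wi * mi for wi, mi in zip(w, means))
    i = rng.choices(range(len(w)), weights=w)[0]  # pick a component by weight
    eps = rng.gauss(0.0, 1.0)                     # standard normal noise
    return means[i] + stds[i] * eps               # mean + std * noise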
  4. The method of claim 1, wherein the policy network loss design in step 3 comprises a supervised learning loss, a reward function, and behavior cloning. The invention designs a supervised learning loss based on the mean squared error (MSE) between the selector output and the actual fault information, to guide the learning of the selector: L_s(π) = ||k_label − k||² (3.1), where k_label represents the actual quadrotor fault information; for example, k_label = [1 1 0 1]^T when rotor 3 fails. The reward function of the reinforcement learning SAC algorithm guides the quadrotor to fly toward a target position and maintain hover during training, while penalizing body oscillation in the event of rotor failure. The invention uses expert experience to guide early training: an incremental nonlinear dynamic inversion (INDI) method with active fault-tolerant control serves as the expert, which during training uses the known fault information that the policy itself does not observe, where f denotes a low-pass-filtered variable and τ_f and α_d denote the estimated torque and the desired angular acceleration. The command for each rotor can then be computed using the reduced control effectiveness matrix, obtained by zeroing the column of the control matrix G corresponding to the failed rotor. Comparing the expert-driven commands with the commands output by the policy network yields the behavior cloning loss: L_BC(π) = ||u_des − u||² (3.5). The total loss of the policy network is defined as a weighted combination of the three parts: L(π) = L_s(π) + (1 − α)L_RL(π) + αL_BC(π) (3.6), where α is a decay factor balancing expert experience against autonomous policy learning.
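The weighted loss of equation (3.6) can be sketched directly. This is a minimal sketch assuming `rl_loss` (the SAC actor loss L_RL) is supplied by the RL update; the squared-L2 form follows equations (3.1) and (3.5).

```python
def sq_err(a, b):
    """Squared L2 error ||a - b||^2, as in eqs. (3.1) and (3.5)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def total_loss(k_label, k_pred, u_expert, u_policy, rl_loss, alpha):
    """Total policy-network loss of eq. (3.6):
    L = L_s + (1 - alpha) * L_RL + alpha * L_BC.
    alpha decays over training, so behavior cloning from the INDI expert
    dominates early and autonomous RL dominates later."""
    L_s = sq_err(k_label, k_pred)        # selector supervision, eq. (3.1)
    L_bc = sq_err(u_expert, u_policy)    # behavior cloning, eq. (3.5)
    return L_s + (1.0 - alpha) * rl_loss + alpha * L_bc
```

At `alpha = 1` the RL term vanishes and the loss is pure supervision plus cloning; at `alpha = 0` the expert term vanishes, which matches the stated role of α as a decay factor.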
  5. The method of claim 1, wherein the control method comprises building a simulation platform based on the PyTorch library and the Stable-Baselines3 framework in a Python environment and training with the SAC algorithm. The trained policy network is tested under various fault conditions on the simulation platform to verify the feasibility and robustness of the method. The control structure designed in step 2 of claim 1 is combined with the loss function designed in step 3, applied to the initial, non-simplified six-degree-of-freedom quadrotor UAV system of step 1, and tested on the simulation platform built in the Python environment.
  6. A fault-tolerant control system for a quadrotor unmanned aerial vehicle employing the method of any one of claims 1 to 5, the system comprising: a module for building the six-degree-of-freedom kinematic and dynamic models; a module for designing the control structure; a module for designing the supervised learning loss, the reward function, and the behavior cloning loss; and a simulation platform module, based on the PyTorch library and the Stable-Baselines3 framework in a Python environment, for training and testing.

Description

Four-rotor unmanned aerial vehicle fault-tolerant control method based on SAC method

Technical Field

Embodiments of the present disclosure relate to control technology for unmanned aerial vehicles (UAVs), and in particular to a single-rotor fault-tolerant control technique for quadrotor UAVs based on a reinforcement learning (RL) method.

Background

The quadrotor unmanned aerial vehicle is an autonomous aircraft with four symmetrically arranged rotors. It has been widely applied in fields from military to civilian, covering reconnaissance, inspection, photography, transportation, and search and rescue. Safety is accordingly becoming more important, and ensuring the safety of property and life during flight missions is a key concern of the industry. With the growth of civil applications of quadrotor UAVs, their potential safety hazards are increasingly apparent, among which rotor failure is a typical and high-risk problem. During flight, when a rotor fails (e.g., a blade breaks, a motor ages, or a wiring fault occurs), the lift provided by that rotor may drop to some extent or disappear entirely. Given the relatively limited control freedom of a quadrotor system, when the thrust of one rotor is severely impaired or lost, the remaining rotors may be unable to rebalance thrust and torque, resulting in loss of attitude control.
When a single rotor still retains partial lift, i.e., under a partial failure, the system can remain balanced through thrust compensation by the remaining rotors and control reallocation, provided the remaining rotors have sufficient lift margin and the system responds in time. If a rotor fails completely, however, i.e., lift is lost entirely, the yaw channel becomes fully uncontrollable and the airframe quickly falls into a chaotic state; even an experienced pilot cannot achieve a safe landing. Research on fault detection and fault-tolerant control for quadrotor systems is therefore of great significance for reducing serious accidents, improving the reliability of flight missions, and guaranteeing the stability of UAV applications. Several studies have shown that, through fault detection and fault-tolerant control strategies, UAV systems can achieve higher robustness and reliability during task execution. In current UAV research and industrialization, ensuring flight safety, enhancing fault tolerance, and improving task adaptability have become key bottlenecks in moving from the laboratory to real applications. Fault-tolerant control (FTC) has become a hotspot in the field of drone control. Related studies have shown that FTC can maintain system performance or achieve a safe landing in the event of sensor or actuator failure, which is particularly critical for the wide deployment of UAVs.
In addition, introducing advanced control methods such as machine learning, neural networks, and active disturbance rejection control (ADRC) into UAV fault-tolerant control is a leading direction of current research. If path planning and continued task execution or safe return can be achieved under rotor failure, the reliability and availability of UAVs in application scenarios such as disaster relief, logistics transportation, and agricultural inspection can be significantly improved, with positive social, economic, and military impact. Reinforcement learning (RL) is a machine learning method in which an agent learns a policy by interacting with the environment so as to maximize the cumulative reward over the long term. SAC (Soft Actor-Critic) is a mainstream RL algorithm that maximizes not only the cumulative reward but also the entropy of the policy, encouraging the agent to retain a degree of randomness and action diversity for better exploration of the environment. Because SAC is an off-policy Actor-Critic architecture, training is stable and sample efficiency is high, making it particularly suitable for controlling the high-dimensional continuous action space of a UAV.

Disclosure of Invention

In view of the above, the present invention aims to provide a single-rotor fault-tolerant control method for quadrotor UAVs based on the SAC method, which is used for overcoming the problems of complexity
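The entropy-augmented objective that SAC maximizes, as described in the Background, can be written as a short sketch. The temperature `alpha` and discount `gamma` values here are common illustrative defaults, not values from the patent.

```python
import math

def gaussian_entropy(stds):
    """Differential entropy of a diagonal Gaussian policy:
    H = sum_i 0.5 * log(2 * pi * e * std_i**2).
    Larger std means a more random, more exploratory policy."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)

def soft_return(rewards, entropies, alpha=0.2, gamma=0.99):
    """Entropy-regularized ("soft") return maximized by SAC:
    sum_t gamma**t * (r_t + alpha * H_t), where the temperature alpha
    trades off reward maximization against policy entropy."""
    return sum((gamma ** t) * (r + alpha * h)
               for t, (r, h) in enumerate(zip(rewards, entropies)))
```

With `alpha = 0` this reduces to the ordinary discounted return; a positive `alpha` rewards keeping the policy stochastic, which is the exploration mechanism the text attributes to SAC.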