CN-121978968-A - Unmanned aerial vehicle controller model building method and device based on equivariant network and geometric symmetry


Abstract

The application provides a method and a device for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry. The method comprises: determining an observation vector of the unmanned aerial vehicle based on its real-time state and the current stunt instruction; inputting the observation vector into a feature-wise linear modulation (FiLM) layer, generating modulation parameters corresponding to the real-time state according to the current stunt instruction, and modulating the real-time state; inputting the modulated observation vector into an irreducible representation conversion layer to generate an irreducible representation of the SO(2) group corresponding to the real-time state; and inputting the irreducible representation into an equivariant multi-layer perceptron, predicting and generating a control instruction corresponding to the current stunt instruction, and obtaining the output of the unmanned aerial vehicle controller model. Through reinforcement learning on this network, an unmanned aerial vehicle controller model with high decision-making efficiency and strong generalization capability is obtained, and the cost of unmanned aerial vehicle control is greatly reduced.

Inventors

GUO ZHANYU
WANG ZHIKUN
YIN ZIKANG
ZHENG CANLUN
GUO SHILIANG
XU JINMING
ZHAO SHIYU

Assignees

Zhejiang University (浙江大学)
Westlake University (西湖大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-04-07

Claims (10)

1. A method for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry, characterized by comprising the following steps: determining an observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction; inputting the observation vector into a feature-wise linear modulation (FiLM) layer, generating modulation parameters corresponding to the real-time state according to the current stunt instruction, and modulating the real-time state; inputting the modulated observation vector into an irreducible representation conversion layer to generate an irreducible representation of the SO(2) group corresponding to the real-time state; and inputting the irreducible representation into an equivariant multi-layer perceptron, predicting and generating a control instruction corresponding to the current stunt instruction, and obtaining the output of the unmanned aerial vehicle controller model.
2. The method according to claim 1, wherein determining the observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction comprises: generating random noise according to the stunt action type of the unmanned aerial vehicle; superimposing the random noise on the real-time state to obtain a state vector; acquiring a historical control instruction corresponding to the historical stunt of the unmanned aerial vehicle at the moment preceding the current stunt instruction; and concatenating the state vector, the historical control instruction, and the current stunt instruction to generate the observation vector.
3. The method of claim 1, wherein the unmanned aerial vehicle controller model comprises an execution (actor) network and an evaluation (critic) network; the execution network comprises, in order, the feature-wise linear modulation layer, the irreducible representation conversion layer, and the equivariant multi-layer perceptron; the evaluation network comprises a standard feature-wise linear modulation layer, a standard irreducible representation conversion layer, a standard equivariant multi-layer perceptron, and value heads, and is used to influence the prediction policy of the execution network during the training phase; the standard feature-wise linear modulation layer receives the same input as the feature-wise linear modulation layer; a target value head among the value heads is selected based on the input stunt instruction; and the value heads are used to evaluate the real-time state of the unmanned aerial vehicle.
4. The method according to claim 3, wherein prior to determining the observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction, the method further comprises: acquiring a sample observation vector and inputting it into the execution network and the evaluation network; the execution network and the evaluation network each transforming the sample observation vector based on a weight matrix satisfying an equivariance constraint, modulating a sample state in the sample observation vector according to a sample task instruction in the sample observation vector, and generating a sample control instruction corresponding to the sample task instruction based on the modulated variable; each value head of the evaluation network computing a value estimate corresponding to its reward function, the target value head among the value heads being gated based on the sample task instruction, and the value estimate corresponding to the target value head being output; and training the execution network based on the value estimate and a proximal policy optimization (PPO) algorithm.
5. The method according to claim 3, wherein there are a plurality of value heads, the method comprising: determining the stunt flight task of the unmanned aerial vehicle; determining the number of action types of the stunt flight task; providing the same number of value heads as action types; for each value head, determining how the target parameters are represented for the action type corresponding to that value head, wherein the representation format is the same for every action type and the represented content differs; and calculating the product of a plurality of sub-reward functions of the target parameters as the reward function of the value head.
6. The method according to claim 1, wherein prior to determining the observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction, the method further comprises: determining a reward function according to the task type and the physical quantity, wherein the reward function is obtained by multiplying a plurality of sub-reward functions; and calculating a reward value according to the reward function and training with the reward value.
7. The method according to claim 1, wherein inputting the observation vector into the feature-wise linear modulation layer, generating modulation parameters corresponding to the real-time state according to the current stunt instruction, and modulating the real-time state comprises: generating, by a multi-layer perceptron in the feature-wise linear modulation layer, a first modulation parameter and a second modulation parameter according to the input current stunt instruction; and calculating a modulation value corresponding to the real-time state as the product of the first modulation parameter and the real-time state, plus the second modulation parameter.
8. The method of claim 1, wherein inputting the modulated observation vector into the irreducible representation conversion layer to generate the irreducible representation of the SO(2) group corresponding to the real-time state comprises: calculating an integer frequency corresponding to the real-time state as the index of the irreducible representation; multiplying the real-time state by a complex number to obtain the complex representation corresponding to the real-time state; determining a conversion mode according to the magnitude of the integer frequency, and converting the complex representation into a real representation according to the conversion mode; and decomposing the real representation into a sum of a plurality of real irreducible representations to generate the irreducible representation of the SO(2) symmetry group.
9. The method according to claim 1, wherein the equivariant multi-layer perceptron includes at least a plurality of alternating equivariant linear layers and equivariant nonlinear activation function layers, and wherein inputting the irreducible representation into the equivariant multi-layer perceptron and predicting and generating the control instruction corresponding to the current stunt instruction comprises: transforming the irreducible representation based on the weight matrix of the equivariant linear layer; decomposing, by the equivariant nonlinear activation function layer, the transformed irreducible representation into Fourier series components at a plurality of frequencies, wherein the amplitude of the Fourier series component at each frequency is the amplitude component at the corresponding frequency and the phase is the same as that of the irreducible representation before decomposition; and generating the control instruction by mapping the decomposed Fourier series components.
10. A device for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry, characterized by comprising: a construction module, configured to determine an observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction; a feature-wise linear modulation (FiLM) layer, configured to receive the observation vector, generate modulation parameters corresponding to the real-time state according to the current stunt instruction, and modulate the real-time state; an irreducible representation conversion layer, configured to receive the modulated observation vector and generate an irreducible representation of the SO(2) group corresponding to the real-time state; and an equivariant multi-layer perceptron, configured to receive the irreducible representation, predict and generate a control instruction corresponding to the current stunt instruction, and obtain the output of the unmanned aerial vehicle controller model.
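
As an illustration of the reward and value-head structure recited in claims 3 to 6, the following minimal Python sketch shows one way a product-form reward and a gated set of value heads could be computed. The exponential shape of each sub-reward, the function names, and the one-hot encoding of the stunt instruction are assumptions made for illustration only; the claims state only that the reward is a product of sub-reward functions and that the target value head is selected according to the instruction.

```python
import numpy as np

def sub_reward(error, scale):
    """Hypothetical sub-reward in (0, 1]: equals 1 at zero error and decays as the error grows."""
    return float(np.exp(-scale * abs(error)))

def product_reward(errors, scales):
    """Reward of one value head, formed as the product of its sub-rewards."""
    return float(np.prod([sub_reward(e, s) for e, s in zip(errors, scales)]))

def gated_value(head_values, instruction_onehot):
    """Output the value of the target head selected by a one-hot instruction (assumed encoding)."""
    return float(np.dot(head_values, instruction_onehot))
```

Because the reward is a product of factors that each lie between 0 and 1 in this sketch, a single sub-reward near zero pulls the whole reward toward zero, so all tracked quantities have to be satisfied at the same time.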

Description

Unmanned aerial vehicle controller model building method and device based on equivariant network and geometric symmetry

Technical Field

The application relates to the field of unmanned aerial vehicle control, and in particular to a method and a device for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry.

Background

With the rapid development of unmanned aerial vehicle technology, flight tasks are moving from simple navigation to highly dynamic, highly maneuverable stunt flight, and unmanned aerial vehicle flight controllers commonly rely on deep reinforcement learning to perform such complex control. However, existing methods based on deep reinforcement learning still face significant challenges in complex flight missions. On the one hand, current reinforcement learning methods are usually data intensive: a large number of training samples is required to converge, the demands on sample quantity and quality are high, training takes a long time, and training efficiency is low. On the other hand, existing flight policies tend to be highly specialized, that is, a separate agent must be trained for each stunt action (e.g., flipping, rolling, rotating). Such a "one-to-one" model is not only computationally expensive but also fails to produce a unified flight policy with general understanding capability. Some prior deep learning work exploits the commonality between tasks to learn a generic model through multi-task reinforcement learning (MTRL), but progress in unmanned aerial vehicle stunt flight has been limited, in part because conventional stunt flight generation methods rely on preset waypoints, which in turn prevents the unmanned aerial vehicle from performing truly highly maneuverable extreme motions.
Therefore, there is a need for an efficient and versatile unmanned aerial vehicle flight controller for efficient and accurate unified control of various stunt scenarios.

Disclosure of Invention

In view of the above, the application provides a method and a device for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry, to address the large data demand and single-task policies of conventional unmanned aerial vehicle controllers. Specifically, the application is realized by the following technical scheme.

A first aspect of the application provides a method for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry, comprising: determining an observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction; inputting the observation vector into a feature-wise linear modulation (FiLM) layer, generating modulation parameters corresponding to the real-time state according to the current stunt instruction, and modulating the real-time state; inputting the modulated observation vector into an irreducible representation conversion layer to generate an irreducible representation of the SO(2) group corresponding to the real-time state; and inputting the irreducible representation into an equivariant multi-layer perceptron, predicting and generating a control instruction corresponding to the current stunt instruction, and obtaining the output of the unmanned aerial vehicle controller model.
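
The building blocks of the forward pass described in the first aspect can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation of the application: a single linear map stands in for the multi-layer perceptron that generates the modulation parameters, only one frequency-k feature (a 2-D vector such as a horizontal velocity) is considered, and tanh is a hypothetical choice of amplitude function. The sketch shows the modulation formula (first parameter times state plus second parameter), the real 2x2 irreducible representation of SO(2) at frequency k, a linear map of the form aI + bJ that commutes with every rotation, and a nonlinearity that changes only the amplitude and keeps the phase.

```python
import numpy as np

def film(state, instruction, w_gamma, w_beta):
    """Modulate the state with parameters generated from the instruction.

    Assumption: one linear map per parameter stands in for the MLP.
    """
    gamma = instruction @ w_gamma  # first modulation parameter (scale)
    beta = instruction @ w_beta    # second modulation parameter (shift)
    return gamma * state + beta

def rot(theta, k=1):
    """Real 2x2 irreducible representation of SO(2) at integer frequency k >= 1."""
    c, s = np.cos(k * theta), np.sin(k * theta)
    return np.array([[c, -s], [s, c]])

def equivariant_linear(x, a, b):
    """Apply a*I + b*J; matrices of this form commute with every 2-D rotation."""
    j = np.array([[0.0, -1.0], [1.0, 0.0]])
    return (a * np.eye(2) + b * j) @ x

def amplitude_nonlinearity(x):
    """Squash the amplitude with tanh (hypothetical choice) and keep the phase."""
    r = np.linalg.norm(x)
    return x if r == 0.0 else x * (np.tanh(r) / r)

def layer(x, a=0.8, b=-0.3):
    """One equivariant linear layer followed by the amplitude nonlinearity."""
    return amplitude_nonlinearity(equivariant_linear(x, a, b))

# Equivariance check: rotating the input rotates the output by the same angle.
x = np.array([1.0, 2.0])
g = rot(0.7)
assert np.allclose(layer(g @ x), g @ layer(x))
```

The check passes because a rotation preserves the norm, so the amplitude factor is unchanged while the direction of the feature rotates together with the input.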
A second aspect of the application provides a device for building an unmanned aerial vehicle controller model based on an equivariant network and geometric symmetry, the device comprising: a construction module, configured to determine an observation vector of the unmanned aerial vehicle based on the real-time state of the unmanned aerial vehicle and the current stunt instruction; a feature-wise linear modulation (FiLM) layer, configured to receive the observation vector, generate modulation parameters corresponding to the real-time state according to the current stunt instruction, and modulate the real-time state; an irreducible representation conversion layer, configured to receive the modulated observation vector and generate an irreducible representation of the SO(2) group corresponding to the real-time state; and an equivariant multi-layer perceptron, configured to receive the irreducible representation, predict and generate a control instruction corresponding to the current stunt instruction, and obtain the output of the unmanned aerial vehicle controller model. Compared with the traditional processing mode of the unmann