CN-121978967-A - Unmanned aerial vehicle controller model construction method and device based on target and instruction guidance

CN121978967A

Abstract

The application provides an unmanned aerial vehicle controller model construction method and device based on target and instruction guidance. The method acquires target action content of an unmanned aerial vehicle and real-time state information of the unmanned aerial vehicle, generates a first state identifier according to the target action content and a second state identifier according to the real-time state information, and summarizes at least the first state identifier and the second state identifier to form state data. The state data serve as input of the unmanned aerial vehicle controller model, which is constructed accordingly; a dynamics model of the unmanned aerial vehicle is then established, and the controller model is trained based on the dynamics model and reinforcement learning to obtain the unmanned aerial vehicle controller model. Through classification and standardization of state data from various scenarios, a controller model adapted to a variety of control scenarios is constructed, and the accuracy of unmanned aerial vehicle power control is improved.

Inventors

  • YIN ZIKANG
  • WANG ZHIKUN
  • ZHENG CANLUN
  • GUO SHILIANG
  • XU JINMING
  • ZHAO SHIYU

Assignees

  • Zhejiang University
  • Westlake University

Dates

Publication Date
2026-05-05
Application Date
2026-04-07

Claims (10)

  1. A method for constructing an unmanned aerial vehicle controller model based on target and instruction guidance, the method comprising: acquiring target action content of an unmanned aerial vehicle and real-time state information of the unmanned aerial vehicle, and generating a first state identifier according to the target action content, wherein the first state identifier comprises at least a task type mark and a first state parameter sequence having a first degree of association with a task type; generating a second state identifier according to the real-time state information of the unmanned aerial vehicle, wherein the second state identifier comprises at least a second state parameter sequence having a second degree of association with the task type, the first degree of association being greater than the second degree of association; summarizing at least the first state identifier and the second state identifier to form state data; taking the state data as input of the unmanned aerial vehicle controller model to construct the unmanned aerial vehicle controller model; and establishing a dynamics model of the unmanned aerial vehicle, and training the unmanned aerial vehicle controller model based on the dynamics model and reinforcement learning to obtain the unmanned aerial vehicle controller model.
  2. The method of claim 1, wherein establishing a dynamics model of the unmanned aerial vehicle comprises: establishing a battery dynamics equation of the unmanned aerial vehicle and calculating a real-time battery voltage; calculating a real-time rotational speed of each motor from the real-time battery voltage using a polynomial function, calculating the thrust and torque generated by each motor from its real-time rotational speed, and establishing a motor dynamics equation of the unmanned aerial vehicle; performing a kinematic analysis of the unmanned aerial vehicle based on the total thrust and total torque generated by all motors, calculating real-time attitude and motion information of the unmanned aerial vehicle, and establishing an aerodynamics equation of the unmanned aerial vehicle; and combining the aerodynamics equation, the battery dynamics equation and the motor dynamics equation to obtain the dynamics model of the unmanned aerial vehicle.
  3. The method of claim 1, wherein generating a first state identifier according to the target action content comprises: identifying the target action content of the unmanned aerial vehicle, determining an acrobatic flight action type, and generating a task type number according to the acrobatic flight action type; determining the marker parameters corresponding to the acrobatic flight action type, determining values of the marker parameters according to the target action content, and generating a marker parameter sequence; combining the task type number and the marker parameter sequence to generate the task type mark; generating a target state sequence according to the target action content; acquiring, from the real-time state information, associated real-time state information corresponding to the target state sequence, calculating the relative pose between the associated real-time state information and the target state sequence, and generating the first state parameter sequence; and splicing the task type mark and the first state parameter sequence to generate the first state identifier.
  4. The method of claim 3, wherein there are a plurality of acrobatic flight action types, each acrobatic flight action type corresponds to a unique task type number, different acrobatic flight action types correspond to different marker parameters, the number of marker parameters corresponding to each acrobatic flight action type is the same, and the state parameter types in the second state parameter sequence corresponding to different acrobatic flight action types are the same.
  5. The method of claim 3, wherein acquiring the associated real-time state information corresponding to the target state sequence from the real-time state information, calculating the relative pose between the associated real-time state information and the target state sequence, and generating the first state parameter sequence comprises: establishing a spatial coordinate system in a flight area; generating the target state sequence in the spatial coordinate system according to the target action content, wherein the target state sequence comprises at least a target position state and a target pose state; screening a real-time position state and a real-time pose state from the real-time state information, and generating the associated real-time state information in the spatial coordinate system; calculating the difference between the real-time position state and the target position state to generate a relative position sequence, and calculating a relative pose sequence between the target pose state and the real-time pose state; and combining the relative position sequence and the relative pose sequence to generate the first state parameter sequence.
  6. The method of claim 1, wherein generating a second state identifier according to the real-time state information of the unmanned aerial vehicle comprises: determining, from the real-time state information, a second state parameter sequence different from the first state parameter sequence; determining a safety parameter type according to the target action content; calculating the value of the safety parameter according to the real-time state information; and combining the second state parameter sequence and the safety parameter to generate the second state identifier.
  7. The method of claim 1, wherein after summarizing at least the first state identifier and the second state identifier to form state data, the method further comprises: splicing the first state identifier and the second state identifier to form initial state data; acquiring a historical action sequence of the unmanned aerial vehicle as a third state identifier; and splicing the initial state data with the third state identifier to generate the state data, wherein the third state identifier is used to limit the scale of variation between the control parameters predicted by the unmanned aerial vehicle controller model and the historical action sequence.
  8. The method of claim 1, wherein establishing a dynamics model of the unmanned aerial vehicle and training the unmanned aerial vehicle controller model based on the dynamics model and reinforcement learning to obtain the unmanned aerial vehicle controller model comprises: generating an action description sample of a preset acrobatic flight action, and generating a first state identifier sample and a second state identifier sample based on the action description sample, wherein the lengths of the first state identifier sample and the second state identifier sample are the same for every acrobatic flight action type; inputting the first state identifier sample and the second state identifier sample into the unmanned aerial vehicle controller model to generate a control instruction sample; generating a virtual unmanned aerial vehicle based on the dynamics model, inputting the control instruction sample into the virtual unmanned aerial vehicle, and generating a post-control real-time state information sample of the unmanned aerial vehicle; calculating a reward function value based on the real-time state information sample and the action description sample; and adjusting a prediction strategy of the unmanned aerial vehicle controller model based on the reward function value.
  9. The method of claim 8, wherein the unmanned aerial vehicle controller model is a fully connected network, and calculating a reward function value based on the real-time state information sample and the action description sample comprises: determining a reward function template, the reward function template being a product of a plurality of minimized-quantity reward values; instantiating the coefficients of each minimized-quantity reward value in the reward function template according to the type of the preset acrobatic flight action, and generating an instantiated reward function corresponding to the preset acrobatic flight action; and calculating the value of the instantiated reward function according to the difference between the real-time state information sample and the action description sample to obtain the reward function value.
  10. An unmanned aerial vehicle controller model construction device based on target and instruction guidance, characterized in that the device comprises: a processing module, configured to acquire target action content of an unmanned aerial vehicle and real-time state information of the unmanned aerial vehicle, and to generate a first state identifier according to the target action content, wherein the first state identifier comprises at least a task type mark and a first state parameter sequence having a first degree of association with a task type; the processing module being further configured to generate a second state identifier according to the real-time state information of the unmanned aerial vehicle, wherein the second state identifier comprises at least a second state parameter sequence having a second degree of association with the task type, the first degree of association being greater than the second degree of association; the processing module being further configured to summarize at least the first state identifier and the second state identifier to form state data; a modeling module, configured to take the state data as input of the unmanned aerial vehicle controller model to construct the unmanned aerial vehicle controller model; and a training module, configured to establish a dynamics model of the unmanned aerial vehicle, and to train the unmanned aerial vehicle controller model based on the dynamics model and reinforcement learning to obtain the unmanned aerial vehicle controller model.
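The state-data layout described in the claims (a task type mark plus task-relative state errors, auxiliary state with safety parameters, and a recent action history, all spliced into one input vector) can be sketched as follows. This is an illustrative reading of the claims, not an implementation from the patent; every function name, parameter, and value below is hypothetical.

```python
import numpy as np

def build_state_data(task_type_id, marker_params, rel_position, rel_attitude,
                     aux_state, safety_params, action_history):
    """Concatenate the three state identifiers into one controller input vector.

    All arguments are illustrative placeholders for the quantities the claims
    describe; the real identifier contents and ordering are not specified here.
    """
    # First state identifier: task type mark (number + marker parameters)
    # spliced with the relative-pose parameter sequence (claims 1, 3, 5).
    task_type_mark = np.concatenate(([task_type_id], marker_params))
    first_id = np.concatenate((task_type_mark, rel_position, rel_attitude))
    # Second state identifier: remaining state plus safety parameters (claim 6).
    second_id = np.concatenate((aux_state, safety_params))
    # Third state identifier: flattened historical action sequence, used to
    # limit the scale of variation of predicted control parameters (claim 7).
    third_id = np.asarray(action_history, dtype=float).ravel()
    return np.concatenate((first_id, second_id, third_id))

state = build_state_data(
    task_type_id=2,                      # one unique number per maneuver type
    marker_params=[1.5, 0.0, 3.0],       # fixed count per maneuver type (claim 4)
    rel_position=[0.1, -0.2, 0.05],      # target minus real-time position
    rel_attitude=[0.02, 0.0, -0.01],     # relative pose sequence
    aux_state=[4.9, 0.0, 0.1],           # e.g. velocities not in the first sequence
    safety_params=[11.7],                # e.g. remaining battery voltage
    action_history=[[0.5] * 4, [0.52] * 4],  # last two 4-motor commands
)
print(state.shape)  # → (22,)
```

Because every maneuver type uses the same number of marker parameters and the same second-sequence parameter types (claim 4), the resulting vector has a fixed length, which is what lets a single fully connected controller serve all maneuver types.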
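Claim 9 defines the reward as a product of minimized-quantity reward values with per-maneuver coefficients. A common way to realize such a factor is exp(-k·|error|), which lies in (0, 1] and approaches 1 only as the error vanishes; the sketch below assumes that form. The coefficient table and maneuver names are illustrative, not from the patent.

```python
import math

# Hypothetical per-maneuver instantiation of the reward-function template:
# each tracked quantity contributes a factor exp(-k * |error|), so the
# product is large only when every quantity is simultaneously small.
TEMPLATE_COEFS = {
    "flip":  {"position": 1.0, "attitude": 4.0, "action_delta": 0.5},
    "hover": {"position": 3.0, "attitude": 1.0, "action_delta": 2.0},
}

def reward(maneuver_type, errors):
    """Product-form reward: one minimized-quantity factor per tracked error."""
    coefs = TEMPLATE_COEFS[maneuver_type]
    r = 1.0
    for name, err in errors.items():
        r *= math.exp(-coefs[name] * abs(err))  # factor in (0, 1]
    return r

r = reward("hover", {"position": 0.1, "attitude": 0.05, "action_delta": 0.02})
print(round(r, 4))  # → 0.6771
```

A multiplicative template penalizes any single large error far more sharply than a weighted sum would, which matches the claim's emphasis on instantiating coefficients per maneuver type rather than retraining a separate reward per task.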

Description

Unmanned aerial vehicle controller model construction method and device based on target and instruction guidance

Technical Field

The application relates to the technical field of unmanned aerial vehicle modeling, and in particular to an unmanned aerial vehicle controller model construction method and device based on target and instruction guidance.

Background

In the field of intelligent flight control, multi-rotor unmanned aerial vehicles are widely used in scenarios such as logistics distribution, film and television shooting, surveying and exploration, and emergency rescue; during holidays, unmanned aerial vehicles are also often used to perform various flight maneuvers in performances with specific content. As performance content grows more complex, the requirements on the accuracy, stability and adaptability of unmanned aerial vehicle attitude and position control become increasingly stringent.

Currently, common unmanned aerial vehicles mostly rely on traditional control algorithms or complex network model structures. Traditional control algorithms such as PID control are simple in structure and easy to implement, but struggle to achieve accurate dynamic adjustment in the face of complex, variable flight tasks and environmental disturbances. Their simple state feedback mechanism generates control instructions from only limited current state information such as position and velocity, lacking a deep understanding of the task objective and effective use of historical control information. Complex neural network models, on the other hand, take the maneuver content directly as input and hand the entire intermediate process to the network as a black box, so the model structure is complex, requires a large number of samples and long training times, or requires a separate model to be trained for each action type.
Existing unmanned aerial vehicle control models therefore suffer from the following problems: the model structure is too complex, requiring a large number of samples and long training times; the intermediate process is completely opaque and lacks adjustability; and at the same time the models are too task-specific, generalize poorly, and cannot handle action types with differing requirements.

Disclosure of Invention

In view of the above, the application provides a method and a device for constructing an unmanned aerial vehicle controller model based on target and instruction guidance, to address the difficulty of meeting diversified and complex flight tasks. Specifically, the application is realized by the following technical scheme.

A first aspect of the application provides a method for constructing an unmanned aerial vehicle controller model based on target and instruction guidance, comprising the following steps: acquiring target action content of an unmanned aerial vehicle and real-time state information of the unmanned aerial vehicle, and generating a first state identifier according to the target action content, wherein the first state identifier comprises at least a task type mark and a first state parameter sequence having a first degree of association with a task type; generating a second state identifier according to the real-time state information of the unmanned aerial vehicle, wherein the second state identifier comprises at least a second state parameter sequence having a second degree of association with the task type, the first degree of association being greater than the second degree of association; summarizing at least the first state identifier and the second state identifier to form state data; taking the state data as input of the unmanned aerial vehicle controller model to construct the unmanned aerial vehicle controller model; and establishing a dynamics model of the unmanned aerial vehicle, and training the unmanned aerial vehicle controller model based on the dynamics model and reinforcement learning to obtain the unmanned aerial vehicle controller model.

A second aspect of the application provides an unmanned aerial vehicle controller model construction device based on target and instruction guidance, characterized in that the device comprises: a processing module, configured to acquire target action content of the unmanned aerial vehicle and real-time state information of the unmanned aerial vehicle, and to generate a first state identifier according to the target action content, wherein the first state identifier comprises at least a task type mark and a first state parameter sequence having a first degree of association with a task type; the processing module being further configured to generate a second state identifier according to the real-time state information of the unmanned aerial vehicle, the second state identifier comprising at least a second state parameter sequence with
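The dynamics model used for the reinforcement-learning rollouts above is detailed in claim 2 as a chain: a battery dynamics equation gives the real-time voltage, a polynomial maps command and voltage to motor speed, and standard propeller laws give thrust and torque. The sketch below illustrates that chain under stated assumptions; every coefficient (R_INT, K_T, K_Q) and the polynomial itself are placeholders, not values from the patent.

```python
import numpy as np

# Illustrative constants; the patent does not disclose numeric values.
R_INT = 0.05                 # battery internal resistance (ohm), assumed
K_T, K_Q = 1.1e-7, 1.9e-9    # thrust / drag-torque coefficients, assumed

def battery_voltage(v_open, current):
    """Battery dynamics equation: terminal voltage sags with load current."""
    return v_open - R_INT * current

def motor_speed(cmd, voltage):
    """Assumed polynomial map from throttle command (0..1) and terminal
    voltage to rotor speed (rad/s), standing in for the claimed polynomial."""
    return (120.0 * cmd + 30.0 * cmd ** 2) * voltage

def rotor_forces(cmds, v_open, current):
    """Motor dynamics equation: per-motor thrust and torque from rotor speed."""
    v = battery_voltage(v_open, current)
    w = np.array([motor_speed(c, v) for c in cmds])
    thrust = K_T * w ** 2      # per-motor thrust (N)
    torque = K_Q * w ** 2      # per-motor drag torque (N·m)
    return thrust.sum(), torque

# The total thrust and torques then feed a rigid-body (aerodynamics) equation
# to update the virtual vehicle's pose during each simulated training step.
total_thrust, torques = rotor_forces([0.6, 0.6, 0.6, 0.6],
                                     v_open=12.6, current=10.0)
print(total_thrust > 0)  # → True
```

Chaining the battery, motor and rigid-body equations in this way is what lets the trained controller account for effects such as voltage sag during aggressive maneuvers, rather than assuming an ideal actuator.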