CN-121995929-A - Unmanned aerial vehicle intelligent task planning Agent system and method based on large language model

CN121995929A

Abstract

The invention belongs to the technical field of unmanned aerial vehicles and discloses an unmanned aerial vehicle intelligent task planning Agent system and method based on a large language model. The system comprises a user interaction module, a large language model reasoning module, a tool calling module, a safety management module, a state sensing module and a memory module. The large language model reasoning module adopts a structured output parser to parse the model output into structured data; the tool calling module comprises tool functions for takeoff, landing, waypoint flight, status acquisition and the like; and the safety management module defines constraint parameters such as the minimum safe flight height, the maximum safe flight height and the maximum single movement distance. The method realizes intelligent task planning through a closed loop of receiving the user's natural language instruction, constructing a prompt template, invoking large language model reasoning, safety checking, executing tool functions and feeding back observation information. The system and method improve system stability and reliability and the success rate of complex task processing.

Inventors

  • HE KUNPENG
  • MENG JUN

Assignees

  • Yuyao Robot Research Center
  • Zhejiang University

Dates

Publication Date
2026-05-08
Application Date
2025-12-19

Claims (10)

  1. An unmanned aerial vehicle intelligent task planning Agent system based on a large language model, characterized by comprising the following modules: (1) a user interaction module for receiving the user's natural language task instruction and feeding the task execution result back to the user; (2) a large language model reasoning module for analyzing the user's natural language instruction, performing chain-of-thought reasoning, and generating a structured action decision; (3) a tool calling module for executing specific unmanned aerial vehicle control operations; (4) a safety management module for performing parameter verification and safety constraint checks on the action instructions generated by the large language model; (5) a state sensing module for acquiring the position, attitude and flight state information of the unmanned aerial vehicle in real time; (6) a memory module for storing the latest K rounds of dialogue history and task execution records, supporting context understanding and task continuity.
  2. The unmanned aerial vehicle intelligent task planning Agent system based on a large language model according to claim 1, wherein the user interaction module has a log publishing function that publishes the Agent's thinking process, executed actions and observation results in a structured JSON format, allowing the user to monitor and debug the system.
  3. The unmanned aerial vehicle intelligent task planning Agent system according to claim 2, wherein the log information comprises the following fields: role identification, message type, and specific content.
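A log record of the kind described in claims 2 and 3 can be sketched in a few lines of Python. The three fields come from the claims; the concrete key names and sample values are illustrative assumptions, since the patent does not fix them.

```python
import json

def make_log_entry(role, msg_type, content):
    """Build one structured log record with the three fields named in
    claims 2-3: role identification, message type, specific content.
    (Key names here are illustrative; the patent does not fix them.)"""
    return json.dumps(
        {"role": role, "type": msg_type, "content": content},
        ensure_ascii=False,
    )

# One record per Agent event: a thought, an executed action, or an observation.
entry = make_log_entry("agent", "thought", "Target is 5 m north; plan fly_to_waypoint.")
```

Publishing each event as a self-contained JSON line keeps the log both human-readable and machine-parsable for debugging.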
  4. The unmanned aerial vehicle intelligent task planning Agent system based on a large language model according to claim 1, wherein the large language model reasoning module adopts a structured output parser to parse the output of the large language model into structured data comprising the following fields: (1) a thinking process field thoughts, recording the reasoning process that analyzes the current state and task progress; (2) an action name field action_name, designating the name of the tool function to be called; (3) an action parameter field action_params, designating the input parameters of the tool function as key-value pairs; (4) a final message field final_message, returning the execution result to the user when the task is completed.
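A minimal sketch of the structured-output parse in claim 4. The four field names (thoughts, action_name, action_params, final_message) are taken from the claim; the error sentinel used as the "preset error mark" is an assumption for illustration.

```python
import json

# Preset error mark returned on parse failure (cf. Step 2 of the method claim).
PARSE_ERROR = {"error": "parse_failed"}

REQUIRED_FIELDS = {"thoughts", "action_name", "action_params", "final_message"}

def parse_model_output(raw: str) -> dict:
    """Parse raw LLM text into the structured action decision of claim 4,
    or return the error sentinel when the output is not valid JSON or is
    missing a required field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return PARSE_ERROR
    if not isinstance(data, dict) or not REQUIRED_FIELDS <= data.keys():
        return PARSE_ERROR
    return data

out = parse_model_output(
    '{"thoughts": "Need altitude first", "action_name": "takeoff",'
    ' "action_params": {"height": 3.0}, "final_message": null}'
)
```

Comparing the parse result against the sentinel is what lets the reasoning loop decide whether to re-prompt the model with a format-error hint.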
  5. The unmanned aerial vehicle intelligent task planning Agent system based on a large language model according to claim 1, wherein the tool calling module comprises the following tool functions: (1) a takeoff tool function takeoff, controlling the unmanned aerial vehicle to take off to a designated height, with the target height as its parameter; (2) a landing tool function land, controlling the unmanned aerial vehicle to land safely; (3) a waypoint flight tool function fly_to_waypoint, controlling the unmanned aerial vehicle to fly to a specified three-dimensional coordinate position, with the target coordinates x, y and z as its parameters; (4) a status tool function get_status, acquiring the unmanned aerial vehicle's current position, attitude and flight state information.
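The tool calling module of claim 5 amounts to a name-to-callable table. The sketch below uses the four function names from the claim; the bodies are placeholders, since a real system would issue MAVLink or ROS commands instead of returning strings.

```python
# Placeholder implementations of the four tool functions in claim 5.
def takeoff(height):
    return f"took off to {height} m"

def land():
    return "landed"

def fly_to_waypoint(x, y, z):
    return f"arrived at ({x}, {y}, {z})"

def get_status():
    return {"position": (0.0, 0.0, 0.0), "attitude": (0.0, 0.0, 0.0), "state": "idle"}

# Name -> callable lookup used by the reasoning loop; a miss yields None,
# the "preset null value" of Step 4 in the method claim.
TOOLS = {
    "takeoff": takeoff,
    "land": land,
    "fly_to_waypoint": fly_to_waypoint,
    "get_status": get_status,
}

tool = TOOLS.get("fly_to_waypoint")
obs = tool(1.0, 2.0, 3.0) if tool else "tool not found"
```

Keeping the table data-driven means new tools can be added without touching the reasoning loop.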
  6. The unmanned aerial vehicle intelligent task planning Agent system based on a large language model according to claim 1, wherein the state sensing module acquires the unmanned aerial vehicle's position information, attitude information and flight state through ROS topic subscription or direct API calls.
  7. The unmanned aerial vehicle intelligent task planning Agent system based on a large language model according to claim 1, wherein the memory module stores the latest 5 rounds of dialogue history using a dialogue buffer window mechanism, with a data structure comprising a user message queue, an assistant message queue and a system observation queue.
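The dialogue buffer window of claim 7 maps naturally onto bounded queues. This is a sketch, not the patent's implementation: the class and method names are invented, but the window size K = 5 and the three queues are taken from the claim.

```python
from collections import deque

K = 5  # window size from claim 7: only the latest K rounds are retained

class MemoryModule:
    """Sliding-window memory with the three queues named in claim 7."""

    def __init__(self, window=K):
        self.user_msgs = deque(maxlen=window)       # user message queue
        self.assistant_msgs = deque(maxlen=window)  # assistant message queue
        self.observations = deque(maxlen=window)    # system observation queue

    def add_round(self, user, assistant, observation):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which is exactly the buffer-window behaviour described.
        self.user_msgs.append(user)
        self.assistant_msgs.append(assistant)
        self.observations.append(observation)

mem = MemoryModule()
for i in range(7):  # 7 rounds in; only the last 5 survive the window
    mem.add_round(f"u{i}", f"a{i}", f"o{i}")
```

A fixed window bounds prompt length while keeping enough recent context for task continuity.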
  8. An unmanned aerial vehicle intelligent task planning Agent method based on a large language model, characterized by comprising the following steps: Step 1, receiving a user task instruction, adding the instruction to the dialogue history, and constructing a prompt template that assembles the current unmanned aerial vehicle state, the available tool descriptions, the output format requirements and the dialogue history into a complete prompt; Step 2, performing reasoning with the large language model and parsing the model output with the structured output parser to obtain a parsing result; when the parsing result equals a preset error mark, judging that parsing has failed, adding a format error prompt, and returning to the large language model for renewed reasoning; Step 3, judging the action type by obtaining the action name and comparing it with the preset terminator "Final"; when the action name equals "Final", outputting a task completion signal, judging that the task is completed, and returning the final message to the user; Step 4, looking up the corresponding tool function to obtain a lookup result; when the lookup result equals a preset null value mark, judging that no corresponding tool function was found and generating error observation information; Step 5, performing safety verification on the action parameters of the tool function; when verification does not pass, generating safety interception observation information; when verification passes, executing the tool function, acquiring the observation result, adding it to the dialogue history, publishing an observation log, and returning to Step 2 to continue the reasoning loop until the task is completed or the maximum number of iterations is reached.
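The closed loop of claim 8 can be sketched as below. The "Final" terminator and the iteration cap of 15 come from the claims; llm, tools and safety_check are stand-ins injected by the caller, since the real model, flight stack and safety module are outside this sketch.

```python
MAX_ITERS = 15  # maximum iteration number from claim 10

def run_agent(llm, tools, safety_check, history):
    """One task episode of the claim-8 loop.
    llm(history) -> parsed action dict or None on parse failure;
    tools: name -> callable; safety_check(action) -> (ok, message)."""
    for _ in range(MAX_ITERS):
        action = llm(history)                       # Steps 1-2: prompt + reasoning
        if action is None:                          # parse failure -> re-prompt
            history.append("format error: output valid JSON")
            continue
        if action["action_name"] == "Final":        # Step 3: task complete
            return action["final_message"]
        tool = tools.get(action["action_name"])     # Step 4: tool lookup
        if tool is None:
            history.append("error: unknown tool")
            continue
        ok, msg = safety_check(action)              # Step 5: safety gate
        obs = tool(**action["action_params"]) if ok else msg
        history.append(str(obs))                    # feed observation back
    return "max iterations reached"
```

Every branch appends an observation to the history, so the model sees its own failures (bad format, unknown tool, safety interception) and can self-correct on the next round.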
  9. The unmanned aerial vehicle intelligent task planning Agent method based on a large language model according to claim 8, wherein in Step 1 the prompt template comprises: (1) a role definition, defining the Agent as an intelligent unmanned aerial vehicle controller that performs logical reasoning according to the user instruction and the unmanned aerial vehicle state and calls tools to complete the task; (2) a current state description, including the unmanned aerial vehicle's three-dimensional coordinates, yaw angle and flight state; (3) a tool list description, specifying the function, parameter format and usage scenario of each available tool; (4) an execution rule description, including spatial reasoning rules, self-correction rules and task termination rules; (5) an output format instruction, requiring the model to output JSON according to the format description generated by the structured output parser. The unmanned aerial vehicle state is obtained by automatically switching the positioning mode according to GPS signal strength: GPS positioning is used by default, with seamless switching to visual positioning when the signal is weak, and the positioning data is synchronized to the large language model. The visual positioning proceeds as follows: (1) binocular depth computation: the target's relative depth is computed from the left-right view disparity as Z = f·B/(x_l − x_r), where Z is the depth of the target relative to the unmanned aerial vehicle, f is the focal length of the binocular camera, B is the baseline distance of the binocular camera, x_l is the abscissa in the left view, and x_r is the abscissa in the right view; (2) ORB feature point absolute coordinate computation: the absolute position of the unmanned aerial vehicle is computed from the world coordinates and relative displacement of the feature points as P_uav = P_w − ΔP, where P_uav is the absolute three-dimensional coordinate of the unmanned aerial vehicle, P_w is the world coordinate of the environment-matched ORB feature point, and ΔP is the three-dimensional displacement vector of the feature point relative to the unmanned aerial vehicle; (3) positioning accuracy compensation: errors are corrected through the camera distortion coefficients as x_corr = x_0·(1 + k_1·r² + k_2·r⁴), where x_corr is the final corrected coordinate, x_0 is the initially computed coordinate, k_1 and k_2 are the camera distortion coefficients, and r is the radial distance of the feature point in the image.
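The three visual-positioning computations in claim 9 (binocular disparity depth, feature-point absolute position, radial distortion correction) can be checked numerically. The formulas used are the standard forms implied by the claim's variable descriptions, and all sample values are made up.

```python
def binocular_depth(f, B, x_l, x_r):
    """Z = f * B / (x_l - x_r): depth of the target relative to the drone
    from focal length f, baseline B, and the left/right-view abscissas."""
    return f * B / (x_l - x_r)

def absolute_position(p_feat, delta):
    """P_uav = P_feat - delta: drone world position from a matched ORB
    feature point's world coordinates and its displacement relative to
    the drone."""
    return tuple(w - d for w, d in zip(p_feat, delta))

def undistort(coord, k1, k2, r):
    """coord_corr = coord * (1 + k1*r**2 + k2*r**4): radial-distortion
    correction with coefficients k1, k2 at image radius r."""
    return coord * (1 + k1 * r**2 + k2 * r**4)

# Sample disparity: f = 800 px, B = 0.12 m, disparity 48 px -> 2.0 m depth.
Z = binocular_depth(f=800.0, B=0.12, x_l=420.0, x_r=372.0)
```

The depth formula is the textbook pinhole-stereo relation: depth scales with focal length times baseline and inversely with disparity.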
  10. The unmanned aerial vehicle intelligent task planning Agent method based on a large language model according to claim 8, wherein in Step 5 the safety check comprises the following constraint parameters: (1) the minimum safe flight height is set to 1.0 meter; (2) the maximum safe flight height is set to 15.0 meters; (3) the maximum single movement distance is set to 20.0 meters. The safety verification proceeds as follows: (1) for a takeoff instruction, the target height parameter is extracted and compared with the preset maximum and minimum safe heights; when the target height is below the minimum safe height or above the maximum safe height, the safety check is judged to have failed and safety interception information is returned; (2) for a waypoint flight instruction, after judging that the target height lies within the safe height constraint range, the horizontal distance between the current position and the target position is computed as d = √((x_t − x_c)² + (y_t − y_c)²), where x_t and y_t are the x and y axis coordinates of the target position and x_c and y_c are the x and y axis coordinates of the current position; the horizontal distance is then compared with the preset maximum single movement distance, and when it exceeds that value the safety check is judged to have failed, interception information is returned, and the user is advised to execute the movement step by step. A task interrupt mechanism is adopted in the reasoning loop: when a new instruction is input during task execution, the system sets an interrupt flag, acquires the new instruction, and adds it to the dialogue history in the format "user interrupt, new instruction input", so that the task execution strategy is adjusted according to the change in user intention when the context is understood in the next reasoning round. The maximum number of iterations is set to 15.
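The claim-10 safety checks can be sketched as below. The constraint values (1.0 m / 15.0 m height limits, 20.0 m per-move distance) are the ones the claim sets; the function names and return convention are illustrative.

```python
import math

# Constraint parameters from claim 10.
MIN_HEIGHT, MAX_HEIGHT, MAX_MOVE = 1.0, 15.0, 20.0

def check_takeoff(target_height):
    """Takeoff check: target height must lie within the safe-height range."""
    if not MIN_HEIGHT <= target_height <= MAX_HEIGHT:
        return False, f"height {target_height} m outside [{MIN_HEIGHT}, {MAX_HEIGHT}] m"
    return True, "ok"

def check_waypoint(cur, target):
    """Waypoint check: height range first, then the horizontal move
    d = sqrt((x_t - x_c)**2 + (y_t - y_c)**2) against the per-move limit."""
    ok, msg = check_takeoff(target[2])
    if not ok:
        return ok, msg
    d = math.hypot(target[0] - cur[0], target[1] - cur[1])
    if d > MAX_MOVE:
        return False, f"move of {d:.1f} m exceeds {MAX_MOVE} m; split the flight"
    return True, "ok"
```

Checking the cheap height bound before the distance computation mirrors the order the claim describes, and the returned message doubles as the safety interception observation fed back to the model.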

Description

Unmanned aerial vehicle intelligent task planning Agent system and method based on large language model

Technical Field

The application belongs to the technical field of unmanned aerial vehicles, and particularly relates to an unmanned aerial vehicle intelligent task planning Agent system and method based on a large language model.

Background

In recent years, unmanned aerial vehicles have developed rapidly in the military, industrial and civil fields. Unmanned aerial vehicle task planning, a core branch of unmanned aerial vehicle technology, is widely applied in inspection, logistics, rescue, agricultural monitoring and the like. It involves techniques spanning environmental awareness, path searching, trajectory optimization, motion control, human-machine interaction and more, and different technical routes must be selected for different task scenarios and application requirements. Traditional unmanned aerial vehicle task planning systems rely mainly on preprogrammed rules and algorithms, place high professional demands on operators, and lack flexibility. The unmanned aerial vehicle control modes in the prior art mainly comprise remote controller operation, ground station software configuration and preset route execution, which suffer from the following technical problems: (1) high operational complexity, requiring the user to be familiar with a professional control interface and parameter configuration; (2) poor task adaptability, making dynamic adjustment to real-time environmental changes difficult; (3) a single human-machine interaction mode that cannot understand task descriptions in natural language form; (4) a lack of autonomous reasoning capability, so that complex multi-step tasks must be configured manually one by one.
Therefore, a new method is needed that combines a large language model with the unmanned aerial vehicle system to raise its level of intelligence, giving the unmanned aerial vehicle the ability to understand natural language instructions, reason and decide autonomously, and adjust tasks dynamically.

Disclosure of Invention

To solve the above technical problems in the prior art, the invention combines a large language model with an unmanned aerial vehicle system to raise the unmanned aerial vehicle's level of intelligence, giving it the ability to understand natural language instructions, reason and decide autonomously, and adjust tasks dynamically, with the following technical scheme: an unmanned aerial vehicle intelligent task planning Agent system based on a large language model comprises the following modules: (1) a user interaction module for receiving the user's natural language task instruction and feeding the task execution result back to the user; (2) a large language model reasoning module for analyzing the user's natural language instruction, performing chain-of-thought reasoning, and generating a structured action decision; (3) a tool calling module for executing specific unmanned aerial vehicle control operations; (4) a safety management module for performing parameter verification and safety constraint checks on the action instructions generated by the large language model; (5) a state sensing module for acquiring the position, attitude and flight state information of the unmanned aerial vehicle in real time; (6) a memory module for storing the latest K rounds of dialogue history and task execution records, supporting context understanding and task continuity.
Furthermore, the user interaction module has a log publishing function that publishes the Agent's thinking process, executed actions and observation results in a structured JSON format, allowing the user to monitor and debug the system. Further, the log information includes the fields of role identification, message type and specific content. Further, the large language model reasoning module adopts a structured output parser to parse the output of the large language model into structured data comprising the following fields: (1) a thinking process field thoughts, recording the reasoning process that analyzes the current state and task progress; (2) an action name field action_name, designating the name of the tool function to be called; (3) an action parameter field action_params, designating the input parameters of the tool function as key-value pairs; (4) a final message field final_message, returning the execution result to the user when the task is completed. Further, the tool calling module includes the following tool functions: (1) The take-off tool function take off i