CN-121979233-A - Unmanned aerial vehicle flight control method and device, electronic equipment and storage medium
Abstract
Embodiments of the application disclose a flight control method and device for an unmanned aerial vehicle, an electronic device and a storage medium, belonging to the technical field of visual language navigation processing. The method comprises: inputting the visual image and the corresponding instruction text at each moment into a visual language sub-model of a pre-trained flight prediction model, and outputting the navigation intention information of the unmanned aerial vehicle at each moment; acquiring the current state information and current action information of the unmanned aerial vehicle at the current moment, and taking the navigation intention information at the current moment, or at the moment closest to the current moment, as the target navigation intention information of the unmanned aerial vehicle; and inputting the current state information, the current action information and the target navigation intention information into an action prediction sub-model of the pre-trained flight prediction model, and outputting predicted action information that guides the unmanned aerial vehicle in adjusting its flight. The application can improve the flight continuity of the unmanned aerial vehicle, and thereby its flight stability.
Inventors
- ZHU XUEKE
- MENG QINGYAN
- ZHOU HUIHUI
- YU LIUTAO
- ZHANG WEI
- LIN YANYU
- LIN WENJIE
- CHENG WENXIANG
Assignees
- 鹏城实验室 (Peng Cheng Laboratory)
Dates
- Publication Date
- 20260505
- Application Date
- 20251231
Claims (10)
- 1. A flight control method for an unmanned aerial vehicle, comprising: acquiring visual images captured by the unmanned aerial vehicle at a plurality of moments, and generating an instruction text corresponding to the visual image at each moment; inputting the visual image and the corresponding instruction text at each moment into a visual language sub-model of a pre-trained flight prediction model, and outputting navigation intention information of the unmanned aerial vehicle at each moment; acquiring current state information and current action information of the unmanned aerial vehicle at the current moment, and taking the navigation intention information at the current moment, or at the moment closest to the current moment, as target navigation intention information of the unmanned aerial vehicle; and inputting the current state information, the current action information and the target navigation intention information into an action prediction sub-model of the pre-trained flight prediction model, and outputting predicted action information of the unmanned aerial vehicle, wherein the predicted action information is used to guide the unmanned aerial vehicle in adjusting subsequent flight actions; wherein the visual language sub-model and the action prediction sub-model of the pre-trained flight prediction model process input data in parallel, the pre-trained flight prediction model is trained based on differences between sample predicted action information and label action information, and the sample predicted action information is obtained by inputting, into the flight prediction model for prediction, sample state information, sample action information, sample visual images and sample instruction texts corresponding to the sample visual images for the unmanned aerial vehicle at a plurality of sample moments.
- 2. The method according to claim 1, wherein taking the navigation intention information at the current moment, or at the moment closest to the current moment, as the target navigation intention information of the unmanned aerial vehicle comprises: if the navigation intention information output by the visual language sub-model of the pre-trained flight prediction model includes navigation intention information for the current moment, determining a first image sharpness value of the visual image corresponding to the current moment, and determining a second image sharpness value of the visual image corresponding to the moment closest to the current moment; when the first image sharpness value is greater than or equal to the second image sharpness value, taking the navigation intention information at the current moment as the target navigation intention information of the unmanned aerial vehicle; and when the first image sharpness value is smaller than the second image sharpness value, or when the navigation intention information output by the visual language sub-model of the pre-trained flight prediction model does not include navigation intention information for the current moment, taking the navigation intention information at the moment closest to the current moment as the target navigation intention information of the unmanned aerial vehicle.
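Claim 2's selection rule can be sketched briefly. The patent only speaks of an "image sharpness value" without saying how it is computed; the variance-of-Laplacian measure used below is one standard sharpness proxy, not necessarily the one the inventors intended, and the function names are hypothetical.

```python
import numpy as np

def sharpness(img):
    """Variance of the discrete Laplacian: a common image-sharpness proxy
    (assumed here; the patent does not specify the sharpness measure)."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def pick_intention(cur_img, prev_img, cur_intent, prev_intent):
    """Claim-2 rule: prefer the current moment's intention unless the current
    frame is blurrier than the nearest earlier frame, or the current
    intention is missing; otherwise fall back to the nearest one."""
    if cur_intent is None:
        return prev_intent
    if sharpness(cur_img) >= sharpness(prev_img):
        return cur_intent
    return prev_intent
```

A high-contrast checkerboard scores far higher than a flat frame under this measure, so a motion-blurred current frame naturally yields to the nearest sharp one.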
- 3. The method according to claim 1, further comprising, after outputting the predicted action information of the unmanned aerial vehicle: acquiring an instruction-action comparison table; parsing the predicted action information to obtain a flight instruction; determining the corresponding flight action from the instruction-action comparison table according to the flight instruction; and controlling the unmanned aerial vehicle to fly according to the flight action.
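The instruction-action comparison table of claim 3 is naturally a lookup table. The entries and the parsing format below are invented for illustration; the patent does not enumerate the actual instructions or actions.

```python
# A hypothetical instruction-action comparison (lookup) table; the patent
# does not enumerate the real entries.
INSTRUCTION_ACTION_TABLE = {
    "ASCEND": {"throttle": +0.2, "pitch": 0.0},
    "DESCEND": {"throttle": -0.2, "pitch": 0.0},
    "FORWARD": {"throttle": 0.0, "pitch": +0.1},
    "HOVER": {"throttle": 0.0, "pitch": 0.0},
}

def parse_flight_instruction(predicted_action_info):
    """Parse predicted action information into a flight instruction.
    The dict format is an assumption; the patent leaves it unspecified."""
    return predicted_action_info.get("instruction", "HOVER")

def flight_action_for(predicted_action_info):
    instruction = parse_flight_instruction(predicted_action_info)
    # Fall back to HOVER for unknown instructions rather than failing.
    return INSTRUCTION_ACTION_TABLE.get(
        instruction, INSTRUCTION_ACTION_TABLE["HOVER"])
```

The defensive HOVER fallback is a design choice for safety: an unparseable prediction should hold the vehicle in place rather than command an arbitrary maneuver.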
- 4. The method according to claim 1, further comprising, before inputting the visual image and the corresponding instruction text at each moment into the visual language sub-model of the pre-trained flight prediction model: acquiring sample visual images captured by the unmanned aerial vehicle at a plurality of sample moments, and generating a sample instruction text corresponding to the sample visual image at each sample moment; inputting the sample visual image and the corresponding sample instruction text at each sample moment into a visual language sub-model of a flight prediction model, and outputting sample navigation intention information of the unmanned aerial vehicle at each sample moment; acquiring sample state information and sample action information of the unmanned aerial vehicle at a target sample moment, and taking the sample navigation intention information at the target sample moment, or at the sample moment closest to the target sample moment, as sample target navigation intention information of the unmanned aerial vehicle; inputting the sample state information, the sample action information and the sample target navigation intention information into an action prediction sub-model of the flight prediction model, and outputting sample predicted action information of the unmanned aerial vehicle; acquiring label action information corresponding to the sample predicted action information, calculating an action difference between the sample predicted action information and the label action information, and updating the flight prediction model according to the action difference to obtain an updated flight prediction model; and when the action difference does not meet a preset action difference condition, returning to the step of acquiring the sample state information and the sample action information of the unmanned aerial vehicle at the target sample moment, until the action difference meets the preset action difference condition, thereby obtaining the pre-trained flight prediction model.
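The training loop of claim 4 (predict, compare with labels, update, repeat until the action difference condition is met) can be sketched with a toy stand-in for the action prediction sub-model. The linear model, the mean-squared-error action difference, and the threshold value below are all illustrative assumptions, not the patent's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the action prediction sub-model: a linear map from
# (state, action, intention) features to a 2-D predicted action vector.
W = rng.normal(scale=0.1, size=(3, 2))

def predict(features):
    return features @ W

# Synthetic training pairs: input features -> label action information.
features = rng.normal(size=(32, 3))
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
labels = features @ W_true

threshold = 1e-3  # the "preset action difference condition" (assumed value)
lr = 0.05
for step in range(2000):
    preds = predict(features)
    diff = preds - labels
    # Action difference: mean squared error between sample predicted
    # action information and label action information.
    action_difference = float((diff ** 2).mean())
    if action_difference < threshold:
        break  # condition met: the trained model is kept
    # Otherwise update the model and return to the prediction step.
    grad = 2.0 * features.T @ diff / len(features)
    W -= lr * grad
```

The loop structure mirrors the claim exactly: the update and the re-check alternate until the action difference satisfies the preset condition.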
- 5. The method according to claim 4, wherein updating the flight prediction model according to the action difference to obtain the updated flight prediction model comprises: acquiring label navigation intention information of the unmanned aerial vehicle; calculating a navigation intention difference between the sample target navigation intention information and the corresponding label navigation intention information; when the navigation intention difference does not meet a preset navigation intention condition, updating the visual language sub-model of the flight prediction model according to the navigation intention difference, and updating the action prediction sub-model of the flight prediction model according to the action difference, to obtain the updated flight prediction model; and when the navigation intention difference meets the preset navigation intention condition, updating the action prediction sub-model of the flight prediction model according to the action difference, to obtain the updated flight prediction model.
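Claim 5's branching update rule reduces to a small dispatch function: while intentions still deviate too much from their labels, both sub-models are updated; once the intention condition is met, only the action sub-model is. The numeric threshold below is illustrative, not from the patent.

```python
def modules_to_update(intention_difference, action_difference,
                      intention_threshold=0.1):
    """Claim-5 update rule. Returns a map from sub-model name to the
    difference used to update it. The threshold value is an assumption;
    the patent only speaks of a 'preset navigation intention condition'."""
    if intention_difference >= intention_threshold:
        # Intention condition NOT met: update both sub-models.
        return {"vision_language": intention_difference,
                "action_prediction": action_difference}
    # Intention condition met: update only the action prediction sub-model.
    return {"action_prediction": action_difference}
```

Freezing the vision-language sub-model once its intentions are accurate is a sensible design: it avoids spending gradient updates on the expensive sub-model after it has converged.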
- 6. The method according to claim 4, wherein acquiring the label action information corresponding to the sample predicted action information, calculating the action difference between the sample predicted action information and the label action information, and updating the flight prediction model according to the action difference to obtain the updated flight prediction model comprises: acquiring a plurality of pieces of sample predicted action information and the label action information corresponding to each piece of sample predicted action information; and calculating the action difference between each piece of sample predicted action information and the corresponding label action information, and updating the flight prediction model according to the action differences to obtain the updated flight prediction model.
- 7. The method according to claim 6, wherein calculating the action difference between each piece of sample predicted action information and the corresponding label action information, and updating the flight prediction model according to the action differences to obtain the updated flight prediction model, comprises: acquiring an expected termination moment and a preset base number; for each piece of sample predicted action information, calculating the time difference between the expected termination moment and the sample moment corresponding to that sample predicted action information, and taking the time difference as the exponent of the preset base number at that sample moment; determining a time weight for each piece of sample predicted action information according to the preset base number and its exponents at the different sample moments; and calculating the action difference between each piece of sample predicted action information and the corresponding label action information according to the time weight of each piece of sample predicted action information, and updating the flight prediction model according to the action differences to obtain the updated flight prediction model.
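The weighting of claim 7 gives each sample a weight of base^(T_end − t), so with a base below 1, samples closer to the expected termination moment count more. A short sketch, with the base value and the squared-error form of the per-sample action difference assumed for illustration:

```python
def time_weights(sample_times, expected_end_time, base=0.9):
    """Claim-7 weighting: the time difference (T_end - t) is the exponent
    of the preset base. With base < 1, later samples get larger weights.
    The base value 0.9 is an illustrative choice, not from the patent."""
    return [base ** (expected_end_time - t) for t in sample_times]

def weighted_action_difference(preds, labels, sample_times,
                               expected_end_time, base=0.9):
    """Time-weighted action difference between sample predicted action
    information and label action information (squared error assumed)."""
    ws = time_weights(sample_times, expected_end_time, base)
    total = sum(w * (p - y) ** 2 for w, p, y in zip(ws, preds, labels))
    return total / sum(ws)
```

With base 0.9 and an expected termination moment of 5, sample moments 3, 4 and 5 receive weights 0.81, 0.9 and 1.0: a prediction error near the end of the trajectory moves the loss more than the same error early on.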
- 8. A flight control device for an unmanned aerial vehicle, comprising: a first acquisition module, used for acquiring visual images captured by the unmanned aerial vehicle at a plurality of moments and generating an instruction text corresponding to the visual image at each moment; a navigation intention information determining module, used for inputting the visual image and the corresponding instruction text at each moment into a visual language sub-model of a pre-trained flight prediction model and outputting navigation intention information of the unmanned aerial vehicle at each moment; a second acquisition module, used for acquiring current state information and current action information of the unmanned aerial vehicle at the current moment and taking the navigation intention information at the current moment, or at the moment closest to the current moment, as target navigation intention information of the unmanned aerial vehicle; and a target prediction module, used for inputting the current state information, the current action information and the target navigation intention information into an action prediction sub-model of the pre-trained flight prediction model and outputting predicted action information of the unmanned aerial vehicle, wherein the predicted action information is used to guide the unmanned aerial vehicle in adjusting subsequent flight actions, the visual language sub-model and the action prediction sub-model of the pre-trained flight prediction model process input data in parallel, the pre-trained flight prediction model is trained based on differences between sample predicted action information and label action information, and the sample predicted action information is obtained by inputting, into the flight prediction model, sample state information, sample action information, sample visual images and sample instruction texts corresponding to the sample visual images for the unmanned aerial vehicle at a plurality of sample moments.
- 9. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the flight control method of the unmanned aerial vehicle according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the flight control method of the unmanned aerial vehicle according to any one of claims 1 to 7.
Description
Unmanned aerial vehicle flight control method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of visual language navigation processing, and in particular to a flight control method and device for an unmanned aerial vehicle, an electronic device and a storage medium.

Background

At present, visual language navigation for unmanned aerial vehicles (Vision-and-Language Navigation, VLN) has become a popular research field. An unmanned aerial vehicle requires its onboard model to predict flight actions in real time so as to ensure smooth flight. In the related art, an unmanned aerial vehicle predicts its flight action at the next moment by processing the visual information, control instruction text and flight information acquired in real time during flight. However, because the visual information requires complex image processing, real-time inference is difficult to achieve in some complex scenes. The resulting delay in the flight action forces the unmanned aerial vehicle to stop and wait before the next flight action is predicted, and this stalling in flight increases the risk of instability and crashes. That is, the unmanned aerial vehicle in the related art has poor flight continuity, which reduces its flight stability.

Disclosure of Invention

The embodiments of the application provide a flight control method and device for an unmanned aerial vehicle, an electronic device and a storage medium, which can improve the flight continuity of the unmanned aerial vehicle and thereby its flight stability.
In order to achieve the above object, one aspect of the embodiments of the present application provides a flight control method for an unmanned aerial vehicle, including: acquiring visual images captured by the unmanned aerial vehicle at a plurality of moments, and generating an instruction text corresponding to the visual image at each moment; inputting the visual image and the corresponding instruction text at each moment into a visual language sub-model of a pre-trained flight prediction model, and outputting navigation intention information of the unmanned aerial vehicle at each moment; acquiring current state information and current action information of the unmanned aerial vehicle at the current moment, and taking the navigation intention information at the current moment, or at the moment closest to the current moment, as target navigation intention information of the unmanned aerial vehicle; and inputting the current state information, the current action information and the target navigation intention information into an action prediction sub-model of the pre-trained flight prediction model, and outputting predicted action information of the unmanned aerial vehicle, wherein the predicted action information is used to guide the unmanned aerial vehicle in adjusting subsequent flight actions; wherein the visual language sub-model and the action prediction sub-model of the pre-trained flight prediction model process input data in parallel, the pre-trained flight prediction model is trained based on differences between sample predicted action information and label action information, and the sample predicted action information is obtained by inputting, into the flight prediction model for prediction, sample state information, sample action information, sample visual images and sample instruction texts corresponding to the sample visual images for the unmanned aerial vehicle at a plurality of sample moments.

In some embodiments, taking the navigation intention information at the current moment, or at the moment closest to the current moment, as the target navigation intention information of the unmanned aerial vehicle includes: if the navigation intention information output by the visual language sub-model of the pre-trained flight prediction model includes navigation intention information for the current moment, determining a first image sharpness value of the visual image corresponding to the current moment, and determining a second image sharpness value of the visual image corresponding to the moment closest to the current moment; when the first image sharpness value is greater than or equal to the second image sharpness value, taking the navigation intention information at the current moment as the target navigation intention information of the unmanned aerial vehicle; and when the first image sharpness value is smaller than