CN-121995942-A - Unmanned aerial vehicle target tracking flight control method based on low-delay video stream and small target detection
Abstract
The invention relates to an unmanned aerial vehicle target tracking flight control method based on a low-delay video stream and small target detection. The method comprises: decoding the video stream in a low-delay mode using a network input interface combined with a hardware decoder, and publishing the generated image frames on a specified topic; inputting the image frames into a target detection model to execute multi-threaded asynchronous inference, and outputting normalized pixel coordinates of the tracking target; generating a tracking control quantity for the unmanned aerial vehicle pod from the pixel error of the normalized pixel coordinates through a PID (proportional-integral-derivative) controller combined with error filtering and integral limiting, and sending the tracking control quantity to a pod control unit; generating a desired velocity vector of the unmanned aerial vehicle according to the camera field angle and the unmanned aerial vehicle attitude information; and issuing a velocity setpoint command to the unmanned aerial vehicle flight control system through a protocol command interface so as to drive the unmanned aerial vehicle to execute tracking flight. The invention realizes efficient, stable, low-delay automatic tracking flight control of the unmanned aerial vehicle for moving or static targets.
Inventors
- LI DAWEI
- CHEN ZHENGAO
- YANG JIONG
Assignees
- BEIHANG UNIVERSITY (北京航空航天大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260128
Claims (10)
- 1. An unmanned aerial vehicle target tracking flight control method based on a low-delay video stream and small target detection, characterized by comprising the following steps: performing low-delay decoding on a video stream acquired from an unmanned aerial vehicle camera by adopting a preset network input interface combined with a hardware decoder, and publishing the generated image frames on a specified topic; inputting the image frames into a target detection model to execute multi-threaded asynchronous inference, and outputting normalized pixel coordinates of a tracking target, wherein the target detection model is an improved model in which a high-resolution detection branch with a stride of 4 pixels is added to a YOLOv network structure; generating a tracking control quantity for the unmanned aerial vehicle pod through a PID controller combined with error filtering and integral amplitude-limiting processing according to the pixel error of the normalized pixel coordinates, and sending the tracking control quantity to a pod control unit through a protocol command interface; and transforming the pixel deviation to a navigation coordinate system according to the camera field angle and the unmanned aerial vehicle attitude information, generating a desired velocity vector of the unmanned aerial vehicle, and issuing a velocity setpoint command to the unmanned aerial vehicle flight control system through a protocol command interface so as to drive the unmanned aerial vehicle to execute tracking flight.
- 2. The method of claim 1, wherein performing low-delay decoding on the video stream acquired from the unmanned aerial vehicle camera using the preset network input interface combined with the hardware decoder, and publishing the generated image frames on the specified topic, comprises: configuring and receiving the video stream from the unmanned aerial vehicle camera through a network input interface of FFmpeg, wherein the video stream is transmitted over TCP (Transmission Control Protocol); initializing a corresponding hardware decoder and creating a corresponding hardware device context according to the encoding format of the video stream; starting an independent decoding thread, decoding the video stream under a configured low-delay buffering strategy, and outputting hardware frames, wherein the low-delay buffering strategy comprises enabling unbuffered reading, flushing the packet buffer, and setting the maximum delay buffer to its minimum value; converting the hardware frames into BGR format and performing scaling to generate image frames; and packaging each image frame into a ROS message and publishing it to a specified ROS topic with a preset quality-of-service strategy; wherein the decoding thread is a loop that reads a data packet from the input context, sends the data packet to the hardware decoder, and retrieves the decoded hardware frame from the decoder.
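The low-delay buffering strategy of claim 2 corresponds to standard FFmpeg demuxer flags. A minimal sketch assembling such a command line is shown below; the stream URL, output resolution, and the choice of CUDA/NVDEC as the hardware decoder are assumptions for illustration, not specifics from the patent:

```python
def build_low_delay_cmd(url, width=1280, height=720):
    """Assemble an FFmpeg command implementing the claimed low-delay decode:
    unbuffered reading, minimal delay buffer, TCP transport, hardware decode,
    and BGR output scaled for the downstream detector."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",      # video stream transmitted over TCP
        "-fflags", "nobuffer",         # unbuffered reading
        "-flags", "low_delay",         # low-delay decoder mode
        "-max_delay", "0",             # maximum delay buffer set to minimum
        "-hwaccel", "cuda",            # hardware decoder (assumed: NVDEC)
        "-i", url,
        "-vf", f"scale={width}:{height}",
        "-pix_fmt", "bgr24",           # BGR frames for the detection model
        "-f", "rawvideo", "pipe:1",    # raw frames to stdout for publishing
    ]
```

The command is typically launched in a dedicated thread and its stdout read frame by frame, matching the independent decoding loop described in the claim.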
- 3. The method of claim 2, wherein converting the hardware frames into BGR format and performing scaling to generate the image frames comprises: acquiring a hardware frame output by the hardware decoder; if the FFmpeg filter path is adopted for BGR conversion and scaling, uploading the hardware frame to a filter graph, invoking a hardware scaling filter to perform resizing and pixel-format conversion on the hardware frame to generate an image frame in BGR format, and transferring the generated image frame from video memory to system memory through a hardware-memory download filter; and if the software path is adopted for BGR conversion and scaling, converting the hardware frame in video memory into a system-memory frame through a transfer interface, and then invoking an image-processing software library to perform pixel-format conversion and scaling on the system-memory frame to generate the image frame in BGR format.
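On the software path, the pixel-format conversion reduces to the standard YUV-to-BGR color transform applied per pixel. A sketch using BT.601 limited-range coefficients (the colorimetry is an assumption; the decoder's actual color space may differ):

```python
def yuv_pixel_to_bgr(y, u, v):
    """Convert one BT.601 limited-range YUV sample to a BGR triple,
    i.e. the core of the software-path pixel-format conversion in claim 3."""
    c, d, e = y - 16, u - 128, v - 128
    clamp = lambda x: max(0, min(255, int(round(x))))
    b = clamp(1.164 * c + 2.018 * d)
    g = clamp(1.164 * c - 0.391 * d - 0.813 * e)
    r = clamp(1.164 * c + 1.596 * e)
    return (b, g, r)
```

In practice a library such as libswscale or OpenCV performs this transform over the whole frame; the per-pixel form only shows what that call computes.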
- 4. The method of claim 2, further comprising, after packaging the image frame into a ROS message and publishing it to the specified ROS topic with the preset quality-of-service strategy: acquiring an image source to be encoded and distributed, wherein the image source comprises an original image topic published in the ROS system and/or a processed image topic overlaid with tracking-target bounding boxes; starting an independent encoding thread, converting the image source at a uniform resolution into a pixel format matching the input of the hardware encoder, and scaling it to a preset output resolution; initializing the hardware encoder and a corresponding output-stream context, and configuring low-delay encoding parameters, wherein the low-delay encoding parameters comprise the bit rate, frame rate, time base, key-frame interval, and a B-frame count set to 0; feeding the converted and scaled image source into the hardware encoder for encoding to generate a video stream; and pushing the encoded video stream in real time to a designated network address using RTSP (Real Time Streaming Protocol) through the configured output-stream context.
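The low-delay encoding parameters of claim 4 also map onto standard FFmpeg encoder options. A sketch assembling the push-side command is below; the resolution, bit rate, and the choice of NVENC as the hardware encoder are illustrative assumptions:

```python
def build_rtsp_push_cmd(out_url, fps=30, bitrate="4M", gop=30):
    """Assemble an FFmpeg command for low-delay hardware encoding and RTSP
    push: explicit bit rate, frame rate, key-frame interval, and B frames
    forced to 0, per claim 4."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", "1280x720", "-r", str(fps),
        "-i", "pipe:0",                # frames fed in by the encoding thread
        "-c:v", "h264_nvenc",          # hardware encoder (assumed: NVENC)
        "-b:v", bitrate,               # configured bit rate
        "-g", str(gop),                # key-frame interval
        "-bf", "0",                    # number of B frames set to 0
        "-f", "rtsp", out_url,         # push to the designated address
    ]
```

Setting the B-frame count to 0 removes the frame-reordering delay that B frames introduce, which is the dominant latency term this configuration targets.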
- 5. The method of claim 1, wherein inputting the image frames into the target detection model to execute multi-threaded asynchronous inference and outputting the normalized pixel coordinates of the tracked target comprises: loading the improved YOLOv model on a processor platform through a model inference interface, and establishing a multi-threaded inference thread pool with a preset number of threads; submitting the image frames to the inference thread pool to execute asynchronous inference according to a preset frame-extraction strategy, wherein the frame-extraction strategy comprises submitting an inference task for every frame when the frame-extraction parameter is 0, and submitting inference tasks only for frames at the specified frame interval when the frame-extraction parameter is greater than 0; receiving the detection result of each frame returned asynchronously by the inference thread pool, wherein the detection result comprises one or more detection boxes and their corresponding confidences; selecting the detection box with the highest confidence among all detection boxes of the current frame as the current tracking target; and converting the pixel coordinates of the center point of the current tracking target into normalized coordinates, packaging the resulting normalized pixel coordinates into a message in a specified format, and publishing the message on a specified topic.
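The submit/skip/select/normalize flow of claim 5 can be sketched with a standard thread pool. The model call is replaced by a stand-in `infer_fn`, and the exact meaning of the frame-extraction interval (here: submit every (k+1)-th frame when the parameter is k) is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncDetector:
    """Sketch of the multi-threaded asynchronous inference of claim 5."""

    def __init__(self, infer_fn, workers=2, frame_skip=0):
        self.infer_fn = infer_fn       # stand-in for the YOLO model call
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.frame_skip = frame_skip   # 0 -> every frame; k>0 -> every (k+1)th
        self.count = 0

    def submit(self, frame, width, height):
        self.count += 1
        if self.frame_skip and (self.count - 1) % (self.frame_skip + 1):
            return None                # frame skipped by extraction strategy
        return self.pool.submit(self._detect, frame, width, height)

    def _detect(self, frame, width, height):
        boxes = self.infer_fn(frame)   # -> [(cx_px, cy_px, confidence), ...]
        if not boxes:
            return None                # no target in this frame
        cx, cy, conf = max(boxes, key=lambda b: b[2])  # highest confidence
        return (cx / width, cy / height, conf)         # normalized coords
```

Each returned future corresponds to one frame's detection result, which the caller publishes as the normalized-coordinate message of the claim.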
- 6. The method of claim 5, further comprising, before loading the improved YOLOv model on the processor platform through the model inference interface and establishing the multi-threaded inference thread pool with the preset number of threads: adding a P2-scale detection branch to the multi-scale detection branches of the initial YOLOv model to obtain the improved YOLOv model, so that the model output simultaneously covers a P2 feature map with a stride of 4 pixels, a P3 feature map with a stride of 8 pixels, and a P4 feature map with a stride of 16 pixels; and training the improved YOLOv model on a preset training data set, and performing lightweight optimization on the trained improved YOLOv model according to the characteristics of the processor platform to generate an optimized model file for loading by the model inference interface; wherein constructing the P2-scale detection branch comprises performing an up-sampling operation on high-level semantic features in the neck structure of the YOLOv network, concatenating the up-sampled features with shallow features output by the backbone network, feeding the concatenated features into a C2f structure for feature fusion, and connecting the fused features to a detection head of the YOLOv network to perform target classification and bounding-box regression on the high-resolution P2 feature map with a 4-pixel stride.
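The motivation for the P2 branch is that a head at stride s predicts on a grid of s-pixel cells, so before offset regression the grid itself quantizes small-target positions coarsely. A minimal illustration of how halving the stride halves that worst-case grid error (a simplified model, not the patent's analysis):

```python
def grid_center_error(true_x, stride):
    """Distance from a true 1-D target coordinate to the nearest grid-cell
    center of a detection head with the given stride; worst case is stride/2."""
    cell = int(true_x // stride)
    grid_center = (cell + 0.5) * stride
    return abs(true_x - grid_center)
```

For a small target centered at x = 7 px, the stride-8 (P3) grid center is 3 px away, while the stride-4 (P2) grid center is only 1 px away, which is why the added branch stabilizes small-target localization.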
- 7. The method of claim 1, wherein generating the tracking control quantity for the unmanned aerial vehicle pod through the PID controller combined with error filtering and integral amplitude-limiting processing according to the pixel error of the normalized pixel coordinates, and sending the tracking control quantity to the pod control unit through the protocol command interface, comprises: judging whether the currently received normalized pixel coordinates are valid according to preset target-validity flag information, and judging whether the pod tracking function is enabled; if the normalized pixel coordinates are invalid or the pod tracking function is not enabled, sending a motion-stop or centering command to the pod control unit through the protocol command interface; if the normalized pixel coordinates are valid and the pod tracking function is enabled, obtaining the pixel error of the tracking target in the image coordinate system based on the normalized pixel coordinates, wherein the pixel error comprises a longitudinal pixel error for the pitch control axis and a transverse pixel error for the yaw control axis; feeding the longitudinal and transverse pixel errors into the pitch-axis and yaw-axis PID controllers respectively at a set control period, each controller sequentially executing error filtering and derivative filtering on the input pixel error to output a pitch-axis control quantity and a yaw-axis control quantity; and limiting the pitch-axis and yaw-axis control quantities to a preset maximum pitch angular rate and maximum yaw angular rate respectively, and sending the resulting pod pitch angular rate and yaw angular rate commands to the pod control unit through the protocol command interface.
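A single-axis version of the claim-7 controller can be sketched as follows. The gains, filter constant, and limits are illustrative; the patent specifies the structure (error filtering, derivative filtering, integral limiting, output clamping) but not the values:

```python
class AxisPID:
    """Single-axis pod PID sketch per claim 7: first-order error filter,
    filtered derivative, integral clamp, and angular-rate output clamp."""

    def __init__(self, kp, ki, kd, dt, alpha=0.7, i_max=0.2, out_max=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.alpha = alpha             # low-pass weight for error/derivative
        self.i_max, self.out_max = i_max, out_max
        self.err_f = self.d_f = self.integral = self.prev = 0.0

    def reset(self):
        # State reset after target loss prevents command drift (description).
        self.err_f = self.d_f = self.integral = self.prev = 0.0

    def step(self, err):
        # Error filtering, then integral amplitude limiting.
        self.err_f = self.alpha * self.err_f + (1 - self.alpha) * err
        self.integral = max(-self.i_max, min(self.i_max,
                            self.integral + self.err_f * self.dt))
        # Derivative (differential) filtering.
        raw_d = (self.err_f - self.prev) / self.dt
        self.d_f = self.alpha * self.d_f + (1 - self.alpha) * raw_d
        self.prev = self.err_f
        out = self.kp * self.err_f + self.ki * self.integral + self.kd * self.d_f
        # Clamp to the configured maximum angular rate.
        return max(-self.out_max, min(self.out_max, out))
```

Two such instances, one per axis, produce the pitch and yaw angular-rate commands sent to the pod control unit each control period.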
- 8. The method of claim 7, wherein communication with and control of the unmanned aerial vehicle pod are achieved between the pod control unit and the pod through bridging and a specific pod-communication control protocol, comprising: initializing the pod communication link, establishing a User Datagram Protocol socket connection for network transmission and/or a serial connection for wired transmission, and starting an independent receiving thread and sending thread for each established port; in the receiving thread, parsing the received raw byte stream according to a preset frame-header identifier, a data-length field, and a cyclic redundancy check code, and extracting complete, valid data frames; when a parsed data frame is a pod state frame or a target-position feedback frame, extracting the key information from the data frame and encapsulating it into a ROS message in a preset format for publication; in the sending thread, subscribing to the pitch angular rate and yaw angular rate commands and encoding them into corresponding command frames according to the defined control protocol; and sending the command frames to the pod control unit through the initialized User Datagram Protocol socket connection or serial connection to drive the pod to perform the corresponding motions.
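The receive-side frame extraction of claim 8 can be sketched as a resynchronizing parser. The concrete frame layout (2-byte header 0xAA55, 1-byte length, little-endian CRC-16/IBM trailer) is an assumption; the patent only names a header identifier, a length field, and a CRC:

```python
import struct

FRAME_HEAD = b"\xAA\x55"   # hypothetical 2-byte frame-header identifier

def crc16(data: bytes) -> int:
    """CRC-16/IBM (reflected polynomial 0xA001), a common choice for
    serial pod protocols; the patent does not name the CRC variant."""
    crc = 0xFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def parse_frames(buf: bytes):
    """Extract frames [head(2) | len(1) | payload(len) | crc16(2, LE)] from a
    raw byte stream; returns (payloads, unconsumed remainder)."""
    frames, i = [], 0
    while i + 5 <= len(buf):
        if buf[i:i + 2] != FRAME_HEAD:
            i += 1                     # resynchronize on the header
            continue
        n = buf[i + 2]
        end = i + 3 + n + 2
        if end > len(buf):
            break                      # incomplete frame: wait for more bytes
        (rx_crc,) = struct.unpack_from("<H", buf, i + 3 + n)
        if rx_crc == crc16(buf[i:i + 3 + n]):
            frames.append(buf[i + 3:i + 3 + n])
            i = end
        else:
            i += 1                     # bad CRC: drop header, rescan
    return frames, buf[i:]
```

Keeping the unconsumed tail lets the receiving thread prepend it to the next socket or serial read, so frames split across reads are still recovered.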
- 9. The method of claim 1, wherein transforming the pixel deviation to the navigation coordinate system according to the camera field angle and the unmanned aerial vehicle attitude information, generating the desired velocity vector of the unmanned aerial vehicle, and issuing the velocity setpoint command to the unmanned aerial vehicle flight control system through the protocol command interface to drive the unmanned aerial vehicle to execute tracking flight, comprises: subscribing to the unmanned aerial vehicle attitude information and the normalized pixel coordinates, and starting the calculation flow of the desired velocity vector upon detecting that the unmanned aerial vehicle flight control system is in a control mode that accepts external velocity setpoints; mapping the pixel deviation calculated from the normalized pixel coordinates into the camera optical-axis coordinate system according to the horizontal and vertical field angles of the camera, to obtain the line-of-sight direction vector of the tracking target in the camera coordinate system; transforming the line-of-sight direction vector from the camera coordinate system to the navigation coordinate system according to the pod attitude quaternion and the body attitude quaternion, to obtain a unit direction vector in the navigation coordinate system; scaling the unit direction vector according to a preset maximum horizontal speed and a preset maximum vertical speed to generate an initial desired velocity vector, and applying the corresponding speed-constraint processing to the initial desired velocity vector according to the motion type of the tracking target; and converting the speed-constrained desired velocity vector into the command coordinate system of the unmanned aerial vehicle flight control system, and issuing the velocity setpoint command to the unmanned aerial vehicle flight control system through the protocol command interface so as to drive the unmanned aerial vehicle to execute tracking flight.
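The geometric core of claim 9 (pixel deviation to line-of-sight vector, quaternion rotation to the navigation frame, scaling by speed limits) can be sketched as below. The axis convention (x forward, y right, z down) and the linear mapping from normalized deviation to view angle are assumptions:

```python
import math

def los_camera(nx, ny, hfov_deg, vfov_deg):
    """Map normalized pixel coordinates (image center = (0.5, 0.5)) to a unit
    line-of-sight vector in the camera optical-axis frame via the field angles."""
    ax = (nx - 0.5) * math.radians(hfov_deg)   # horizontal off-axis angle
    ay = (ny - 0.5) * math.radians(vfov_deg)   # vertical off-axis angle
    v = (1.0, math.tan(ax), math.tan(ay))      # assumed: x fwd, y right, z down
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z): v' = q v q*."""
    w, x, y, z = q
    vx, vy, vz = v
    tx = 2 * (y * vz - z * vy)                 # t = 2 * (q_vec x v)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    return (vx + w * tx + y * tz - z * ty,     # v' = v + w*t + q_vec x t
            vy + w * ty + z * tx - x * tz,
            vz + w * tz + x * ty - y * tx)

def desired_velocity(nx, ny, hfov, vfov, q_pod, q_body, v_h_max, v_v_max):
    """Chain pod then body attitude rotations, then scale the navigation-frame
    unit direction by the horizontal/vertical speed limits."""
    d = quat_rotate(q_body, quat_rotate(q_pod, los_camera(nx, ny, hfov, vfov)))
    return (d[0] * v_h_max, d[1] * v_h_max, d[2] * v_v_max)
```

With identity attitudes and the target centered in the image, the desired velocity points straight along the camera axis at the horizontal speed limit, as expected.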
- 10. The method of claim 9, wherein a phased tracking strategy is employed when the unmanned aerial vehicle enters the target tracking phase: in the first stage, initial cut-in control is executed, namely a fixed yaw-angle setpoint is issued to the flight control system for a preset first-stage duration while the velocity command generated by the speed-constraint processing is issued simultaneously; in the second stage, conventional tracking control is executed, wherein after the first-stage duration has elapsed, the issuing of the fixed yaw-angle setpoint is stopped and the velocity command updated from real-time data continues to be issued; and the speed-constraint processing comprises: when the tracking target is a static target, calculating the actual velocity components satisfying the maximum horizontal and maximum vertical flight-speed limits from the horizontal and vertical components of the unit direction vector respectively, and applying a length constraint to the composite velocity vector; and when the tracking target is a moving target, applying amplitude limiting to the horizontal and vertical components of the desired velocity vector respectively.
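One reading of the two speed-constraint branches in claim 10 is: for a static target, scale the whole vector so both limits hold while preserving its direction; for a moving target, clip the horizontal and vertical components independently. A sketch under that interpretation (the exact constraint formulas are not given in the patent):

```python
import math

def constrain_velocity(v, v_h_max, v_v_max, target_moving):
    """Apply the claim-10 speed constraints to a desired velocity (vx, vy, vz),
    with vz the vertical component. Interpretation of the two branches is an
    assumption; see the lead-in."""
    vx, vy, vz = v
    if target_moving:
        # Moving target: clip horizontal magnitude and vertical component
        # separately (direction in the horizontal plane is preserved).
        h = math.hypot(vx, vy)
        if h > v_h_max:
            vx, vy = vx * v_h_max / h, vy * v_h_max / h
        vz = max(-v_v_max, min(v_v_max, vz))
        return (vx, vy, vz)
    # Static target: one common scale factor so both limits hold,
    # preserving the full 3-D direction (length constraint on the vector).
    h = math.hypot(vx, vy)
    s = min(1.0,
            v_h_max / h if h > 1e-9 else 1.0,
            v_v_max / abs(vz) if abs(vz) > 1e-9 else 1.0)
    return (vx * s, vy * s, vz * s)
```

Preserving the 3-D direction for a static target keeps the approach path straight, while per-component clipping for a moving target prioritizes horizontal pursuit speed.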
Description
Unmanned aerial vehicle target tracking flight control method based on low-delay video stream and small target detection

Technical Field

The invention relates to the technical field of unmanned aerial vehicle flight control, and in particular to an unmanned aerial vehicle target tracking flight control method based on a low-delay video stream and small target detection.

Background

In target tracking tasks, an unmanned aerial vehicle generally adopts a system architecture combining video acquisition, target detection, pod control, and flight control. Existing implementations generally output a video stream through a camera or pod, obtain image frames after decoding, and then obtain target position information using a target detection algorithm, from which pod control commands and unmanned aerial vehicle motion commands are generated to achieve continuous tracking of the target. However, existing unmanned aerial vehicle flight control systems that integrate video processing with the target and flight control closed loops still face the following outstanding problems. First, video processing delay and detection computation are in tension: a pure software processing chain yields large end-to-end delay and an unstable frame rate, causing a temporal mismatch between target position information and control commands. Second, on heterogeneous hardware platforms, the data-transfer and format-conversion overhead among video decoding, scaling, inference, and other stages is large, which degrades system real-time performance.
Third, pod control and unmanned aerial vehicle motion control are often mutually independent and lack a unified coordinate-conversion and motion-constraint mechanism; if the respective control commands are generated directly from target positions, problems such as inconsistent motion directions and abrupt command changes easily arise. Fourth, interfaces among the system modules are inconsistent, the abnormal-recovery mechanism is imperfect, and the state reset of filters and related components after a target is lost is incomplete, which can cause control-command drift. Fifth, for small target detection, when the feature-map stride is large, the quantization error of the target position is significant in existing detection structures, and the detection result is prone to jumping or even brief loss, which affects the continuity and stability of tracking control. Therefore, there is a need for an unmanned aerial vehicle target tracking method that achieves low-latency video processing, highly stable small target detection, and integrated motion control, so as to improve the overall tracking performance and flight-control reliability of the system.

Disclosure of Invention

(I) Technical problem to be solved

In view of the defects and shortcomings of the prior art, the invention provides an unmanned aerial vehicle target tracking flight control method based on low-delay video streaming and small target detection, which solves the problems of large video-processing delay, unstable small target detection, and poor coordination between pod and flight control in the prior art, and achieves efficient, stable, low-delay automatic tracking of moving or static targets by the unmanned aerial vehicle.
(II) Technical scheme

In order to achieve the above purpose, the main technical scheme adopted by the invention is as follows. In a first aspect, an embodiment of the present invention provides an unmanned aerial vehicle target tracking flight control method based on low-latency video streaming and small target detection, including: performing low-delay decoding on a video stream acquired from an unmanned aerial vehicle camera by adopting a preset network input interface combined with a hardware decoder, and publishing the generated image frames on a specified topic; inputting the image frames into a target detection model to execute multi-threaded asynchronous inference, and outputting normalized pixel coordinates of a tracking target, wherein the target detection model is an improved model in which a high-resolution detection branch with a stride of 4 pixels is added to a YOLOv network structure; generating a tracking control quantity for the unmanned aerial vehicle pod through a PID controller combined with error filtering and integral amplitude-limiting processing according to the pixel error of the normalized pixel coordinates, and sending the tracking control quantity to a pod control unit through a protocol command interface; and transforming the pixel deviation to a navigation coordinate system according to the camera field angle and the unmanned aerial vehicle attitude information, generating a desired velocity vector of the unmanned aerial vehicle, and a speed set value instruction is issued t