CN-122023532-A - Method for three-dimensional positioning and tracking of dynamic target applied to surface fitting and dynamic separation of unmanned aerial vehicle
Abstract
Embodiments of the invention provide a method for three-dimensional positioning and tracking of a dynamic target, applied to ground-surface fitting and dynamic separation of an unmanned aerial vehicle. The method comprises: collecting image data of a target scene and performing semantics-based detection and classification of dynamic and static objects; performing SLAM reconstruction and three-dimensional surface fitting on the ground features among the static features; performing single-frame three-dimensional positioning of the dynamic target using the ground-surface height constraint; and finally performing joint optimization by fusing multi-frame observations with a motion model. The method addresses the core problems of the prior art: depth information that is difficult to acquire, poor environmental adaptability, and high cost.
Inventors
- MA GUANGDI
- LU JIANGPING
- LU YILONG
- MU YIJUAN
- LUO WENHAO
- XIAO CHANGLIN
- YANG WEICHEN
- WANG XIANG
- SUN CHUNLING
- YANG SHENGJUAN
- LI TIANYU
- JI ZHENZHE
- CHEN WEI
Assignees
- Zhejiang Guoyao Geographic Information Technology Co., Ltd. (浙江国遥地理信息技术有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (10)
- 1. A method for three-dimensional positioning and tracking of a dynamic target, applied to surface fitting and dynamic separation of an unmanned aerial vehicle, the method comprising: continuously collecting image data of a target scene, extracting image features from the image data, classifying the features, and outputting the classified feature types together with an image mask, wherein the feature types comprise a static region and a dynamic target, and the static region is subdivided into a planar static region and a vertical static region; extracting static feature points based on the mask of the static region, determining an ROI of the dynamic target in combination with the position of the dynamic target in the image, traversing the static-region feature point set within the ROI, and, after matching that feature point set against feature points of the same semantic region in a historical frame, generating a 3D point cloud of the region in which the dynamic target is located; establishing a corresponding three-dimensional map based on the 3D point cloud; and taking the contact point of the dynamic target in the image as an observation point, projecting the observation point onto the three-dimensional map, solving the three-dimensional coordinates of the dynamic target, and generating the motion track and a prediction result of the dynamic target in combination with historical track observations of the dynamic target.
- 2. The method of claim 1, further comprising, after collecting the image data of the target scene: inputting the image data into a pre-trained neural network and outputting the pixel categories in the image data; and generating a corresponding mask based on the pixel categories, wherein the mask comprises a binary mask of the static region, a planar static-region mask, a dynamic-target instance mask, and an uncertain-region mask.
- 3. The method of claim 1, wherein determining the ROI of the dynamic target in combination with the position of the dynamic target in the image comprises: acquiring the pixel coordinates of the bottom center point of the dynamic target, framing an ROI around the target, and performing SLAM ground-surface reconstruction of the target area within the ROI.
- 4. The method according to claim 3, wherein the depth calculation of the dynamic target comprises: back-projecting the bottom center point of the dynamic target, and intersecting the back-projection ray with the reconstructed ground surface to estimate the depth.
- 5. A system for three-dimensional localization and tracking of a dynamic target, applied to surface fitting and dynamic separation of an unmanned aerial vehicle, the system comprising: an acquisition module for continuously acquiring image data of a target scene, extracting image features from the image data, classifying the features, and outputting the classified feature types together with an image mask, wherein the feature types comprise a static region and a dynamic target, and the static region is subdivided into a planar static region and a vertical static region; a feature extraction module for extracting static feature points based on the mask of the static region, determining an ROI of the dynamic target in combination with the position of the dynamic target in the image, traversing the static-region feature point set within the ROI, and, after matching that feature point set against feature points of the same semantic region in a historical frame, generating a 3D point cloud of the region in which the dynamic target is located; a three-dimensional map module for establishing a corresponding three-dimensional map based on the 3D point cloud; and a track module for solving the three-dimensional coordinates of the dynamic target by taking the contact point of the dynamic target in the image as an observation point in combination with the three-dimensional map, and generating the motion track and a prediction result of the dynamic target in combination with historical track observations of the dynamic target.
- 6. The system of claim 5, further comprising: a pre-training module for inputting the image data into a pre-trained neural network and outputting the pixel categories in the image data; and a mask module for generating a corresponding mask based on the pixel categories, wherein the mask comprises a binary mask of the static region, a planar static-region mask, a dynamic-target instance mask, and an uncertain-region mask.
- 7. The system of claim 5, further comprising: an acquisition module for framing an ROI around the dynamic target using the pixel coordinates of the bottom center point of the dynamic target, and performing SLAM ground-surface reconstruction of the target area within the ROI.
- 8. The system of claim 7, further comprising: a depth module for back-projecting the bottom center point of the dynamic target and intersecting the back-projection ray with the reconstructed ground surface to estimate the depth.
- 9. An electronic device comprising a processor and a memory, the processor being connected to the memory, wherein the memory is configured to store executable program code, and the processor, by reading the executable program code stored in the memory, runs a program corresponding to the executable program code so as to perform the method according to any one of claims 1-4.
- 10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-4.
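The dynamic/static separation of claims 1-3 can be sketched in a few lines. The patent specifies no implementation, so the following Python/NumPy fragment is purely illustrative: `split_features` and `roi_around` are hypothetical helpers that partition 2D feature points by a binary static mask and expand a dynamic-target bounding box into an ROI.

```python
import numpy as np

def split_features(points, static_mask):
    """Split 2D feature points into static/dynamic sets using a binary mask.

    points      -- (N, 2) integer pixel coordinates (x, y)
    static_mask -- (H, W) boolean array, True where the pixel is static
    """
    xs, ys = points[:, 0], points[:, 1]
    is_static = static_mask[ys, xs]          # look up each point's mask label
    return points[is_static], points[~is_static]

def roi_around(box, margin, img_w, img_h):
    """Expand a dynamic-target bounding box (x0, y0, x1, y1) into an ROI,
    clamped to the image, within which static ground features are collected."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(img_w, x1 + margin), min(img_h, y1 + margin))
```

Static points inside the ROI would then be matched against the same semantic region of a historical frame to build the local 3D point cloud, as claim 1 describes.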
Description
Method for three-dimensional positioning and tracking of dynamic target applied to surface fitting and dynamic separation of unmanned aerial vehicle
Technical Field
The invention relates to the technical field of unmanned aerial vehicle tracking, and in particular to a method for three-dimensional positioning and tracking of a dynamic target applied to ground-surface fitting and dynamic separation of an unmanned aerial vehicle.
Background
With the growing popularity of unmanned aerial vehicles, UAVs are used for motion monitoring and real-time positioning of targets in many fields, such as inspection and security monitoring, intelligent traffic and traffic-flow analysis, moving-target interaction, and geographic information acquisition. However, when a moving target is three-dimensionally positioned using UAV monocular vision, the following problems currently exist. Pure dynamic-SLAM and target-tracking schemes (for example, derivatives of ORB-SLAM3) first run visual SLAM to estimate the motion of the UAV and reconstruct the static environment, while simultaneously detecting and tracking the dynamic target. Their core drawback is that the depth of the dynamic target depends heavily on multi-frame triangulation. When the target and the UAV are in relative motion, triangulation requires accurate cross-frame matching of the target's feature points and an accurate UAV pose; any matching error or pose drift makes the depth computation highly unstable, and when the target moves along the optical axis of the camera in particular, the result is noisy and unreliable.
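To make the triangulation drawback concrete, here is a minimal midpoint-triangulation sketch (not taken from the patent; Python/NumPy is an assumed implementation language). It recovers a 3D point from two viewing rays, which is exact only if the point is static across both frames and both poses are correct; a moving target or a drifted pose violates those assumptions and corrupts the depth.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of two viewing rays p = c_i + t_i * d_i.

    c1, c2 -- camera centers, shape (3,)
    d1, d2 -- unit viewing directions, shape (3,)
    Returns the 3D point midway between the closest points on the two rays.
    """
    # Solve [d1 | -d2] [t1, t2]^T = c2 - c1 in the least-squares sense.
    A = np.stack([d1, -d2], axis=1)
    t1, t2 = np.linalg.lstsq(A, c2 - c1, rcond=None)[0]
    p1 = c1 + t1 * d1
    p2 = c2 + t2 * d2
    return 0.5 * (p1 + p2)
```

If the target moved between the two frames, the two rays no longer point at the same physical point, so the midpoint is a biased estimate; with motion along the optical axis the rays are nearly parallel and the least-squares problem is poorly conditioned, which is exactly the instability described above.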
In solutions that assume a planar ground, the target is assumed to lie on level ground of known height, and the target position is back-calculated from its image coordinates. The drawback of this approach is that it cannot cope with the non-planar, undulating real ground surface (for example ramps, grassland, or road edges), leading to large height-estimation errors and hence significant position deviations. Still other solutions rely on high-cost sensors: depth information can be obtained directly with LiDAR or binocular/multi-camera rigs, but these significantly increase the hardware cost, weight, and power consumption of the system and are unsuitable for small commercial UAV platforms that are cost- and payload-sensitive.
Disclosure of Invention
Aiming at the problems in the prior art, embodiments of the invention provide a method and an apparatus for three-dimensional positioning and tracking of a dynamic target, applied to ground-surface fitting and dynamic separation of an unmanned aerial vehicle.
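The flat-ground baseline criticized above reduces to intersecting the camera ray with a horizontal plane of assumed height. A minimal sketch (illustrative only; the patent gives no code) also makes the error mode visible: if the true terrain height differs from the assumed plane, the intersection slides along the ray and the horizontal position shifts.

```python
import numpy as np

def backproject_to_plane(c, d, ground_z=0.0):
    """Intersect a viewing ray p = c + t*d with the horizontal plane z = ground_z.

    c -- camera center (3,); d -- unit viewing direction (3,)
    Returns the 3D intersection, or None if the ray misses the plane.
    """
    if abs(d[2]) < 1e-9:        # ray (nearly) parallel to the plane
        return None
    t = (ground_z - c[2]) / d[2]
    if t <= 0:                  # intersection behind the camera
        return None
    return c + t * d
```

For a camera 10 m above the assumed plane looking at a point 5 m ahead, a 1 m terrain rise shifts the computed ground position by 0.5 m (similar triangles: x scales with the height drop), illustrating why undulating terrain produces significant deviations.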
In a first aspect, embodiments of the present disclosure provide a method for three-dimensional positioning and tracking of a dynamic target, applied to surface fitting and dynamic separation of an unmanned aerial vehicle, the method comprising: continuously collecting image data of a target scene, extracting image features from the image data, classifying the features, and outputting the classified feature types together with an image mask, wherein the feature types comprise a static region and a dynamic target, and the static region is subdivided into a planar static region and a vertical static region; extracting static feature points based on the mask of the static region, determining an ROI of the dynamic target in combination with the position of the dynamic target in the image, traversing the static-region feature point set within the ROI, and, after matching that feature point set against feature points of the same semantic region in a historical frame, generating a 3D point cloud of the region in which the dynamic target is located; establishing a corresponding three-dimensional map based on the 3D point cloud; and taking the contact point of the dynamic target in the image as an observation point, projecting the observation point onto the three-dimensional map, solving the three-dimensional coordinates of the dynamic target, and generating the motion track and a prediction result of the dynamic target in combination with historical track observations of the dynamic target.
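The core geometric step of the first aspect, replacing the flat-ground assumption with a locally fitted surface, can be sketched as a least-squares plane fit to the local 3D point cloud followed by a ray-plane intersection of the contact-point ray. This is an illustrative Python/NumPy sketch under assumed names (`fit_plane`, `ray_plane`), not the patent's implementation.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a local 3D point cloud (N, 3).

    Returns (n, off) with unit normal n and offset off so that n.p + off = 0."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centered cloud is the direction of least variance, i.e. the normal.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, -n @ centroid

def ray_plane(c, d, n, off):
    """Intersect the ray p = c + t*d with the plane n.p + off = 0."""
    denom = n @ d
    if abs(denom) < 1e-9:       # ray parallel to the fitted surface patch
        return None
    t = -(off + n @ c) / denom
    return c + t * d if t > 0 else None
```

Here the plane is fitted only to static ground points inside the dynamic target's ROI, so the intersection respects the local terrain (a ramp tilts the fitted normal) rather than a global level-ground assumption; the resulting single-frame coordinate then feeds the multi-frame joint optimization.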
In a second aspect, embodiments of the present disclosure provide a system for three-dimensional localization and tracking of a dynamic target, applied to surface fitting and dynamic separation of a drone, the system comprising: an acquisition module for continuously acquiring image data of a target scene, extracting image features from the image data, classifying the features, and outputting the classified feature types together with an image mask, wherein the feature types comprise a static region and a dynamic target, and the static region is subdivided into a planar static region and a vertical static region; a feature extraction module for extracting static feature points based on the mask of the static region, determining an ROI of the dynamic target in combination with the position of the dynamic target in the image, traversing the static-region feature point set within the ROI, and, after matching that feature point set against feature points of the same semantic region in a historical frame, generating a 3D point cloud of the region in which the dynamic target is located