CN-121661144-B - Unmanned aerial vehicle visual positioning method based on inclined three-dimensional model

CN 121661144 B

Abstract

The unmanned aerial vehicle visual positioning method based on an oblique three-dimensional model uses an oblique 3D model constructed by oblique photogrammetry as a prior geospatial database. First, an optical image library of orthographic views and a corresponding digital surface model are generated by rendering the three-dimensional model. Then, for each query image captured by the unmanned aerial vehicle in real time, the system sequentially executes two key steps: first, fast image retrieval based on feature similarity, which efficiently screens the most similar candidate images from the large image library; and second, fine image matching between the candidate images and the query image, which establishes high-quality 2D-3D matching pairs. Finally, by computing the spatial correspondence of the matched pairs, the three-dimensional position of the unmanned aerial vehicle image in the world coordinate system is accurately calculated in real time. The method achieves high-precision, fully autonomous position and attitude estimation of the unmanned aerial vehicle in a known large-scale scene.

Inventors

  • LIU TIANQING
  • KOU YUAN
  • LIU XINDING
  • CHEN TAO
  • ZHAN YOUWEI
  • LI JIAYUAN
  • AN GUANXING
  • SHI PENGCHENG

Assignees

  • 湖南省第一测绘院 (First Surveying and Mapping Institute of Hunan Province)

Dates

Publication Date
2026-05-08
Application Date
2026-02-02

Claims (7)

  1. A visual positioning method of an unmanned aerial vehicle based on an oblique three-dimensional model, characterized in that it comprises: loading pre-constructed oblique three-dimensional model data, and generating, through a rendering technique, a digital orthographic reference image library and a digital surface model covering a target area; taking an image acquired by the unmanned aerial vehicle in real time as a query image, extracting global feature vectors of the query image and of the images in the orthographic reference image library with a pre-trained deep convolutional neural network, calculating the similarity between the global feature vector of the query image and those of the images in the orthographic reference image library, and selecting the top K images with the highest similarity as candidate reference images, where K is a natural number greater than 0; performing cross-view image matching between the query image and the candidate reference images to obtain a set of 2D-2D matching pairs between the query image and the candidate images, and then converting it into a set of 2D-3D matching pairs between the query image and the digital surface model according to the coordinate correspondence between the orthographic reference image library and the digital surface model; performing iterative optimization on the set of 2D-3D matching pairs with a random sample consensus algorithm to obtain optimal image pose transformation parameters; and calculating, by a geometric solution method and according to the optimal image pose transformation parameters, the three-dimensional position coordinates and three attitude angle parameters of the query image in the world coordinate system, thereby obtaining the unmanned aerial vehicle pose estimation result;
     the cross-view image matching includes a coarse matching stage and a fine matching stage, and specifically comprises: first, extracting multi-scale feature representations of the query image and the candidate reference image with a feature pyramid network; then, in the coarse matching stage, establishing a global association between the query image and the candidate reference image through self-attention and cross-attention mechanisms, generating initial matching pairs at each position of the feature map, calculating matching confidences, and selecting the matching pairs whose confidence exceeds a preset threshold as the coarse matching result; and finally, in the fine matching stage, taking the coarse matching result as a prior, refining the matching positions within local feature windows through a multi-layer perceptron network, and removing mismatches based on the fundamental matrix constraint, so as to obtain the set of 2D-2D matching pairs between the query image and the candidate image;
     the matching confidence is calculated with a dual softmax operation, specifically $P_{ij} = \frac{\exp(S_{ij})}{\sum_{k}\exp(S_{ik})} \cdot \frac{\exp(S_{ij})}{\sum_{k}\exp(S_{kj})}$, wherein $P_{ij}$ represents the matching confidence of the matched pair $(i, j)$, $S_{ij}$ represents the original similarity score between the feature vector of point $i$ of the query image and the feature vector of point $j$ of the candidate reference image, $\sum_{k}\exp(S_{ik})$ represents the sum of the exponential similarities between the single point $i$ of the query image and all its possible matching points in the candidate reference image, and $\sum_{k}\exp(S_{kj})$ represents the sum of the exponential similarities between the single point $j$ of the candidate reference image and all its possible matching points in the query image;
     generating the digital orthographic reference image library and the digital surface model covering the target area through the rendering technique specifically comprises: constructing a virtual orthographic projection camera based on the loaded oblique three-dimensional model data, setting the projection matrix of the virtual orthographic camera to cover the geographic extent of the target area, and determining the resolution of the output image; configuring a frame buffer object, creating a color attachment for storing the orthophoto and a depth attachment for recording the height information of the digital surface model, thereby establishing a complete off-screen rendering pipeline; performing orthographic projection rendering, drawing the three-dimensional model into the frame buffer to generate an orthophoto containing true-color ground surface information while acquiring the ground elevation value corresponding to each pixel; and reading the rendering result from the frame buffer, storing the color buffer data as the orthophoto, converting the depth buffer data into actual elevation values, and constructing the digital surface model (a depth-to-elevation sketch follows the claims).
  2. The unmanned aerial vehicle visual positioning method of claim 1, wherein the similarity between the global feature vector of the query image and the global feature vectors of the images in the orthographic reference image library is calculated with cosine similarity after L2 normalization of all global feature vectors (a retrieval sketch follows the claims).
  3. The unmanned aerial vehicle visual positioning method of claim 1, wherein the iterative optimization of the set of 2D-3D matching pairs using a random sample consensus algorithm comprises: initializing the RANSAC algorithm parameters, including the maximum number of iterations, the inlier decision threshold, and the confidence level; randomly drawing a minimum sample set from the initial set of 2D-3D matching pairs, computing a model parameter hypothesis for the current iteration from the drawn minimum sample set, and generating a preliminary pose transformation model; computing the reprojection errors of all matching pairs under the model parameter hypothesis of the current iteration, marking the matched points whose error is smaller than a preset threshold as inliers, counting the current number of inliers, iterating multiple times while dynamically updating the best model, and retaining the model parameters with the largest inlier set; and selecting the model with the largest number of inliers as the final estimation result, refining the model parameters with all inliers, removing all outliers, and outputting the optimized 2D-3D matching pairs and the optimal image pose transformation parameters (a RANSAC sketch follows the claims).
  4. The unmanned aerial vehicle visual positioning method of claim 1, wherein the geometric solution method employs spatial forward intersection or perspective-n-point (PnP) positioning.
  5. The unmanned aerial vehicle visual positioning method of claim 4, wherein the three-dimensional position coordinates and three attitude angle parameters of the query image in the world coordinate system are calculated by perspective-n-point positioning, and the specific steps comprise: constructing a mathematical model of the perspective-n-point positioning problem, and establishing the projective geometric relationship between the two-dimensional pixel coordinates and the three-dimensional space coordinates to determine the camera extrinsic parameters to be solved; solving an initial solution of the perspective-n-point problem by the direct linear transformation method, obtaining a rough estimate of the camera pose by constructing a system of linear equations; based on the initial solution, establishing a nonlinear optimization objective function that minimizes the reprojection error, and refining the camera pose parameters through an iterative optimization algorithm; calculating the accurate three-dimensional position coordinates and three attitude angle parameters of the query image in the world coordinate system from the optimized pose parameters; and verifying the plausibility of the solution by checking whether the reprojection error lies within the allowable range, and outputting the unmanned aerial vehicle pose estimation result (a PnP sketch follows the claims).
  6. An unmanned aerial vehicle visual positioning system based on an oblique three-dimensional model, comprising at least a microprocessor and a memory, characterized in that the microprocessor is programmed or configured to perform the steps of the unmanned aerial vehicle visual positioning method according to any one of claims 1 to 5, or in that the memory stores a computer program programmed or configured to perform the unmanned aerial vehicle visual positioning method according to any one of claims 1 to 5.
  7. A computer-readable storage medium storing a computer program programmed or configured to perform the unmanned aerial vehicle visual positioning method of any one of claims 1 to 5.
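
The last rendering step of claim 1 converts depth-buffer values back into terrain elevations to build the digital surface model. The patent does not give that conversion explicitly; the following is a minimal NumPy sketch assuming an OpenGL-style orthographic camera placed at altitude cam_alt and looking straight down, with the depth attachment already read back as a float array in [0, 1]. All names and parameter values are illustrative.

```python
import numpy as np

def depth_to_dsm(depth, cam_alt, near, far):
    """Convert an orthographic depth buffer into a digital surface model.

    depth    : HxW float array in [0, 1], read back from the depth attachment
    cam_alt  : altitude of the virtual orthographic camera (metres, world Z)
    near/far : near and far clip-plane distances used in the projection matrix
    """
    # For an orthographic projection, window depth is linear in eye-space
    # distance, so the distance from the camera to the surface is:
    dist = near + depth * (far - near)
    # The camera looks straight down, so elevation = camera altitude - distance.
    return cam_alt - dist

# Illustrative usage with a synthetic 4x4 depth buffer:
if __name__ == "__main__":
    depth = np.full((4, 4), 0.5, dtype=np.float32)
    dsm = depth_to_dsm(depth, cam_alt=500.0, near=100.0, far=600.0)
    print(dsm)  # every cell is 500 - (100 + 0.5 * 500) = 150 m
```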
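Claim 2 fixes the retrieval metric as cosine similarity between L2-normalised global descriptors. A minimal NumPy sketch of that computation, assuming the descriptors have already been extracted by the pre-trained network; array names and the value of k are illustrative.

```python
import numpy as np

def retrieve_top_k(query_desc, ref_descs, k=5):
    """Return indices and scores of the k reference images most similar to the query.

    query_desc : (D,) global feature vector of the query image
    ref_descs  : (N, D) global feature vectors of the orthographic reference library
    """
    # L2-normalise so that a plain dot product equals cosine similarity
    q = query_desc / np.linalg.norm(query_desc)
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    sims = r @ q                    # (N,) cosine similarities
    top = np.argsort(-sims)[:k]     # indices of the top-K candidates
    return top, sims[top]
```

The k highest-scoring reference images would then serve as the candidate reference images for cross-view matching.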
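Claim 3 describes a standard RANSAC loop over the 2D-3D matches: minimal sampling, inlier counting against a reprojection-error threshold, and refinement on the largest inlier set. One way to realise this pattern is OpenCV's solvePnPRansac; the sketch below is an illustrative stand-in rather than the patented implementation, and the threshold and iteration values are placeholders.

```python
import cv2
import numpy as np

def ransac_pose(pts3d, pts2d, K):
    """Reject mismatches and estimate an initial pose from 2D-3D matches.

    pts3d : (N, 3) world coordinates sampled from the digital surface model
    pts2d : (N, 2) pixel coordinates in the query image
    K     : (3, 3) camera intrinsic matrix
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        iterationsCount=1000,        # maximum number of iterations
        reprojectionError=3.0,       # inlier threshold in pixels
        confidence=0.999)            # desired confidence level
    if not ok:
        raise RuntimeError("RANSAC failed to find a consistent pose model")
    inliers = inliers.ravel()
    # Keep only the inlier matches for the subsequent refinement step
    return rvec, tvec, pts3d[inliers], pts2d[inliers]
```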
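Claim 5 solves the perspective-n-point problem with a linear initial solution followed by nonlinear minimisation of the reprojection error, then reports position and three attitude angles. The sketch below uses OpenCV as a stand-in: SOLVEPNP_EPNP replaces the direct linear transformation for initialisation, SOLVEPNP_ITERATIVE with useExtrinsicGuess=True performs the iterative refinement, and the ZYX (yaw-pitch-roll) Euler convention at the end is an assumption the patent does not specify.

```python
import cv2
import numpy as np

def solve_pose(pts3d, pts2d, K):
    """Estimate camera position and attitude angles from inlier 2D-3D matches.

    pts3d : (N, 3) world coordinates, pts2d : (N, 2) query-image pixels,
    K     : (3, 3) camera intrinsic matrix.
    """
    pts3d = pts3d.astype(np.float64)
    pts2d = pts2d.astype(np.float64)
    # 1) Linear initial solution (EPnP here; the claim uses direct linear transformation)
    _, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None, flags=cv2.SOLVEPNP_EPNP)
    # 2) Nonlinear refinement minimising the reprojection error
    _, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None, rvec, tvec,
                                 useExtrinsicGuess=True,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)            # rotation: world -> camera
    position = (-R.T @ tvec).ravel()      # camera centre in world coordinates
    # 3) Attitude angles of the camera in the world frame (ZYX convention, assumed)
    R_cw = R.T
    yaw   = np.degrees(np.arctan2(R_cw[1, 0], R_cw[0, 0]))
    pitch = np.degrees(-np.arcsin(np.clip(R_cw[2, 0], -1.0, 1.0)))
    roll  = np.degrees(np.arctan2(R_cw[2, 1], R_cw[2, 2]))
    # 4) Plausibility check: mean reprojection error should stay within tolerance
    proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
    err = np.linalg.norm(proj.reshape(-1, 2) - pts2d, axis=1).mean()
    return position, (yaw, pitch, roll), err
```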

Description

Unmanned aerial vehicle visual positioning method based on inclined three-dimensional model

Technical Field

The invention belongs to the field of visual positioning, and in particular relates to an unmanned aerial vehicle visual positioning method based on an oblique three-dimensional model.

Background

In recent years, unmanned aerial vehicle technology has been applied ever more widely in fields such as smart cities, agricultural monitoring, and emergency rescue, but accurate positioning in complex environments still faces serious challenges: satellite signals are easily blocked, dynamic interference exists, and high-precision positioning equipment is costly. Breaking through these technical bottlenecks requires the deep fusion of artificial intelligence, high-end manufacturing, and unmanned system technologies, and the promotion of innovation in intelligent perception and autonomous control. Driven by both policy and technology, a positioning method that does not depend on satellite signals and offers low cost and high precision is of great significance for improving the autonomous operation capability and reliability of unmanned aerial vehicles in complex scenes.

Disclosure of the Invention

The invention aims to provide an unmanned aerial vehicle visual positioning method based on an oblique three-dimensional model, so as to improve the real-time performance and accuracy of position perception of an unmanned aerial vehicle in a complex environment and to effectively address the failure of traditional satellite positioning in signal-shielded areas, its weak anti-interference capability, and the high cost of high-precision equipment. The method comprises the following steps: loading pre-built oblique three-dimensional model data and generating, through a rendering technique, a digital orthographic reference image library and a digital surface model covering a target area; taking an image acquired in real time by the unmanned aerial vehicle as a query image, extracting global feature vectors of the query image and of the images in the orthographic reference image library with a pre-trained deep convolutional neural network, calculating the similarity between the global feature vector of the query image and those of the images in the orthographic reference image library, and selecting the top K images with the highest similarity as candidate reference images, where K is a natural number greater than 0; performing cross-view image matching between the query image and the candidate reference images to obtain a set of 2D-2D matching pairs between the query image and the candidate images, and then converting it into a set of 2D-3D matching pairs between the query image and the digital surface model according to the coordinate correspondence between the orthographic reference image library and the digital surface model; performing iterative optimization on the set of 2D-3D matching pairs with a random sample consensus algorithm to obtain optimal image pose transformation parameters; and calculating, according to the optimal image pose transformation parameters, the three-dimensional position coordinates and three attitude angle parameters of the query image in the world coordinate system to obtain the unmanned aerial vehicle pose estimation result.
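
The conversion of 2D-2D matches into 2D-3D matches relies on the coordinate correspondence between the orthographic reference image and the digital surface model. The text above does not spell this mapping out; the sketch below assumes a north-up georeferenced tile with square pixels, where x_origin, y_origin, and gsd are illustrative placeholders for the tile's georeferencing metadata.

```python
import numpy as np

def pixel_to_world(u, v, dsm, x_origin, y_origin, gsd):
    """Lift a matched pixel (u, v) of the orthographic reference image to world XYZ.

    dsm                : HxW array of elevations aligned with the reference image
    x_origin, y_origin : world coordinates of the centre of pixel (0, 0)
    gsd                : ground sampling distance in metres per pixel
    """
    x = x_origin + u * gsd        # columns increase eastwards
    y = y_origin - v * gsd        # rows increase downwards, i.e. southwards
    z = dsm[int(round(v)), int(round(u))]
    return np.array([x, y, z])
```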
Preferably, the cross-view image matching includes a coarse matching stage and a fine matching stage, and specifically comprises the following steps. First, multi-scale feature representations of the query image and the candidate reference images are extracted with a feature pyramid network. Then, in the coarse matching stage, a global association between the query image and the candidate reference image is established through self-attention and cross-attention mechanisms, initial matching pairs are generated at each position of the feature map, matching confidences are calculated, and the matching pairs whose confidence exceeds a preset threshold are selected as the coarse matching result. Finally, in the fine matching stage, the coarse matching result is taken as a prior, the matching positions are refined within local feature windows through a multi-layer perceptron network, and mismatches are removed based on the fundamental matrix constraint, so that the set of 2D-2D matching pairs between the query image and the candidate image is obtained. Preferably, the matching confidence is calculated with a dual softmax operation, specifically $P_{ij} = \frac{\exp(S_{ij})}{\sum_{k}\exp(S_{ik})} \cdot \frac{\exp(S_{ij})}{\sum_{k}\exp(S_{kj})}$, wherein $P_{ij}$ represents the matching confidence of the matched pair $(i, j)$, $S_{ij}$ represents the original similarity score between the feature vector of point $i$ of the query image and the feature vector of point $j$ of the candidate reference image, and $\sum_{k}\exp(S_{ik})$ represents the sum of the exponential similarities between the single point $i$ of the query image and all its possible matching points in the candidate reference image.
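
As a concrete illustration of the dual softmax operation defined above, the following NumPy sketch computes the confidence matrix P from a raw similarity matrix S; the max subtraction is only for numerical stability, and the thresholding comment at the end mirrors the coarse-matching selection described above with an illustrative value.

```python
import numpy as np

def dual_softmax_confidence(S):
    """S[i, j]: raw similarity between point i of the query image and point j of
    the candidate reference image. Returns P[i, j], the matching confidence of (i, j)."""
    # Softmax over all candidate points j for each query point i
    row = np.exp(S - S.max(axis=1, keepdims=True))
    row /= row.sum(axis=1, keepdims=True)
    # Softmax over all query points i for each candidate point j
    col = np.exp(S - S.max(axis=0, keepdims=True))
    col /= col.sum(axis=0, keepdims=True)
    return row * col

# Pairs with confidence above a preset threshold, e.g. P > 0.2 (illustrative),
# would be kept as the coarse matching result.
```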