CN-121982555-A - Cross-view image matching method and device for unmanned aerial vehicle image and satellite image

CN121982555A

Abstract

The invention discloses a cross-view image matching method and device for an unmanned aerial vehicle image and a satellite image. An image dataset comprising a plurality of geographic entities marked with geographic positions is obtained; a JamMa model based on the Mamba state space modeling mechanism is constructed and trained on the image dataset to obtain a cross-view image matching model; a query image to be matched is input into the cross-view image matching model; and a target image matched with the query image is determined in a target image database through model calculation. By introducing the improved JamMa model, which has sequence state modeling and global dependency capturing capabilities, into the unmanned aerial vehicle-satellite cross-view image matching and positioning task, both the precision and the efficiency of the task are improved.
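To make the final matching step concrete — comparing a query image's global feature against a database and selecting the most similar entry — the following sketch uses cosine similarity over precomputed feature vectors. The toy vectors, image IDs, and the choice of cosine similarity are illustrative assumptions; the patent leaves the actual features to the trained JamMa model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_query(query_feat, database):
    """Return the database key whose feature is most similar to the query.

    `database` maps candidate image IDs to global feature vectors (assumed
    to come from the matching model's encoder)."""
    return max(database, key=lambda k: cosine_similarity(query_feat, database[k]))

# Toy example: the query should match satellite image "sat_2".
db = {"sat_1": [1.0, 0.0, 0.0], "sat_2": [0.0, 1.0, 0.1], "sat_3": [0.5, 0.5, 0.5]}
query = [0.0, 0.9, 0.1]
print(match_query(query, db))  # sat_2
```

In practice the database features would be extracted once offline, so each query costs only one similarity scan.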

Inventors

  • Kuang Chengzhong
  • Liang Shenghui
  • Ou Shifeng
  • Cai Xiao
  • Mo Jiangting
  • Lin Xiaoxuan
  • Zhong Chunyu
  • Li Yuefen
  • Liu Junxin
  • Xie Qianqian

Assignees

  • Nanning Power Supply Bureau of Guangxi Power Grid Co., Ltd. (广西电网有限责任公司南宁供电局)

Dates

Publication Date
2026-05-05
Application Date
2025-12-02

Claims (10)

  1. A cross-view image matching method for an unmanned aerial vehicle image and a satellite image, comprising: acquiring an image dataset comprising a plurality of geographic entities marked with geographic positions, wherein the image dataset comprises a satellite image of each geographic entity and a group of unmanned aerial vehicle images acquired from a plurality of different perspectives; constructing a JamMa model based on the Mamba state space modeling mechanism; training the JamMa model based on the image dataset to obtain a cross-view image matching model; and inputting a query image to be matched into the cross-view image matching model, and determining a target image matched with the query image in a target image database through model calculation.
  2. The cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to claim 1, wherein training the JamMa model based on the image dataset to obtain the cross-view image matching model specifically comprises: dividing the image dataset into a first image data subset and a second image data subset whose geographic entities are mutually non-overlapping, taking the first image data subset as a model training set and the second image data subset as a model test set; inputting the model training set into the JamMa model based on the Mamba state space modeling mechanism, performing back propagation and gradient calculation on the model parameters based on a classification loss function and a contrastive loss function, iteratively updating the model parameters of the JamMa model with an AdamW optimizer according to the gradient calculation result until the model converges, and outputting a trained JamMa model; and evaluating the trained JamMa model on the model test set to obtain a cross-view image matching model meeting the performance requirements.
  3. The cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to claim 2, wherein inputting the model training set into the JamMa model, performing back propagation and gradient calculation on the model parameters based on the classification loss function and the contrastive loss function, iteratively updating the model parameters with the AdamW optimizer according to the gradient calculation result until the model converges, and outputting the trained JamMa model specifically comprises: sampling a training batch from the model training set, wherein all satellite image and unmanned aerial vehicle image pairs in the training batch have non-repeating geographic tags; performing feature extraction on the unmanned aerial vehicle images and the satellite images to obtain unmanned aerial vehicle projection features and satellite projection features; performing position coding on the unmanned aerial vehicle projection features and the satellite projection features to obtain a first feature sequence and a second feature sequence that fuse position information; performing cross-view global interaction on the first feature sequence and the second feature sequence based on the Mamba state space modeling mechanism to model the geometric and semantic relationships between the unmanned aerial vehicle images and the satellite images, and outputting aligned features after the interaction; and calculating the classification loss and the contrastive loss based on the aligned features, calculating the gradient of the JamMa model parameters through a back propagation algorithm, iteratively updating the model parameters according to the gradient with the AdamW optimizer, and, when the variation of the loss function is smaller than a preset threshold or the number of iterations reaches the maximum number of iterations, stopping the iteration and outputting the trained JamMa model.
  4. The cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to claim 3, wherein performing feature extraction on the unmanned aerial vehicle images and the satellite images to obtain unmanned aerial vehicle projection features and satellite projection features specifically comprises: respectively inputting the sampled unmanned aerial vehicle images and satellite images into a lightweight feature extraction network, and extracting unmanned aerial vehicle features and satellite features at a preset depth; and respectively performing channel mapping and nonlinear transformation on the unmanned aerial vehicle features and the satellite features to obtain unmanned aerial vehicle projection features and satellite projection features with a uniform target dimension.
  5. The cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to claim 3, wherein performing position coding on the unmanned aerial vehicle projection features and the satellite projection features to obtain the first feature sequence and the second feature sequence that fuse position information specifically comprises: respectively flattening the unmanned aerial vehicle projection features and the satellite projection features to obtain an unmanned aerial vehicle feature sequence and a satellite feature sequence; constructing a first position coding vector corresponding to the unmanned aerial vehicle projection features and a second position coding vector corresponding to the satellite projection features; adding the unmanned aerial vehicle feature sequence and the first position coding vector element by element to obtain the first feature sequence fusing position information; and adding the satellite feature sequence and the second position coding vector element by element to obtain the second feature sequence fusing position information.
  6. The cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to claim 5, wherein constructing the first position coding vector corresponding to the unmanned aerial vehicle projection features and the second position coding vector corresponding to the satellite projection features comprises: generating regular grid coordinates normalized to the range [-1, 1] based on the spatial dimensions of the unmanned aerial vehicle projection features; inputting the normalized grid coordinates into a key point encoder formed by a multi-layer fully connected network, and outputting the first position coding vector with the same dimension as the feature channel of the unmanned aerial vehicle projection features; generating regular grid coordinates normalized to the range [-1, 1] based on the spatial dimensions of the satellite projection features; and inputting the normalized grid coordinates into a key point encoder formed by a multi-layer fully connected network, and outputting the second position coding vector with the same dimension as the feature channel of the satellite projection features.
  7. The cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to claim 1, wherein inputting the query image to be matched into the cross-view image matching model and determining the target image matched with the query image in the target image database through model calculation specifically comprises: acquiring a query image to be matched, an unmanned aerial vehicle image database containing a plurality of candidate unmanned aerial vehicle images, and a satellite image database containing a plurality of candidate satellite images; determining a target image database from the unmanned aerial vehicle image database and the satellite image database based on the image category of the query image; inputting the query image and the target image database into the cross-view image matching model, and extracting global features of the query image and of all candidate images in the target image database; calculating the similarity between the query image and each candidate image based on the global features, and screening the first N candidate images from the target image database according to the similarity to form a candidate set, wherein N is a preset positive integer; recalculating a fine-grained similarity between the query image and the N candidate images in the candidate set; and reordering the N candidate images based on the recalculated similarity, and determining the target image with the highest similarity according to the reordering result.
  8. A cross-view image matching device for an unmanned aerial vehicle image and a satellite image, comprising: an acquisition module for acquiring an image dataset of geographic entities marked with geographic positions, wherein the image dataset comprises a satellite image of each geographic entity and a group of unmanned aerial vehicle images acquired from a plurality of different perspectives; a construction module for constructing a JamMa model based on the Mamba state space modeling mechanism; a training module for training the JamMa model based on the image dataset to obtain a cross-view image matching model; and a determining module for inputting a query image to be matched into the cross-view image matching model, and determining a target image matched with the query image in a target image database through model calculation.
  9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the cross-view image matching method for an unmanned aerial vehicle image and a satellite image according to any one of claims 1 to 7.
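The position coding recipe in claims 5 and 6 — generate grid coordinates normalized to [-1, 1] over the feature map, map them to the feature channel dimension with a key point encoder, and add the result element-wise onto the flattened feature sequence — can be sketched in plain Python. The single linear map standing in for the multi-layer fully connected encoder, and all weights below, are illustrative assumptions, not the patent's actual parameters.

```python
def normalized_grid(h, w):
    """Grid of (x, y) coordinates normalized to [-1, 1] over an h x w map."""
    def norm(i, n):
        return -1.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)
    return [(norm(x, w), norm(y, h)) for y in range(h) for x in range(w)]

def encode_positions(coords, weights, bias):
    """Toy stand-in for the key point encoder: a single linear map from
    2-D grid coordinates to the feature channel dimension."""
    return [[sum(w_row[j] * c[j] for j in range(2)) + b
             for w_row, b in zip(weights, bias)]
            for c in coords]

def add_position(features, pos):
    """Element-wise addition of position codes onto the flattened feature
    sequence, as in claim 5."""
    return [[f + p for f, p in zip(frow, prow)]
            for frow, prow in zip(features, pos)]

grid = normalized_grid(2, 2)
print(grid)  # [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
```

The same encoder is applied to both the unmanned aerial vehicle and satellite feature maps, each with its own spatial dimensions.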

Description

Cross-view image matching method and device for unmanned aerial vehicle image and satellite image

Technical Field

The invention relates to the technical field of unmanned aerial vehicle navigation and positioning, in particular to a cross-view image matching method and device for unmanned aerial vehicle images and satellite images.

Background

Cross-view image matching aims to accurately associate images acquired of the same target scene from different views in order to infer the specific location of the site. In recent years, with the rapid development of unmanned aerial vehicle technology, realizing accurate positioning and autonomous navigation of unmanned aerial vehicles by means of cross-view matching has become one of the current research hotspots. In the cross-view matching task between a satellite and an unmanned aerial vehicle platform, given a view image from the unmanned aerial vehicle, the matching satellite image needs to be found in a satellite image database; conversely, given a satellite view image, the matching unmanned aerial vehicle image needs to be found in an unmanned aerial vehicle image database. At present, cross-view image matching models for unmanned aerial vehicle images and satellite images are generally built on a Transformer or CNN architecture. However, such models struggle to learn highly discriminative feature representations between unmanned aerial vehicle and satellite images with an obvious view angle difference, which often causes the matching system to fail or produces a large number of mismatches.
Disclosure of Invention

In view of the above, the invention provides a cross-view image matching method, device, electronic device, and medium for unmanned aerial vehicle images and satellite images, which are used to solve the technical problems that conventional Transformer and CNN models struggle to learn distinguishing features between unmanned aerial vehicle and satellite images, leading to matching failures and a high mismatch rate. In a first aspect, a cross-view image matching method for an unmanned aerial vehicle image and a satellite image is provided, the method comprising: acquiring an image dataset comprising a plurality of geographic entities marked with geographic positions, wherein the image dataset comprises a satellite image of each geographic entity and a group of unmanned aerial vehicle images acquired from a plurality of different perspectives; constructing a JamMa model based on the Mamba state space modeling mechanism; training the JamMa model based on the image dataset to obtain a cross-view image matching model; and inputting a query image to be matched into the cross-view image matching model, and determining a target image matched with the query image in a target image database through model calculation.
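The training procedure of the first aspect (detailed in claims 2 and 3) iterates gradient updates on a combined classification and contrastive loss until the loss change falls below a preset threshold or a maximum iteration count is reached. The dependency-free sketch below mirrors only that control flow; it substitutes finite-difference gradient descent and a toy two-term objective for backpropagation, the AdamW optimizer, and the actual losses, all of which are stand-ins for illustration.

```python
def train(loss_fn, params, lr=1e-3, tol=1e-6, max_iters=1000):
    """Iterate parameter updates until the loss change drops below `tol`
    or `max_iters` is reached (the patent's stopping criteria). Uses a
    finite-difference gradient as a stand-in for backpropagation."""
    eps = 1e-6
    prev = loss_fn(params)
    for _ in range(max_iters):
        # Finite-difference gradient of the combined loss per parameter.
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += eps
            grad.append((loss_fn(bumped) - prev) / eps)
        params = [p - lr * g for p, g in zip(params, grad)]
        cur = loss_fn(params)
        if abs(prev - cur) < tol:  # loss variation below preset threshold
            break
        prev = cur
    return params

# Toy combined objective standing in for classification + contrastive loss;
# its minimum sits at params = [2.0, -1.0].
loss = lambda p: (p[0] - 2.0) ** 2 + 0.5 * (p[1] + 1.0) ** 2
final = train(loss, [0.0, 0.0], lr=0.1)
```

On this toy objective the loop converges close to the minimum well before `max_iters`; in the actual method the same stopping logic wraps AdamW updates of the JamMa parameters over mini-batches.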
In a second aspect, a cross-view image matching apparatus for an unmanned aerial vehicle image and a satellite image is provided, the apparatus comprising: an acquisition module for acquiring an image dataset of geographic entities marked with geographic positions, wherein the image dataset comprises a satellite image of each geographic entity and a group of unmanned aerial vehicle images acquired from a plurality of different perspectives; a construction module for constructing a JamMa model based on the Mamba state space modeling mechanism; a training module for training the JamMa model based on the image dataset to obtain a cross-view image matching model; and a determining module for inputting a query image to be matched into the cross-view image matching model, and determining a target image matched with the query image in a target image database through model calculation. In a third aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor, when executing the computer program, implementing the steps of the above cross-view image matching method for an unmanned aerial vehicle image and a satellite image. In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the above cross-view image matching method for an unmanned aerial vehicle image and a satellite image. According to the scheme realized by the cross-view image matching method, device, electronic device, and storage medium for an unmanned aerial vehicle image and a satellite image, the improved JamMa model with sequence state modeling and global dependency capturing capability is introduced into the unmanned aerial vehicle-satellite cross-view image matching and positioning task, so that the precision and efficiency of the task are synchronously improved.
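The determining module's retrieval flow (claim 7) is a two-stage scheme: a cheap global-feature similarity selects the top N candidates, and a fine-grained similarity reranks only that shortlist. The sketch below illustrates the control flow with toy similarity functions (negative L1 for the coarse stage, negative squared L2 for the fine stage); the features and both metrics are assumptions, since the patent computes them with the trained model.

```python
def top_n(query, database, coarse_sim, n):
    """Stage 1: rank all candidates by a cheap global-feature similarity
    and keep the best n, forming claim 7's candidate set."""
    ranked = sorted(database, key=lambda k: coarse_sim(query, database[k]),
                    reverse=True)
    return ranked[:n]

def rerank(query, database, candidates, fine_sim):
    """Stage 2: recompute a fine-grained similarity on the shortlist only
    and return the highest-scoring match."""
    return max(candidates, key=lambda k: fine_sim(query, database[k]))

# Toy similarities: coarse = negative L1 distance, fine = negative squared L2.
coarse = lambda a, b: -sum(abs(x - y) for x, y in zip(a, b))
fine = lambda a, b: -sum((x - y) ** 2 for x, y in zip(a, b))

db = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.9, 1.2], "d": [5.0, 5.0]}
q = [1.0, 1.1]
shortlist = top_n(q, db, coarse, 2)
best = rerank(q, db, shortlist, fine)
print(best)  # b
```

Restricting the expensive fine-grained comparison to N candidates is what keeps query time roughly constant as the target database grows.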