CN-121998869-A - Foggy-day cross-view geographic positioning method, device, computer equipment and storage medium
Abstract
The application relates to a foggy-day cross-view geographic positioning method, a corresponding device, computer equipment, and a storage medium. The method takes an image to be positioned and a gallery image set as input images and feeds them into a pre-trained cross-view positioning model to determine the compact semantic features corresponding to each input image. The cross-view positioning model comprises an adaptive feature defogging layer and a dual-path feature enhancement layer. Specifically, the adaptive feature defogging layer applies multi-scale decomposition, adaptive filtering, detail-guided fusion, contrast enhancement, and detail enhancement to the initial features of the input images to obtain defogging-layer output features; the dual-path feature enhancement layer then enhances the defogging-layer output features along two paths to obtain compact semantic features. The target matching image corresponding to the image to be positioned is determined in the gallery image set based on the compact semantic features of the image to be positioned and of each gallery image. By adopting the method, cross-view positioning accuracy can be improved.
Inventors
- Li Yang
- Wang Tong
- Pu Yiting
- Miao Zhuang
- Wang Jiabao
- Zhang Rui
Assignees
- Army Engineering University of PLA (中国人民解放军陆军工程大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-16
Claims (10)
- 1. A foggy-day cross-view geographic positioning method, comprising: S1, taking an image to be positioned and a gallery image set as input images, inputting them into a pre-trained cross-view positioning model, and determining the compact semantic features corresponding to the image to be positioned and to each gallery image in the gallery image set; wherein the processing of an input image by the cross-view positioning model comprises: performing multi-scale decomposition and adaptive filtering on the initial feature F of the input image based on an adaptive feature defogging layer to obtain an enhanced base feature, performing detail-guided fusion on the enhanced base feature to obtain a refined feature F_r, and performing contrast enhancement and detail enhancement on the refined feature F_r to obtain a defogging-layer output feature F_out; performing channel-path enhancement and spatially adaptive enhancement on the defogging-layer output feature F_out based on a dual-path feature enhancement layer to obtain an enhanced feature, and performing compression projection on the enhanced feature to obtain the compact semantic feature corresponding to the input image; S2, determining the target matching image corresponding to the image to be positioned in the gallery image set based on the compact semantic features corresponding to the image to be positioned and to each gallery image in the gallery image set.
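Step S2 reduces to nearest-neighbour retrieval over the compact semantic features. The following toy sketch is not part of the patent; it uses numpy with illustrative names and assumes cosine similarity as the matching score:

```python
import numpy as np

def match_gallery(query_feat, gallery_feats):
    """Return the index of the gallery feature most similar to the query.

    Cosine similarity over L2-normalised compact semantic features; the
    best-matching gallery image gives the predicted geographic location.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                      # similarity of each gallery image to the query
    return int(np.argmax(sims)), sims

# Toy example: three 4-D gallery features, one query close to gallery image 0.
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
best, sims = match_gallery(query, gallery)
print(best)   # 0 (gallery image 0 is the closest match)
```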
- 2. The method of claim 1, wherein performing multi-scale decomposition on the initial feature corresponding to the input image based on the adaptive feature defogging layer comprises: applying a stride-2 convolution to the initial feature F of the input image to obtain a basic low-frequency feature B_low, and applying a stride-1 convolution to the initial feature F to obtain a same-scale feature F_s, wherein the initial feature F is extracted from the input image by a DINOv backbone and the input image is drawn from a foggy-day image dataset; performing bilinear upsampling on the basic low-frequency feature B_low and adjusting it with a convolution to obtain a reconstructed base-layer feature B, calculated as B = Conv(Up(B_low)), where Up denotes bilinear upsampling; and computing the residual between the same-scale feature F_s and the reconstructed base-layer feature B to obtain the detail feature D = F_s − B.
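The decomposition in claim 2 (downsample to a low-frequency base, upsample back, take the residual as detail) can be illustrated with a toy numpy sketch; average pooling and nearest-neighbour upsampling stand in for the patent's learned convolutions and bilinear upsampling:

```python
import numpy as np

def decompose(feat):
    """Split a feature map into a base (low-frequency) layer and a detail layer.

    Stand-ins: 2x2 average pooling replaces the stride-2 convolution, and
    nearest-neighbour repetition replaces bilinear upsampling + convolution.
    """
    h, w = feat.shape
    low = feat.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))   # stride-2 downsample
    base = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)       # upsample back to full size
    detail = feat - base                                         # residual detail, D = F_s - B
    return base, detail

feat = np.arange(16, dtype=float).reshape(4, 4)
base, detail = decompose(feat)
print(np.allclose(base + detail, feat))  # True: base + detail reconstruct the input exactly
```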
- 3. The method of claim 2, wherein the adaptive filtering comprises: generating, from the initial feature F, a space-channel adaptive weight matrix W using two successive convolutions, W = σ(Conv(Conv(F))), where σ is the Sigmoid function; and multiplying the space-channel adaptive weight matrix W element-wise with the reconstructed base-layer feature B to obtain the enhanced base feature F_e = W ⊙ B, where ⊙ denotes element-wise multiplication.
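The adaptive filtering in claim 3 is a sigmoid gate computed from the initial feature and applied to the base feature. A minimal numpy sketch over token features, so the two successive convolutions become matrix multiplications; the intermediate ReLU and all shapes are assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_filter(init_feat, base_feat, w1, w2):
    """W = sigmoid(conv2(conv1(F))); F_e = W * B (element-wise).

    Token features of shape (N, C): per-token convolutions act as
    (C, C) matrix multiplications.
    """
    hidden = np.maximum(init_feat @ w1, 0.0)   # first conv + ReLU (assumption)
    weights = sigmoid(hidden @ w2)             # adaptive weights, all in (0, 1)
    return weights * base_feat                 # gated base feature

N, C = 6, 8
F = rng.standard_normal((N, C))
B = rng.standard_normal((N, C))
F_e = adaptive_filter(F, B, rng.standard_normal((C, C)), rng.standard_normal((C, C)))
```

Because the gate values lie strictly in (0, 1), the filter can only attenuate the base feature, never amplify it.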
- 4. The method of claim 3, wherein performing detail-guided fusion on the enhanced base feature to obtain the refined feature F_r comprises: concatenating the enhanced base feature F_e and the detail feature D along the channel dimension to obtain the fused feature F_cat = Concat(F_e, D); and applying convolution and normalization to the fused feature F_cat to obtain the refined feature F_r.
- 5. The method of claim 4, wherein performing contrast enhancement and detail enhancement on the refined feature F_r to obtain the defogging-layer output feature F_out comprises: applying global average pooling and convolution to the refined feature F_r to obtain its channel importance weights w_c, and applying the channel importance weights w_c to the refined feature F_r to obtain the contrast-enhanced feature F_c = w_c ⊙ F_r, wherein the pooled channel descriptor is obtained by global average pooling of F_r over its spatial dimensions; strengthening the detail response of the contrast-enhanced feature F_c through detail enhancement to obtain the detail-enhanced feature F_d; and determining the defogging-layer output feature from the detail-enhanced feature F_d and the initial feature F as F_out = F_d + F.
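The channel-importance step in claim 5 resembles squeeze-and-excitation: global average pooling produces one descriptor per channel, which is then used to rescale that channel. A toy numpy sketch; the convolution the patent applies after pooling is replaced by a plain sigmoid here for brevity:

```python
import numpy as np

def contrast_enhance(feat):
    """Channel reweighting: GAP over spatial dims -> sigmoid -> per-channel scale.

    feat has shape (C, H, W). The learned convolution between pooling and
    the sigmoid is omitted in this sketch.
    """
    gap = feat.mean(axis=(1, 2))              # global average pooling, one value per channel
    w_c = 1.0 / (1.0 + np.exp(-gap))          # channel importance weights in (0, 1)
    return w_c[:, None, None] * feat, w_c

feat = np.arange(24, dtype=float).reshape(2, 3, 4)
out, w_c = contrast_enhance(feat)
```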
- 6. The method of claim 1, wherein performing channel-path enhancement and spatially adaptive enhancement on the defogging-layer output feature F_out based on the dual-path feature enhancement layer to obtain the enhanced feature, and performing compression projection on the enhanced feature to obtain the compact semantic feature corresponding to the input image, comprises: applying convolution and a nonlinear transformation to the defogging-layer output feature F_out to obtain the importance weight w of each of its channels, and applying convolution and normalization to F_out to obtain its spatial enhancement coefficient S, wherein the nonlinearity is the Sigmoid function; integrating the channel importance weights w and the spatial enhancement coefficient S with a learnable gating network to obtain the fused enhancement coefficient G = g ⊙ w + (1 − g) ⊙ S̃, where g is the gate output and S̃ is obtained by expanding the spatial enhancement coefficient S along the channel dimension to the original number of channels; enhancing the defogging-layer output feature with the fused enhancement coefficient to obtain the enhanced feature F_enh = G ⊙ F_out; and applying compression projection to the enhanced feature F_enh to obtain the compact semantic feature corresponding to the input image.
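Claim 6 fuses a per-channel weight vector and a per-position spatial map through a learnable gate. A toy numpy sketch, with a scalar gate and illustrative shapes that are assumptions rather than the patent's actual configuration:

```python
import numpy as np

def gated_fusion(feat, channel_w, spatial_s, gate_logit):
    """G = g * w + (1 - g) * S_tilde, then F_enh = G * F_out (element-wise).

    feat: (C, H, W); channel_w: (C,); spatial_s: (H, W).
    S_tilde broadcasts spatial_s across channels, matching the claim's
    "expanding to the original channel number along the channel dimension".
    """
    g = 1.0 / (1.0 + np.exp(-gate_logit))            # learnable gate squashed into (0, 1)
    s_tilde = np.broadcast_to(spatial_s, feat.shape) # expand along the channel dimension
    fused = g * channel_w[:, None, None] + (1.0 - g) * s_tilde
    return fused * feat

C, H, W = 2, 3, 3
feat = np.ones((C, H, W))
# With unit channel weights and a unit spatial map, the fused coefficient is 1
# everywhere, so the feature passes through unchanged for any gate value.
out = gated_fusion(feat, np.ones(C), np.ones((H, W)), 3.7)
```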
- 7. The method of claim 1, wherein the pre-training of the cross-view positioning model comprises: obtaining a training data set comprising a query image and a gallery image set, the query image and the gallery images being cross-view images; constructing the cross-view positioning model based on a preset cross-view positioning network and the training data set, the cross-view positioning network comprising the adaptive feature defogging layer and the dual-path feature enhancement layer; determining, based on the adaptive feature defogging layer and the dual-path feature enhancement layer, the compact semantic feature z_q of the query image, the compact semantic feature z_i of each gallery image in the gallery image set, and the compact semantic feature z_+ of the positive sample corresponding to the query image; determining the loss function, based on z_q, the features z_i, and z_+, as L = −log( exp(sim(z_q, z_+)/τ) / Σ_{i=1}^{N} exp(sim(z_q, z_i)/τ) ), where z_i is the compact semantic feature of the i-th image in the gallery image set, N is the number of gallery images, sim(·,·) is a similarity measure, and τ is a temperature parameter; and optimizing the cross-view positioning network with the loss function to obtain the cross-view positioning model.
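The loss in claim 7 has the shape of an InfoNCE contrastive loss: the positive gallery feature competes against all gallery features under a temperature τ. A numpy sketch; cosine similarity is used for sim here, which is an assumption since the patent text does not spell out the exact similarity measure:

```python
import numpy as np

def info_nce(query, gallery, pos_idx, tau=0.07):
    """L = -log( exp(sim(z_q, z_+)/tau) / sum_i exp(sim(z_q, z_i)/tau) )."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    logits = (g @ q) / tau           # cosine similarities scaled by temperature
    logits -= logits.max()           # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[pos_idx])   # cross-entropy against the positive index

gallery = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([1.0, 0.0])
```

Picking the matching gallery feature as the positive yields a near-zero loss, while a mismatched positive is penalised heavily, which drives the features of corresponding cross-view pairs together.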
- 8. A foggy-day cross-view geographic positioning device, comprising: a feature extraction module configured to take an image to be positioned and a gallery image set as input images, input them into a pre-trained cross-view positioning model, and determine the compact semantic features corresponding to the image to be positioned and to each gallery image in the gallery image set; wherein the processing of an input image by the cross-view positioning model comprises: performing multi-scale decomposition and adaptive filtering on the initial feature of the input image based on the adaptive feature defogging layer to obtain an enhanced base feature, performing detail-guided fusion on the enhanced base feature to obtain a refined feature, and performing contrast enhancement and detail enhancement on the refined feature to obtain a defogging-layer output feature; performing channel-path enhancement and spatially adaptive enhancement on the defogging-layer output feature based on the dual-path feature enhancement layer to obtain an enhanced feature, and performing compression projection on the enhanced feature to obtain the compact semantic feature corresponding to the input image; and a positioning module configured to determine the target matching image corresponding to the image to be positioned in the gallery image set based on the compact semantic features corresponding to the image to be positioned and to each gallery image in the gallery image set.
- 9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
- 10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
Description
Foggy-day cross-view geographic positioning method, device, computer equipment and storage medium

Technical Field

The application relates to the technical field of computer vision, and in particular to a foggy-day cross-view geographic positioning method and device, computer equipment, and a storage medium.

Background

With the acceleration of global urbanization and the wide application of intelligent traffic systems, high-precision geographic positioning technology has become a core support in fields such as autonomous driving, unmanned aerial vehicle navigation, and unmanned aerial vehicle delivery. Cross-view geolocation determines the geographic position of a target image by matching images of the same place captured from different views (such as unmanned aerial vehicle and satellite views). The technique does not depend on the Global Navigation Satellite System (GNSS) and offers unique advantages in scenes where GNSS signals are limited or unavailable, such as urban canyons. In practical applications, complex weather conditions such as fog degrade image quality. For detail-rich unmanned aerial vehicle images, fog blurs key local texture and structure information, weakening their advantage in high-resolution detail. For satellite images with wide coverage but limited resolution, fog reduces overall contrast and clarity, making the already limited discriminative features harder to extract and match and increasing the difficulty of cross-view matching.
Existing cross-view geographic positioning methods are mainly based on CNNs (Convolutional Neural Networks) or Transformers. These methods achieve high positioning accuracy under ideal weather conditions, but under severe weather such as fog, images suffer from reduced contrast, color distortion, and blurred texture details, which make feature extraction and matching difficult and significantly reduce positioning accuracy.

Disclosure of Invention

Based on the above, it is necessary to provide a foggy-day cross-view geographic positioning method, device, computer equipment, and storage medium that solve the problem of difficult feature extraction and matching and thereby improve cross-view positioning accuracy. In a first aspect, the present application provides a foggy-day cross-view geographic positioning method, including: S1, taking an image to be positioned and a gallery image set as input images, inputting them into a pre-trained cross-view positioning model, and determining the compact semantic features corresponding to the image to be positioned and to each gallery image in the gallery image set; wherein the processing of an input image by the cross-view positioning model comprises: performing multi-scale decomposition and adaptive filtering on the initial feature F of the input image based on the adaptive feature defogging layer to obtain an enhanced base feature, performing detail-guided fusion on the enhanced base feature to obtain a refined feature F_r, and performing contrast enhancement and detail enhancement on the refined feature F_r to obtain a defogging-layer output feature F_out; performing channel-path enhancement and spatially adaptive enhancement on the defogging-layer output feature F_out based on the dual-path feature enhancement layer to obtain an enhanced feature, and performing compression projection on the enhanced feature to obtain the compact semantic feature corresponding to the input image; S2, determining the target matching image corresponding to the image to be positioned in the gallery image set based on the compact semantic features corresponding to the image to be positioned and to each gallery image in the gallery image set. In one embodiment, performing multi-scale decomposition on the initial feature of the input image based on the adaptive feature defogging layer includes: applying a stride-2 convolution to the initial feature F of the input image to obtain a basic low-frequency feature B_low, and applying a stride-1 convolution to the initial feature F to obtain a same-scale feature F_s, wherein the initial feature F is extracted from the input image by a DINOv backbone and the input image is drawn from a foggy-day image dataset; performing bilinear upsampling on the basic low-frequency feature B_low and adjusting it with a convolution to obtain a reconstructed base-layer feature B, calculated as B = Conv(Up(B_low)), where Up denotes bilinear upsampling