CN-121861497-B - Cross-view target geographic positioning method and device based on cross-task knowledge migration
Abstract
The invention relates to a cross-view target geographic positioning method and device based on cross-task knowledge migration. The method comprises: constructing a pre-training feature extraction module comprising a first feature map extraction module and a second feature map extraction module that are identical in structure and share weights; based on a first loss function, respectively inputting a query image and a reference image into the first feature map extraction module and the second feature map extraction module for contrastive learning training, to obtain a trained first feature map extraction module and a trained second feature map extraction module; constructing a cross-view target geographic positioning model and migrating the trained first and second feature map extraction modules into it; and, based on a second loss function, inputting the reference image and a query image containing a query target into the cross-view target geographic positioning model for training, to obtain a trained cross-view target geographic positioning model. The method and device can improve target positioning precision and generalization capability in cross-view scenarios.
Inventors
- CHEN HAO
- YANG ANRAN
- LI JIANGSHAN
- PENG SHUANG
- JIA QINGREN
- WU JIANGJIANG
- DU CHUN
- CHEN LUO
- LI JUN
- XIONG WEI
Assignees
- National University of Defense Technology (中国人民解放军国防科技大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-08-04
Claims (9)
- 1. A cross-view target geographic positioning method based on cross-task knowledge migration, the method comprising: acquiring a first data set, wherein the first data set comprises a reference image and a query image; constructing a pre-training feature extraction module, wherein the pre-training feature extraction module comprises a first feature map extraction module and a second feature map extraction module that are identical in structure and share weights; based on a first loss function, respectively inputting the query image and the reference image into the first feature map extraction module and the second feature map extraction module for contrastive learning training, to obtain a trained first feature map extraction module and a trained second feature map extraction module; acquiring a second data set, wherein the second data set comprises a reference image and a query image with a query target; constructing a cross-view target geographic positioning model, and migrating the trained first feature map extraction module and the trained second feature map extraction module into the cross-view target geographic positioning model; adding the first loss function into a second loss function, and then, based on the second loss function, inputting the reference image and the query image with the query target into the cross-view target geographic positioning model for training, to obtain a trained cross-view target geographic positioning model; wherein the contrastive learning training based on the first loss function comprises: inputting the query image into the first feature map extraction module for processing and outputting a first feature map; inputting the reference image into the second feature map extraction module for processing and outputting a second feature map; performing an inverse polar coordinate feature transformation on the first feature map to obtain a transformed first feature map; and, taking the first loss function as a constraint, performing similarity-based contrastive learning on the transformed first feature map and the second feature map.
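The inverse polar coordinate feature transformation in claim 1 remaps a ground/panoramic feature layout into a top-down layout so it can be compared with the satellite-view feature map. The sketch below is an illustrative numpy resampling under assumed conventions (rows of the input map index radius, columns index azimuth); it is not the patent's exact implementation.

```python
import numpy as np

def inverse_polar_transform(feat, out_size):
    """Resample a polar-organized feature map into a square top-down grid.

    feat: (H, W, C) array -- rows assumed to index radius, columns azimuth.
    out_size: side length S of the square output; pixels outside the
    circular field of view are left at zero (nearest-neighbor sampling).
    """
    H, W, C = feat.shape
    out = np.zeros((out_size, out_size, C), dtype=feat.dtype)
    cy = cx = (out_size - 1) / 2.0
    max_r = out_size / 2.0
    for i in range(out_size):
        for j in range(out_size):
            dy, dx = i - cy, j - cx
            r = np.hypot(dy, dx)
            if r > max_r:
                continue  # outside the circular field of view
            theta = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)  # in [0, 1]
            src_y = min(int(r / max_r * (H - 1)), H - 1)  # radius -> row
            src_x = min(int(theta * W), W - 1)            # azimuth -> column
            out[i, j] = feat[src_y, src_x]
    return out
```

A bilinear (rather than nearest-neighbor) sampler would be the differentiable choice inside a training pipeline.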
- 2. The cross-view target geographic positioning method based on cross-task knowledge migration of claim 1, wherein the first loss function is a triplet loss: L1 = max(d(A, P) − d(A, N) + m, 0); in the formula, L1 denotes the first loss function; A denotes an anchor query sample; P denotes a positive sample, namely the satellite image of the same geographic location as the anchor query sample; N denotes a negative sample, namely a satellite image of a different geographic location from the anchor query sample; max takes the maximum of its arguments; d(A, P) denotes the positive-pair feature distance to be reduced; d(A, N) denotes the negative-pair feature distance to be enlarged; and m is a margin hyperparameter.
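The triplet objective described in claim 2 (shrink the anchor-positive distance, enlarge the anchor-negative distance) can be sketched directly on feature vectors; the margin value below is an assumed hyperparameter, not taken from the patent.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet margin loss over extracted feature vectors.

    anchor:   feature of the query image,
    positive: feature of the satellite image at the same location,
    negative: feature of a satellite image at a different location.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to be reduced
    d_neg = np.linalg.norm(anchor - negative)  # distance to be enlarged
    return max(d_pos - d_neg + margin, 0.0)
```

In a framework such as PyTorch the equivalent built-in is a triplet margin loss; the numpy form is shown only to make the claim's term definitions concrete.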
- 3. The cross-view target geographic positioning method based on cross-task knowledge migration of claim 1 or 2, wherein the cross-view target geographic positioning model comprises a position encoder, a feature extraction module, a cross-view feature fusion module, and a target box prediction head, and the trained first feature map extraction module and second feature map extraction module are migrated into the feature extraction module.
- 4. The cross-view target geographic positioning method based on cross-task knowledge migration according to claim 3, wherein the trained first feature map extraction module and the trained second feature map extraction module are migrated into the feature extraction module, and the gain information of the migration is expressed as: ΔI = I_post(Y; (X, O, R)) − I_pre(Y; (X, O, R)) = H_pre(Y | X, O, R) − H_post(Y | X, O, R); in the formula, X is the query image data space, O is the query target data space, R is the reference image data space, and Y is the reference target data space; H_post(Y | X, O, R) denotes the remaining uncertainty of Y when the three inputs are known under the two-stage cross-view target geolocation method STONet based on cross-task knowledge migration; H(Y) denotes the uncertainty of Y without any reference information; I_post(Y; (X, O, R)) denotes the mutual information between the reference target data space Y and the combined variable group (X, O, R) predicted after migration; I_pre(Y; (X, O, R)) denotes the corresponding mutual information before migration; and H_pre(Y | X, O, R) denotes the uncertainty of Y when the three inputs are known before migration.
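Claim 4 measures migration gain via mutual information. As a concrete reminder of what the quantity computes, the following sketch evaluates I(X; Y) from a discrete joint distribution table (this is the standard definition, not code from the patent):

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in nats from a joint probability table p(x, y).

    joint: 2-D array whose entries sum to 1; rows index X, columns index Y.
    """
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x), shape (|X|, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, |Y|)
    nz = joint > 0                          # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))
```

Independent variables give I = 0; a perfectly correlated pair of binary variables gives I = log 2, which is why a migration step that makes the inputs more informative about the reference target raises the mutual-information term in claim 4.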
- 5. The cross-view target geographic positioning method based on cross-task knowledge migration of claim 3, wherein the second loss function is expressed as: L2 = L_box + λ1·L_conf + λ2·L1; in the formula, L2 denotes the second loss function; L_box denotes the predicted-bounding-box position loss function; L_conf denotes the confidence loss function; L1 denotes the first loss function; and λ1, λ2 denote hyperparameters.
- 6. The cross-view target geographic positioning method based on cross-task knowledge migration of claim 5, wherein the predicted-bounding-box position loss function is expressed as: L_box = (1/N) Σₖ [(xₖ − x̂ₖ)² + (yₖ − ŷₖ)² + (wₖ − ŵₖ)² + (hₖ − ĥₖ)²]; in the formula, (xₖ, yₖ) denotes the center-point coordinates of the predicted bounding box, obtained from the network outputs through the sigmoid function σ(·) offset by the grid-cell index given by the floor function ⌊·⌋; (wₖ, hₖ) denotes the width and height of the predicted bounding box; (x̂ₖ, ŷₖ) denotes the center-point coordinates of the ground-truth bounding box; (ŵₖ, ĥₖ) denotes the width and height of the ground-truth bounding box; and N denotes the number of instances.
- 7. The cross-view target geographic positioning method based on cross-task knowledge migration of claim 5, wherein the confidence loss function is expressed as: L_conf = −(1/N) Σₖ [yₖ log cₖ + (1 − yₖ) log(1 − cₖ)]; in the formula, cₖ denotes the confidence of the prediction bounding box predicted for the k-th instance; yₖ denotes the binary label corresponding to the prediction bounding box; and N denotes the number of instances.
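Claims 5-7 together define the second-stage objective. A plausible reading (box regression as mean squared error over center/size, confidence as binary cross-entropy, the first-stage triplet term retained with weights λ1, λ2; the exact formulas appear as images in the source, so the forms and weight values here are assumptions) is:

```python
import numpy as np

def bbox_loss(pred, gt):
    """Claim 6 (assumed MSE form): squared error over (cx, cy, w, h).
    pred, gt: (N, 4) arrays of predicted / ground-truth boxes."""
    return float(np.mean(np.sum((pred - gt) ** 2, axis=1)))

def confidence_loss(conf, labels):
    """Claim 7: binary cross-entropy over per-instance box confidences."""
    conf = np.clip(conf, 1e-7, 1 - 1e-7)  # numerical safety near 0 and 1
    return float(-np.mean(labels * np.log(conf)
                          + (1 - labels) * np.log(1 - conf)))

def second_loss(pred, gt, conf, labels, l_triplet, lam1=1.0, lam2=0.1):
    """Claim 5: L2 = L_box + lam1 * L_conf + lam2 * L1 (weights assumed)."""
    return (bbox_loss(pred, gt)
            + lam1 * confidence_loss(conf, labels)
            + lam2 * l_triplet)
```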
- 8. The cross-view target geographic positioning method based on cross-task knowledge migration of claim 5, wherein, based on the second loss function, inputting the reference image and the query image with the query target into the cross-view target geographic positioning model for training to obtain a trained cross-view target geographic positioning model comprises: inputting the query image with the query target into the position encoder for encoding and outputting position features; adding the position features to the query image with the query target, inputting the result into the first feature map extraction module for processing, and outputting a third feature map; inputting the reference image into the second feature map extraction module for processing and outputting a fourth feature map; performing the inverse polar coordinate feature transformation on the third feature map to obtain a transformed third feature map; inputting the transformed third feature map and the fourth feature map into the cross-view feature fusion module for fusion to obtain fused features; inputting the fused features into the target box prediction head for processing and outputting a predicted bounding box and a confidence; and, based on the second loss function and the confidence, computing the difference between the predicted bounding box and the ground-truth bounding box, and updating the model parameters through back propagation until the model loss converges, to obtain the trained cross-view target geographic positioning model.
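The data flow of claim 8 (position encoding, shared extractors, inverse-polar alignment, fusion, prediction head) can be mirrored with placeholder operations; every op below is a stub standing in for a learned network component, so shapes and names are assumptions rather than the patent's architecture.

```python
import numpy as np

def position_encode(query_img):
    """Stand-in position encoder: a fixed 2-D sinusoidal grid (assumed)."""
    h, w, c = query_img.shape
    ys = np.sin(np.linspace(0, np.pi, h))[:, None, None]
    xs = np.cos(np.linspace(0, np.pi, w))[None, :, None]
    return np.broadcast_to(ys + xs, (h, w, c))

def forward(query_img, ref_img):
    """Mirrors the claim-8 flow: encode position -> add to query -> extract
    third/fourth feature maps -> fuse -> predict box + confidence."""
    q = query_img + position_encode(query_img)        # position features added
    feat_q = q.mean(axis=2)                           # first extractor (stub)
    feat_r = ref_img.mean(axis=2)                     # second extractor (stub)
    fused = np.concatenate([feat_q, feat_r], axis=0)  # cross-view fusion (stub)
    box = fused.mean(axis=0)[:4]                      # prediction head (stub)
    conf = 1.0 / (1.0 + np.exp(-fused.mean()))        # sigmoid confidence
    return box, conf
```

In the real model each stub would be a trained module, and the loss of claim 5 plus back-propagation would update their parameters until convergence.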
- 9. A cross-view target geographic positioning device based on cross-task knowledge migration, which adopts the cross-view target geographic positioning method based on cross-task knowledge migration as claimed in any one of claims 1 to 8, the device comprising: a first data set acquisition module, configured to acquire a first data set comprising a reference image and a query image; a pre-training feature extraction module construction module, configured to construct a pre-training feature extraction module comprising a first feature map extraction module and a second feature map extraction module that are identical in structure and share weights; a pre-training module, configured to construct a first loss function according to the principle of maximizing mutual information within positive sample pairs and minimizing mutual information within negative sample pairs, and, based on the first loss function, to respectively input the query image and the reference image into the first feature map extraction module and the second feature map extraction module for contrastive learning training, obtaining a trained first feature map extraction module and a trained second feature map extraction module; a second data set acquisition module, configured to acquire a second data set comprising a reference image and a query image with a query target; a cross-view target geographic positioning model construction module, configured to construct a cross-view target geographic positioning model and migrate the trained first feature map extraction module and the trained second feature map extraction module into the cross-view target geographic positioning model; and a cross-view target geographic positioning model training module, configured to add the first loss function into a second loss function and then, based on the second loss function, input the reference image and the query image with the query target into the cross-view target geographic positioning model for training, to obtain a trained cross-view target geographic positioning model.
Description
Cross-view target geographic positioning method and device based on cross-task knowledge migration

Technical Field
The invention relates to the technical field of cross-view target geographic positioning, and in particular to a cross-view target geographic positioning method and device based on cross-task knowledge migration.

Background
Cross-view object geo-localization (CVOGL) is a technique for determining the geographic coordinates of an object by detecting, in a satellite image, the location of an object to be located that appears in a ground-level or drone image. Compared with positioning methods that rely on GPS signals, this technique exploits the spatial relationships of visual features to locate a specified building target when GPS information is lost or in GPS-denied areas. However, compared with conventional object localization, CVOGL faces a double challenge. On one hand, the significant viewing-angle difference between the query image and the reference image causes dramatic changes in the appearance of the same target, rendering template-matching mechanisms based on shallow visual features ineffective. On the other hand, CVOGL focuses on unique instance matching and localization across viewing angles: each query instance corresponds to only a single paired sample, and there is no aggregation of similar instances, so the model cannot learn features from large-scale category data (such as the 1000-class samples common in ImageNet); the lack of such supervisory signals leads to poor generalization ability and a risk of over-fitting during training. It is therefore difficult to fundamentally solve these problems by optimizing individual modules of an object localization pipeline (such as the feature fusion module or the prediction head) in isolation.
Disclosure of Invention
Based on the above, it is necessary to provide a cross-view target geographic positioning method and device based on cross-task knowledge migration, which can realize target correspondence in cross-view scenarios and improve model generalization capability. A cross-view target geographic positioning method based on cross-task knowledge migration comprises: acquiring a first data set, wherein the first data set comprises a reference image and a query image; constructing a pre-training feature extraction module, wherein the pre-training feature extraction module comprises a first feature map extraction module and a second feature map extraction module that are identical in structure and share weights; based on a first loss function, respectively inputting the query image and the reference image into the first feature map extraction module and the second feature map extraction module for contrastive learning training, to obtain a trained first feature map extraction module and a trained second feature map extraction module; acquiring a second data set, wherein the second data set comprises a reference image and a query image with a query target; constructing a cross-view target geographic positioning model, and migrating the trained first feature map extraction module and the trained second feature map extraction module into the cross-view target geographic positioning model; and adding the first loss function into a second loss function, and then, based on the second loss function, inputting the reference image and the query image with the query target into the cross-view target geographic positioning model for training, to obtain a trained cross-view target geographic positioning model.
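The knowledge-migration step above amounts to copying the pretrained extractor parameters into the matching slots of the localization model. In a typical deep-learning framework this is a state-dict copy; the parameter names and prefix below are illustrative, not the patent's.

```python
import numpy as np

def transfer_weights(pretrained, model, prefix="feature_extractor."):
    """Copy every pretrained extractor parameter into the localization model.

    pretrained: dict mapping parameter name -> numpy array (stage-1 weights).
    model:      dict for the stage-2 model's parameters, filled in place.
    Arrays are deep-copied so later pretraining updates cannot alias into
    the localization model.
    """
    for name, value in pretrained.items():
        model[prefix + name] = value.copy()
    return model
```

With framework modules this corresponds to loading the pretrained extractor's state dict into the localization model's feature extraction submodule before stage-2 training begins.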
In another aspect, a cross-view target geographic positioning device based on cross-task knowledge migration is provided, comprising: a first data set acquisition module, configured to acquire a first data set comprising a reference image and a query image; a pre-training feature extraction module construction module, configured to construct a pre-training feature extraction module comprising a first feature map extraction module and a second feature map extraction module that are identical in structure and share weights; a pre-training module, configured to construct a first loss function according to the principle of maximizing mutual information within positive sample pairs and minimizing mutual information within negative sample pairs, and, based on the first loss function, to respectively input the query image and the reference image into the first feature map extraction module and the second feature map extraction module for contrastive learning training, obtaining a trained first feature map extraction module and a trained second feature map extraction module; a second data set acquisition module, configured to acquire a second data set comprising a reference image and a query image with a query target; a cross-view target geographic positioning model construction module, configured to construct a cross-view target geographic positioning model and transfer th