
CN-120823463-B - Feature matching-based visible light and infrared image fusion method under unmanned aerial vehicle visual angle


Abstract

Aiming at the defects of pixel-level fusion methods in computational efficiency, hardware dependence, and fusion quality, the invention discloses a visible light and infrared image fusion method based on feature matching under the viewing angle of an unmanned aerial vehicle, relating to image processing technology and comprising an image preprocessing module and a target detection module. The method abandons the traditional idea of pixel-by-pixel processing and instead works at the feature level, achieving efficient and robust image fusion by extracting and aligning the salient features of the same target in the visible light image and the infrared image.

Inventors

  • NING XUECHENG

Assignees

  • 合肥云通智航科技有限公司 (Hefei Yuntong Zhihang Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2025-05-30

Claims (8)

  1. A visible light and infrared image fusion method based on feature matching under the viewing angle of an unmanned aerial vehicle, characterized by comprising:
     an image preprocessing module, which receives a visible light image I_vis and an infrared image I_ir as input, improves image quality through preprocessing operations, eliminates modal differences, and provides high-quality input data for the subsequent modules;
     an object detection module, which detects targets in the visible light image I_vis and the infrared image I_ir, matches the targets across the two images through cross-modal association, and provides target information for subsequent feature extraction and fusion;
     a feature extraction module, which extracts the modality-complementary features of the targets in the visible light and infrared images, including shape, texture, and edge information, and provides a rich, robust feature representation for subsequent feature fusion;
     a feature-level fusion module, which aligns and fuses the features extracted from the visible light image and the infrared image to generate information-rich, mutually complementary fusion features and provide a high-quality feature representation for subsequent tasks, the feature-level fusion module comprising a feature alignment sub-module and a feature fusion sub-module; and
     a feature reconstruction module, which reconstructs the optimized fusion feature F_opt into a high-quality image, preserving the complementary information of the visible light and infrared images while ensuring the sharpness and detail of the reconstructed image;
     wherein the feature extraction module operates as follows (an illustrative code sketch appears after the claims): for the visible light branch, a ResNet-50 network extracts high-frequency texture features in the visible light image, including edges and gradients, the l-th layer feature F_l being computed as
        F_l = σ(W_l * F_{l-1} + b_l),
     where * denotes the convolution operation and W_l and b_l are the weights and bias term of the l-th layer; the branch then outputs the visible light feature F_vis;
     for the infrared branch, an attention mechanism enhances the saliency of thermal targets in the infrared image, the attention weight A being computed as
        A = σ(W_a [F_ir; F_vis] + b_a),
     where σ is the Sigmoid function, W_a and b_a are the weight and bias terms of the attention mechanism, and [F_ir; F_vis] denotes the concatenation of the infrared feature F_ir and the visible light feature F_vis; the attention weight A is then used to weight the infrared feature and highlight thermal-target saliency:
        F_ir' = A ⊙ F_ir,
     finally outputting the infrared feature F_ir'.
  2. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 1, wherein the image preprocessing module comprises the following processing steps (an illustrative code sketch appears after the claims):
     S1, cross-modal registration: geometric correction is applied to the image with an affine transformation matrix to eliminate field-of-view offset, with the reference formula
        [x'; y'] = [a11, a12; a21, a22] [x; y] + [t_x; t_y],
     where a11, a12, a21, a22 are the rotation-scaling parameters and t_x, t_y are the translation amounts; the parameters are optimized through SIFT feature-point matching, and the registered visible light image and infrared image are then output;
     S2, noise suppression: on the one hand, Gaussian filtering is applied to the infrared image to smooth noise, with the reference formula
        I'(x) = (1/Z) Σ_y G(x, y) I(y),
     where G(x, y) is a Gaussian kernel function used to compute the weights between pixels and Z is a normalization factor; on the other hand, non-local means denoising is applied to the visible light image to preserve image details, with the reference formula
        I'(x) = (1/Z') Σ_y w(x, y) I(y),
     where w(x, y) is a similarity weight between the neighborhoods of x and y and Z' is a normalization factor; the denoised visible light image and infrared image are output;
     S3, dynamic range compression: histogram equalization is applied to the infrared image to avoid over-exposed or under-exposed regions, with the reference formula
        s_k = Σ_{j=0}^{k} n_j / N,
     where n_j / N is the frequency of gray level j and N is the total number of pixels; the dynamic-range-compressed infrared image is finally output.
  3. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 1, wherein the object detection module comprises the following processing steps (an illustrative code sketch appears after the claims):
     S1, bimodal detection network: a YOLOv-series detection network with a shared backbone processes the visible light image I_vis and the infrared image I_ir simultaneously, outputting the target bounding box B, the confidence C, and the classification probability P computed by the Softmax function:
        P = Softmax(W F_roi + b),
     where F_roi is the feature extracted by ROIAlign, W is the classification weight, and b is a bias term;
     S2, cross-modal association: based on the Hungarian algorithm, the same target in the visible light and infrared images is matched by combining the intersection-over-union IoU of the target bounding boxes with the feature similarity, the matching score S being computed as
        S = α · IoU + β · sim(F_vis, F_ir),
     where α and β are balancing weights that separately control the contributions of IoU and feature similarity; the matched target pairs and matching scores S are output.
  4. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 3, wherein the feature alignment sub-module is used to ensure that the features are consistent in space and semantics, its workflow comprising (an illustrative code sketch appears after the claims):
     geometric alignment: based on the matched pairs output by the object detection module, the infrared feature map is spatially registered with an affine transformation matrix:
        F_ir^t = Aff(F_ir),
     where Aff(·) is the affine transformation operator; the difference between the visible light feature F_vis and the transformed infrared feature F_ir^t is minimized through back-propagation optimization, and the aligned infrared feature is then output;
     semantic alignment: a feature-similarity constraint is constructed with cross-modal contrastive learning, maximizing the similarity of the target's cross-modal features:
        L = -log( exp(sim(F_vis, F_ir)/T) / Σ_{k=1}^{K} exp(sim(F_vis, F_k)/T) ),
     where sim(F_vis, F_ir) denotes the cosine similarity between the visible light feature F_vis and the infrared feature F_ir, T is a temperature coefficient controlling the smoothness of the similarity distribution, and K is the number of negative samples; the semantically aligned visible light feature and infrared feature are output.
  5. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 2, wherein the feature fusion sub-module comprises (an illustrative code sketch appears after the claims):
     feature fusion:
        F_fused = A ⊙ F_vis + (1 - A) ⊙ F_ir,
     where A is the attention weight used to dynamically allocate the contributions of the two modal features, the output being the preliminary fusion feature F_fused;
     feature optimization: the fused feature is optimized to eliminate noise and inconsistency and to improve its robustness and discriminative power, the optimization being realized through a convolutional neural network:
        F_opt = σ(W_o * F_fused + b_o),
     where W_o and b_o are the weight and bias terms of the optimization network, the output being the optimized fusion feature F_opt.
  6. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 2, wherein the feature reconstruction module comprises a decoding network part and an image post-processing part; the decoding network part uses a lightweight convolutional neural network or a generative adversarial network as the decoder to decode the fusion feature F_opt into an image I_rec, which is then post-processed.
  7. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 6, wherein the image post-processing part comprises the following steps (an illustrative code sketch appears after the claims):
     S1, denoising: noise in the reconstructed image is removed;
     S2, sharpening: the image edges and details are enhanced with a Laplacian operator or an adaptive sharpening filter;
     S3, dynamic range adjustment: the brightness and contrast of the image are optimized with histogram equalization or adaptive contrast stretching; the final reconstructed image I_final is then output.
  8. The visible light and infrared image fusion method based on feature matching under the viewing angle of the unmanned aerial vehicle according to claim 7, wherein the denoising adopts a non-local means or bilateral filtering method.
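The following is a minimal sketch of the claim-1 feature extraction module: a ResNet-50 trunk for the visible light branch and a Sigmoid attention gate A = σ(W_a [F_ir; F_vis] + b_a) applied to the infrared features. The infrared backbone choice, channel sizes, and all module names are assumptions for illustration; the patent does not fix them.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DualFeatureExtractor(nn.Module):
    """Visible branch: ResNet-50 texture/edge features; IR branch: attention-gated."""
    def __init__(self, channels=2048):
        super().__init__()
        # drop avgpool/fc, keep convolutional feature maps of shape (N, 2048, H/32, W/32)
        self.vis_backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        # assumption: the IR branch also uses a ResNet-50 trunk (the claim leaves it open)
        self.ir_backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        # attention over the concatenation [F_ir; F_vis]: 1x1 conv (W_a, b_a) + Sigmoid
        self.attn = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, img_vis, img_ir):
        # both inputs assumed 3-channel; replicate the IR channel beforehand if needed
        f_vis = self.vis_backbone(img_vis)           # F_vis
        f_ir = self.ir_backbone(img_ir)              # raw infrared features
        a = self.attn(torch.cat([f_ir, f_vis], 1))   # A = sigma(W_a [F_ir; F_vis] + b_a)
        return f_vis, a * f_ir                       # F_ir' = A ⊙ F_ir
```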
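Next, a hedged sketch of the claim-2 preprocessing steps with OpenCV, assuming 8-bit single-channel inputs: SIFT-based affine registration (S1), Gaussian filtering of the infrared image and non-local means denoising of the visible image (S2), and histogram equalization of the infrared image (S3). Function names and parameter values are illustrative, not from the patent.

```python
import cv2
import numpy as np

def register_ir_to_vis(vis, ir):
    """S1: estimate an affine transform from SIFT matches and warp the IR image."""
    sift = cv2.SIFT_create()
    kp_v, des_v = sift.detectAndCompute(vis, None)
    kp_i, des_i = sift.detectAndCompute(ir, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_i, des_v)
    matches = sorted(matches, key=lambda m: m.distance)[:50]   # keep the best matches
    src = np.float32([kp_i[m.queryIdx].pt for m in matches])
    dst = np.float32([kp_v[m.trainIdx].pt for m in matches])
    M, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)   # 2x3 matrix [A | t]
    return cv2.warpAffine(ir, M, (vis.shape[1], vis.shape[0]))

def preprocess(vis, ir):
    ir_reg = register_ir_to_vis(vis, ir)
    ir_smooth = cv2.GaussianBlur(ir_reg, (5, 5), sigmaX=1.0)   # S2: smooth IR noise
    vis_dn = cv2.fastNlMeansDenoising(vis, h=10)               # S2: NLM keeps details
    ir_eq = cv2.equalizeHist(ir_smooth)                        # S3: dynamic range
    return vis_dn, ir_eq
```

Matching SIFT descriptors directly across modalities is brittle in practice, and real systems often rely on calibration or edge-based features instead; the claim names SIFT, so the sketch follows it.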
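A sketch of the claim-3 cross-modal association: Hungarian assignment (SciPy's linear_sum_assignment) over a score S = α·IoU + β·sim. The weights alpha and beta, the box format, and the helper names are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match_targets(boxes_vis, feats_vis, boxes_ir, feats_ir, alpha=0.5, beta=0.5):
    """Return matched (vis_idx, ir_idx, score) triples maximizing the total score."""
    score = np.zeros((len(boxes_vis), len(boxes_ir)))
    for i, (bv, fv) in enumerate(zip(boxes_vis, feats_vis)):
        for j, (bi, fi) in enumerate(zip(boxes_ir, feats_ir)):
            cos = np.dot(fv, fi) / (np.linalg.norm(fv) * np.linalg.norm(fi) + 1e-9)
            score[i, j] = alpha * iou(bv, bi) + beta * cos   # S = α·IoU + β·sim
    rows, cols = linear_sum_assignment(-score)   # Hungarian algorithm, maximization
    return [(r, c, score[r, c]) for r, c in zip(rows, cols)]
```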
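For the claim-4 semantic alignment, a standard InfoNCE-style contrastive loss with temperature T over K negatives matches the description; the patent's exact formulation is not reproduced here, so treat this as an assumed reconstruction.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(f_vis, f_ir, negatives, temperature=0.07):
    """f_vis, f_ir: (D,) features of a matched cross-modal pair; negatives: (K, D)."""
    pos = F.cosine_similarity(f_vis, f_ir, dim=0) / temperature
    neg = F.cosine_similarity(f_vis.unsqueeze(0), negatives, dim=1) / temperature
    logits = torch.cat([pos.unsqueeze(0), neg])      # positive pair at index 0
    # -log( exp(pos) / (exp(pos) + sum_k exp(neg_k)) )
    return -torch.log_softmax(logits, dim=0)[0]
```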
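A sketch of the claim-5 fusion sub-module: a per-pixel attention map A blends the modalities as F_fused = A ⊙ F_vis + (1 − A) ⊙ F_ir, and a small convolutional network refines the result into F_opt. Layer sizes and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, channels=2048):
        super().__init__()
        # predict per-pixel attention A from the concatenated features
        self.attn = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        # refinement network: F_opt = activation(W_o * F_fused + b_o)
        self.refine = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                    nn.ReLU(inplace=True))

    def forward(self, f_vis, f_ir):
        a = self.attn(torch.cat([f_vis, f_ir], dim=1))
        f_fused = a * f_vis + (1 - a) * f_ir   # dynamic contribution of each modality
        return self.refine(f_fused)            # optimized fusion feature F_opt
```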
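Finally, a sketch of the claim-7 post-processing chain on a single-channel 8-bit reconstructed image: bilateral denoising (one of the claim-8 options), Laplacian sharpening, and CLAHE standing in for adaptive contrast stretching. All parameter values are illustrative.

```python
import cv2
import numpy as np

def postprocess(recon):
    # S1: denoising -- bilateral filtering smooths noise while preserving edges
    dn = cv2.bilateralFilter(recon, d=7, sigmaColor=50, sigmaSpace=50)
    # S2: sharpening -- subtracting the Laplacian boosts edges and fine detail
    lap = cv2.Laplacian(dn, cv2.CV_16S, ksize=3)
    sharp = np.clip(dn.astype(np.int16) - lap, 0, 255).astype(np.uint8)
    # S3: dynamic range adjustment -- CLAHE as adaptive contrast stretching
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(sharp)
```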

Description

Feature matching-based visible light and infrared image fusion method under unmanned aerial vehicle visual angle

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a visible light and infrared image fusion method based on feature matching under the viewing angle of an unmanned aerial vehicle.

Background

Target detection systems operating from an unmanned aerial vehicle viewing angle are widely applied in the engineering inspection field. Most existing detection schemes predict from either visible light images or infrared images alone. Visible light images offer high resolution and contrast, but imaging is difficult under weak-light conditions such as night and severe weather; infrared images can be captured around the clock and reveal objects that visible light cannot, but they typically suffer from low resolution and blurred texture. Reasonably exploiting the complementarity of the two modalities therefore yields richer semantic information and produces a robust, information-rich fused image.

The prior art generally fuses visible and infrared images at the pixel level, but in practical application this approach still faces several technical bottlenecks and challenges:

Computational efficiency and real-time performance. In practical scenarios such as unmanned aerial vehicle inspection, the raw image data are large in scale, and traditional pixel-level fusion methods (such as multi-scale transformation and deep learning networks) must compute pixel by pixel, making the fusion process extremely time-consuming and difficult to run in real time. End-to-end deep fusion networks in particular further exacerbate the computational burden due to their complex model structures and training processes.

Hardware dependence and cost. Pixel-level fusion requires precise pixel-wise alignment of the visible and infrared images, which places extremely high demands on the synchronicity and stability of the imaging devices and increases hardware cost.

Fusion quality and robustness. Raw pixel values are easily corrupted by noise and other contamination, so their quality is unstable and the fusion result degrades after superposition. Traditional multi-scale transformation methods struggle to distinguish noise from real features in complex scenes; deep learning networks, despite their stronger feature extraction capability, are prone to under-fitting or poor generalization when data are insufficient or the scene changes substantially, which undermines the robustness of the fusion result.

Aiming at the defects of pixel-level fusion methods in computational efficiency, hardware dependence, and fusion quality, this patent provides a feature-level visible light and infrared image fusion algorithm.
The algorithm abandons the traditional idea of pixel-by-pixel processing and instead works at the feature level, achieving efficient and robust image fusion by extracting and aligning the salient features of the same target in the visible light and infrared images.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provides a visible light and infrared image fusion method based on feature matching under the viewing angle of an unmanned aerial vehicle, comprising:

an image preprocessing module, which receives a visible light image I_vis and an infrared image I_ir as input, improves image quality through preprocessing operations, eliminates modal differences, and provides high-quality input data for the subsequent modules;

an object detection module, which detects targets in the visible light image I_vis and the infrared image I_ir, matches the targets across the two images through cross-modal association, and provides target information for subsequent feature extraction and fusion;

a feature extraction module, which extracts the modality-complementary features of the targets in the visible light and infrared images, including shape, texture, and edge information, and provides a rich, robust feature representation for subsequent feature fusion;

a feature-level fusion module, which aligns and fuses the features extracted from the visible light image and the infrared image to generate information-rich, mutually complementary fusion features, providing a high-quality feature representation for subsequent tasks.