CN-121998839-A - Image processing method and vehicle
Abstract
The application provides an image processing method and a vehicle, relating to the technical field of image processing. The image processing method comprises: inputting a pair of infrared and visible light images, synchronously collected in a low-illumination environment, into an image enhancement model; first performing multi-scale target detection to obtain a detection result; registering the two images based on the detection result to obtain a registered infrared image; then separately performing detail enhancement on the visible light image and the registered infrared image; fusing the enhanced infrared and visible light images to generate a pseudo-gray image; fusing the semantic features of the original infrared and visible light images, the process features generated while producing the pseudo-gray image, and the image detail features of the visible light image and the registered infrared image into a fused feature map; and finally combining the fused feature map with the enhanced infrared and visible light images to generate and output a color environment image. The application can improve the generation quality of driving images in low-illumination scenes.
Inventors
- LI MING
Assignees
- Great Wall Motor Co., Ltd. (长城汽车股份有限公司)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-15
Claims (10)
- 1. An image processing method, characterized in that the image processing method comprises: under the condition that a vehicle is located in a low-illumination scene, performing pixel-level alignment on an infrared image of a target environment area shot by an infrared camera and a visible light image of the target environment area shot by a visible light camera to obtain an image pair to be processed, and inputting the image pair to be processed into an image enhancement model; performing multi-scale target detection on the image pair to be processed to obtain a target detection result, wherein the target detection result comprises the bounding box coordinates of a target; performing image registration on the infrared image and the visible light image based on the bounding box coordinates of the target to obtain a registered infrared image; performing image detail enhancement on the visible light image and the registered infrared image to obtain an enhanced visible light image and an enhanced infrared image; fusing the enhanced visible light image and the enhanced infrared image to obtain a pseudo-gray image of the target environment area; fusing first to third features to obtain a fused feature map, wherein the first feature comprises semantic features obtained based on the image pair to be processed, the second feature comprises fusion features generated in the process of generating the pseudo-gray image based on the enhanced visible light image and the enhanced infrared image, and the third feature comprises image detail features obtained based on the visible light image and the registered infrared image; generating a color environment image of the target environment area based on the fused feature map, the enhanced visible light image, and the enhanced infrared image; and outputting the color environment image by the image enhancement model.
- 2. The image processing method according to claim 1, wherein the image enhancement model includes an object detection network comprising a shared backbone network, a feature pyramid network, two dilated convolution layers, two detection heads, and a fusion layer; and performing multi-scale target detection on the image pair to be processed to obtain a target detection result comprises: inputting the infrared image and the visible light image into the shared backbone network to obtain a first feature group corresponding to the infrared image and a second feature group corresponding to the visible light image, wherein the first feature group comprises feature maps output by the 3rd to 5th residual stages of the shared backbone network based on the infrared image, and the second feature group comprises feature maps output by the 3rd to 5th residual stages of the shared backbone network based on the visible light image; respectively inputting the first feature group and the second feature group into the feature pyramid network to obtain a first scale feature set and a second scale feature set, wherein the first scale feature set comprises multi-scale feature maps generated by the feature pyramid network based on the first feature group, and the second scale feature set comprises multi-scale feature maps generated by the feature pyramid network based on the second feature group; performing receptive field enhancement on the feature maps in the first scale feature set with a first dilated convolution layer to obtain a third scale feature set; performing receptive field enhancement on the feature maps in the second scale feature set with a second dilated convolution layer to obtain a fourth scale feature set; inputting the third scale feature set into a first detection head to obtain a first detection result corresponding to the infrared image; inputting the fourth scale feature set into a second detection head to obtain a second detection result corresponding to the visible light image; and fusing the first detection result and the second detection result with the fusion layer to obtain the target detection result.
- 3. The image processing method according to claim 1, wherein performing image registration on the infrared image and the visible light image based on the bounding box coordinates of the target to obtain a registered infrared image comprises: projecting the infrared image into the coordinate system of the visible light image according to the bounding box coordinates of the target and the geometric projection relationship between the infrared image and the visible light image, to obtain a coarsely registered infrared image; extracting multi-resolution spatial features of the coarsely registered infrared image and multi-resolution spatial features of the visible light image; matching the multi-resolution spatial features of the coarsely registered infrared image and the multi-resolution spatial features of the visible light image in a unified feature space to obtain an initial optical flow field; enhancing the initial optical flow field based on an attention guidance mechanism to obtain an enhanced optical flow field; and backward-sampling the infrared image based on the enhanced optical flow field so as to project the infrared image into the coordinate system of the visible light image, obtaining a finely registered infrared image, and determining the finely registered infrared image as the registered infrared image.
- 4. The image processing method according to claim 1, wherein the image enhancement model comprises an image enhancement network comprising two encoders, a central connection layer, and two decoders; and performing image detail enhancement on the visible light image and the registered infrared image to obtain an enhanced visible light image and an enhanced infrared image comprises: extracting a first thermal structural feature of the registered infrared image using a first encoder; extracting a first texture structural feature of the visible light image using a second encoder; performing channel concatenation and semantic fusion on the first thermal structural feature and the first texture structural feature with the central connection layer to obtain a fused structural feature; enhancing the thermal regions in the fused structural feature with a channel attention mechanism of a first decoder to obtain the enhanced infrared image; and enhancing the texture regions in the fused structural feature with a channel attention mechanism of a second decoder to obtain the enhanced visible light image.
- 5. The image processing method of claim 1, wherein the image enhancement model comprises an image fusion generative adversarial network comprising two encoders, a self-attention fusion module, a decoder, and a discriminator; and fusing the enhanced visible light image and the enhanced infrared image to obtain a pseudo-gray image of the target environment area comprises: extracting a second thermal structural feature of the enhanced infrared image using a first encoder; extracting a second texture structural feature of the enhanced visible light image using a second encoder; determining the cross-modal spatial correlation between the second thermal structural feature and the second texture structural feature with the self-attention fusion module to obtain two spatial weight maps, and performing weighted fusion on the two spatial weight maps to obtain a fusion weight map; enhancing the target regions, edge regions, and texture regions in the fusion weight map with a channel attention mechanism of the decoder to obtain a single-channel pseudo-gray fused image; comparing the single-channel pseudo-gray fused image with a reference pseudo-gray image using the discriminator, and outputting authenticity scores for the single-channel pseudo-gray fused image and for each local region therein; and determining the single-channel pseudo-gray fused image as the pseudo-gray image.
- 6. The image processing method according to claim 1, wherein fusing the first to third features to obtain a fused feature map comprises: aligning the spatial size and the channel number of the first to third features to obtain aligned feature tensors corresponding to the first to third features respectively; retaining different levels of feature information in the aligned feature tensors corresponding to the first to third features based on a feature pyramid fusion strategy, so as to capture multi-level feature representations and obtain first to third retained features; adjusting the fusion weights of the first to third retained features in the channel and spatial dimensions based on an attention adjustment mechanism to obtain weight-adjusted first to third retained features; connecting the weight-adjusted first to third retained features via cross-path residual connections to obtain first to third enhanced features; and fusing the first to third enhanced features to obtain the fused feature map.
- 7. The image processing method of claim 1, wherein generating a color environment image of the target environment area based on the fused feature map, the enhanced visible light image, and the enhanced infrared image comprises: acquiring image detail features of the enhanced visible light image and thermal features of the enhanced infrared image; fusing the image detail features, the thermal features, and the fused feature map to obtain a depth fusion feature map; mapping the gray values of the depth fusion feature map into different tonal regions of a predefined color space to obtain an initial three-primary-color image; and performing color adjustment on the initial three-primary-color image to obtain the color environment image.
- 8. The image processing method according to any one of claims 1 to 7, characterized in that, after the color environment image is output by the image enhancement model, the image processing method further comprises at least one of the following steps: generating a driving image according to the color environment image; generating a panoramic image according to the color environment image; and controlling the vehicle to execute an assisted driving operation according to the color environment image.
- 9. The image processing method according to claim 1, characterized in that the image processing method further comprises: inputting a multi-modal image pair into the image enhancement model, the multi-modal image pair comprising a sample infrared image and a sample visible light image synchronously acquired in a low-illumination environment for the same environment area; performing multi-scale target detection on the multi-modal image pair to obtain a sample target detection result, wherein the sample target detection result comprises the bounding box coordinates and the target category of a target; performing image registration on the sample infrared image and the sample visible light image based on the bounding box coordinates of the target in the sample target detection result to obtain a registered sample infrared image; performing image detail enhancement on the sample visible light image and the registered sample infrared image to obtain an enhanced sample visible light image and an enhanced sample infrared image; fusing the enhanced sample visible light image and the enhanced sample infrared image to obtain a sample pseudo-gray image of the environment area; fusing first to third sample features to obtain a fused sample feature map, wherein the first sample feature comprises semantic features obtained based on the multi-modal image pair, the second sample feature comprises fusion features generated in the process of generating the sample pseudo-gray image based on the enhanced sample visible light image and the enhanced sample infrared image, and the third sample feature comprises image detail features obtained based on the sample visible light image and the registered sample infrared image; generating a sample pseudo-color image of the environment area based on the fused sample feature map, the enhanced sample visible light image, and the enhanced sample infrared image; and training the image enhancement model based on first to fourth losses, wherein the first loss comprises a loss between the bounding box coordinates of the target in the sample target detection result and the ground-truth bounding box coordinate labels, and a loss between the target category and the ground-truth category labels; the second loss comprises a loss between the enhanced sample visible light image and a reference enhanced visible light image, and a loss between the enhanced sample infrared image and a reference enhanced infrared image; the third loss comprises a loss between the sample pseudo-color image and a reference pseudo-color image; and the fourth loss comprises a loss between the enhanced sample visible light image and the enhanced sample infrared image.
- 10. A vehicle, characterized in that the vehicle comprises: a memory for storing executable program code; and a processor for calling and executing the executable program code from the memory, so that the vehicle executes the image processing method according to any one of claims 1 to 9.
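As an illustration only (not part of the claims), the receptive-field enhancement of claim 2 relies on dilated (atrous) convolution, which inserts gaps between kernel taps so the receptive field grows without adding parameters. Below is a minimal single-channel NumPy sketch; the function name, toy feature map, and averaging kernel are all hypothetical.

```python
import numpy as np

def dilated_conv2d(feat, kernel, dilation=1):
    """Single-channel 2D convolution with a dilation (atrous) rate.

    A dilation rate d samples the input with gaps of d-1 pixels between
    kernel taps, enlarging the receptive field without extra parameters.
    """
    kh, kw = kernel.shape
    # Effective kernel extent once dilation gaps are inserted.
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    h, w = feat.shape
    out = np.zeros((h - eh + 1, w - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feat[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

feat = np.arange(64, dtype=float).reshape(8, 8)  # toy feature map
k = np.ones((3, 3)) / 9.0                        # averaging kernel
plain = dilated_conv2d(feat, k, dilation=1)      # 3x3 receptive field
wide = dilated_conv2d(feat, k, dilation=2)       # 5x5 receptive field, same 9 taps
```

With dilation 2, the same 3x3 kernel covers a 5x5 window of the feature map, which is the sense in which the claimed dilated convolution layers enlarge the receptive field of the pyramid features before the detection heads.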
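The fine registration step of claim 3 backward-samples the infrared image through an optical flow field. As an illustrative sketch only, the following NumPy function performs backward warping with bilinear interpolation; the flow convention and the toy shift field are assumptions, not drawn from the application.

```python
import numpy as np

def backward_warp(image, flow):
    """Backward-sample `image` with a dense optical flow field.

    flow[y, x] = (dy, dx) gives, for each output pixel, the offset into
    the source image to sample from; bilinear interpolation handles
    sub-pixel coordinates, and out-of-range samples clamp to the border.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    sy = np.clip(ys + flow[..., 0], 0, h - 1)
    sx = np.clip(xs + flow[..., 1], 0, w - 1)
    y0 = np.floor(sy).astype(int); x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = sy - y0; wx = sx - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bot = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2)); flow[..., 1] = 1.0  # sample 1 px to the right
warped = backward_warp(img, flow)
```

Backward (rather than forward) sampling is the usual choice because every output pixel gets exactly one well-defined source location, avoiding holes in the warped infrared image.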
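Claims 4 and 5 both invoke a channel attention mechanism in the decoders. A common form of channel attention is squeeze-and-excitation style gating; the sketch below, with hypothetical weight matrices and a random toy feature map, illustrates that form only under the assumption that the claimed mechanism behaves similarly.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) tensor.

    Global-average-pool each channel, pass the pooled vector through two
    small fully connected layers, squash to (0, 1) with a sigmoid, and
    rescale each channel by its learned gate.
    """
    pooled = feat.mean(axis=(1, 2))              # squeeze: (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)        # excitation FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # per-channel sigmoid gate
    return feat * gate[:, None, None]            # reweight channels

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))            # toy (C, H, W) feature map
w1 = rng.standard_normal((2, 4))                 # hypothetical FC weights
w2 = rng.standard_normal((4, 2))
out = channel_attention(feat, w1, w2)
```

Because the gate lies in (0, 1), channels carrying the targeted content (thermal regions in claim 4's first decoder, texture regions in its second) can be emphasized relative to the rest without changing the tensor's shape.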
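Claim 6 aligns the first to third features to a common spatial size and fuses them with attention-adjusted weights. As a simplified stand-in (the resizing scheme and the softmax weighting are assumptions, not the claimed mechanism), the following sketch aligns three differently sized maps and takes their weighted sum:

```python
import numpy as np

def align_and_fuse(features, weights):
    """Fuse single-channel feature maps of different spatial sizes.

    Each map is resized (nearest-neighbour) to the largest spatial size,
    then the maps are combined with softmax-normalised fusion weights,
    standing in for the attention-adjusted weighting of the claim.
    """
    th, tw = max(f.shape for f in features)      # largest (H, W)
    resized = []
    for f in features:
        ry = np.arange(th) * f.shape[0] // th    # nearest source rows
        rx = np.arange(tw) * f.shape[1] // tw    # nearest source cols
        resized.append(f[np.ix_(ry, rx)])
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()                              # softmax fusion weights
    return sum(wi * fi for wi, fi in zip(w, resized))

f1 = np.ones((8, 8))                             # toy aligned features
f2 = 2 * np.ones((4, 4))
f3 = 3 * np.ones((2, 2))
fused = align_and_fuse([f1, f2, f3], np.array([0.0, 0.0, 0.0]))
```

With equal logits the three maps contribute equally; in the claimed model the attention adjustment mechanism would instead learn these weights per channel and per spatial location.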
Description
Image processing method and vehicle

Technical Field

The present application relates to the field of image processing technology, and more particularly, to an image processing method and a vehicle.

Background

After a vehicle is provided with a driving image system, the surrounding environment can be presented in real time and intuitively, effectively assisting the user in sensing road conditions, identifying obstacles, and making safe driving decisions. However, in low-illumination scenes such as night driving and tunnels, a conventional vehicle-mounted visible light camera suffers from severely insufficient illumination, and the acquired images generally exhibit low brightness, poor signal-to-noise ratio, blurred edges, and similar problems, so the quality of the driving image is noticeably reduced and can hardly support the driver's accurate judgment of the surrounding environment. Therefore, how to improve the quality of the driving image generated in a low-illumination scene is a technical problem to be solved.

Disclosure of Invention

The embodiments of the application provide an image processing method and a vehicle, which can improve the generation quality of a driving image in a low-illumination scene by fusing and pseudo-color reconstructing multi-modal images of the same low-illumination scene, thereby improving target recognition accuracy and environment perception capability in the low-illumination scene.
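The pseudo-color reconstruction mentioned above (fusing the two modalities into a pseudo-gray image and mapping its gray values into a predefined color space, as in claims 5 and 7) can be sketched as follows. This is an illustration only: the simple weighted fusion and the piecewise-linear blue-green-red mapping are hypothetical stand-ins for the learned fusion network and the application's predefined color space.

```python
import numpy as np

def fuse_to_pseudo_gray(ir, vis, w_ir=0.5):
    """Weighted fusion of a registered infrared image and a visible image
    into a single-channel pseudo-gray image (stand-in for the learned
    fusion network of claim 5)."""
    return w_ir * ir + (1.0 - w_ir) * vis

def pseudo_color(gray):
    """Map gray values in [0, 1] into a simple predefined color space:
    dark values toward blue, mid-range toward green, bright toward red."""
    r = np.clip(2.0 * gray - 1.0, 0.0, 1.0)
    g = 1.0 - np.abs(2.0 * gray - 1.0)
    b = np.clip(1.0 - 2.0 * gray, 0.0, 1.0)
    return np.stack([r, g, b], axis=-1)   # (..., 3) three-primary-color image

ir = np.full((2, 2), 0.8)                 # toy hot region
vis = np.full((2, 2), 0.2)                # toy dim visible region
gray = fuse_to_pseudo_gray(ir, vis)       # mid-gray fused image
rgb = pseudo_color(gray)                  # initial three-primary-color image
```

In the application, the color adjustment step of claim 7 would then refine this initial three-primary-color image into the final color environment image.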
According to the image processing method, when the vehicle is located in a low-illumination scene, pixel-level alignment is performed on an infrared image of a target environment area shot by an infrared camera and a visible light image of the target environment area shot by a visible light camera to obtain an image pair to be processed, and the image pair to be processed is input into an image enhancement model; multi-scale target detection is performed on the image pair to be processed to obtain a target detection result comprising the bounding box coordinates of a target; image registration is performed on the infrared image and the visible light image based on the bounding box coordinates of the target to obtain a registered infrared image; image detail enhancement is performed on the visible light image and the registered infrared image to obtain an enhanced visible light image and an enhanced infrared image; the enhanced visible light image and the enhanced infrared image are fused to obtain a pseudo-gray image of the target environment area; first to third features are fused to obtain a fused feature map, where the first features comprise semantic features obtained based on the image pair to be processed, the second features comprise fusion features generated in the process of producing the pseudo-gray image based on the enhanced visible light image and the enhanced infrared image, and the third features comprise image detail features obtained based on the visible light image and the registered infrared image; and a color environment image of the target environment area is generated based on the fused feature map, the enhanced visible light image, and the enhanced infrared image, and output by the image enhancement model.
The image processing method first performs pixel-level alignment on the infrared image and the visible light image of the target environment area shot synchronously by the infrared camera and the visible light camera when the vehicle is in a low-illumination scene, generating an image pair to be processed that is input into the image enhancement model. The image enhancement model then performs multi-scale target detection on the image pair to obtain a detection result containing target bounding box coordinates, and performs accurate image registration of the infrared image to the visible light image based on the bounding boxes to obtain a registered infrared image. Image detail enhancement is then applied separately to the visible light image and the registered infrared image to produce an enhanced visible light image and an enhanced infrared image, which are fused to obtain a pseudo-gray image of the target environment area. Three key features are further extracted, namely the semantic features generated from the image pair to be processed, the fusion features generated in the pseudo-gray image generation process, and the image detail features extracted from the visible light image and the registered infrared image; these three features are deeply fused to form a fused feature map with strong representational capability. Finally, the fused feature map is combined with the enhanced visible light image and the enhanced infrared image to generate the color environment image of the target environment area. The image processing method provided by the application effectively fuses the advantages of the infrared and visible light modalities, not only remarka