CN-118590780-B - Image processing method, apparatus, device, storage medium, and computer program product

Abstract

The application discloses an image processing method, apparatus, device, storage medium and computer program product. The method comprises: obtaining a rolling shutter (RS) frame captured by an event camera and event data corresponding to the RS frame; encoding the RS frame and the event data through an encoding unit of an image processing model to obtain spatio-temporal implicit representation (STR) data corresponding to the RS frame; embedding the exposure time of a global shutter (GS) frame corresponding to the RS frame into the STR data through a time embedding unit of the image processing model to obtain a time tensor corresponding to the RS frame; and decoding the time tensor and the STR data corresponding to the RS frame pixel by pixel through a decoding unit of the image processing model to generate the GS frame. The scheme provided by the application can improve the sharpness of the GS frame recovered from the RS frame.

Inventors

XIONG HUI
LU YUNFAN
LIANG GUOQIANG

Assignees

The Hong Kong University of Science and Technology (Guangzhou) (香港科技大学(广州))

Dates

Publication Date: 20260508
Application Date: 20240530

Claims (13)

1. An image processing method, comprising: acquiring a rolling shutter (RS) frame captured by an event camera and event data corresponding to the RS frame; encoding the RS frame and the event data through an encoding unit of an image processing model to obtain spatio-temporal implicit representation (STR) data corresponding to the RS frame; embedding the exposure time of a global shutter (GS) frame corresponding to the RS frame into the STR data through a time embedding unit of the image processing model to obtain a time tensor corresponding to the RS frame; and performing pixel-by-pixel decoding on the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame through a decoding unit of the image processing model to generate the GS frame; wherein embedding the exposure time of the GS frame corresponding to the RS frame into the STR data through the time embedding unit of the image processing model to obtain the time tensor corresponding to the RS frame comprises: acquiring the exposure time corresponding to the GS frame and a timestamp corresponding to the RS frame, wherein the timestamp represents the exposure start time of the RS frame, and the exposure time corresponding to the GS frame is determined by the GS frame rate expected by a user; constructing, through the time embedding unit, a mapping relation between the exposure time of the GS frame and the timestamp of the RS frame; and embedding the exposure time of the GS frame into the STR data according to the mapping relation to obtain the time tensor corresponding to the RS frame.
2. The method of claim 1, wherein encoding the RS frame and the event data through the encoding unit of the image processing model to obtain the STR data corresponding to the RS frame comprises: extracting features of the RS frame and the event data through the encoding unit to obtain spatial features and temporal features corresponding to the RS frame; and performing sparse learning on the spatial features and the temporal features corresponding to the RS frame through the encoding unit to obtain the STR data corresponding to the RS frame.
3. The method according to claim 2, wherein the encoding unit includes a convolutional layer of a convolutional neural network, and extracting features of the RS frame and the event data through the encoding unit to obtain the spatial features and temporal features corresponding to the RS frame comprises: extracting features of the RS frame and the event data through the convolutional layer to obtain the spatial features and temporal features corresponding to the RS frame.
4. The method according to claim 2, wherein the encoding unit includes a graph neural network, and extracting features of the RS frame and the event data through the encoding unit to obtain the spatial features and temporal features corresponding to the RS frame comprises: extracting features of the RS frame and the event data through the graph neural network to obtain the spatial features and temporal features corresponding to the RS frame.
5. The method of claim 1, wherein the time embedding unit includes a multi-layer perceptron, and embedding the exposure time of the GS frame into the STR data according to the mapping relation to obtain the time tensor corresponding to the RS frame comprises: determining an embedding position corresponding to the exposure time of the GS frame through the mapping relation; and embedding the exposure time of the GS frame into the embedding position through the multi-layer perceptron to obtain the time tensor corresponding to the RS frame.
6. The method of claim 1, wherein embedding the exposure time of the GS frame into the STR data according to the mapping relation to obtain the time tensor corresponding to the RS frame comprises: acquiring a relation between the timestamp and exposure time embedding strategies, and determining a target embedding strategy corresponding to the timestamp; determining an embedding position corresponding to the exposure time through the mapping relation; and embedding the exposure time into the embedding position of the STR data through the target embedding strategy to obtain the time tensor corresponding to the RS frame.
7. The method of claim 1, wherein the decoding unit includes a plurality of multi-layer perceptrons, and performing pixel-by-pixel decoding on the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame through the decoding unit of the image processing model to generate the GS frame comprises: decoding the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame pixel by pixel through the plurality of multi-layer perceptrons to obtain the GS frame corresponding to each exposure time of the RS frame.
8. The method of claim 1, wherein the decoding unit includes at least one of a generative adversarial network, a spatial transformer network, and a deformable convolution network, and performing pixel-by-pixel decoding on the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame through the decoding unit of the image processing model to generate the GS frame comprises: decoding the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame pixel by pixel through at least one of the generative adversarial network, the spatial transformer network, and the deformable convolution network to obtain the GS frame corresponding to each exposure time of the RS frame.
9. The method according to any one of claims 1 to 8, wherein the image processing model is trained by: acquiring RS frame sample data and a first RS frame corresponding to the RS frame sample data, wherein the RS frame sample data comprises at least the RS frame and the event data corresponding to the RS frame, and the sharpness of the first RS frame is higher than that of the RS frame sample data; inputting the RS frame sample data into an initial image processing model and obtaining GS frames output by the initial image processing model; performing progressive combination based on a plurality of consecutive GS frames to obtain a second RS frame; performing feature comparison on the first RS frame and the second RS frame to obtain feature difference data; calculating a loss value of a loss function of the initial image processing model from the feature difference data; and adjusting model parameters of the initial image processing model according to the loss value until the loss function converges, to obtain the image processing model.
10. An image processing apparatus, comprising: a data acquisition module configured to acquire a rolling shutter (RS) frame captured by an event camera and event data corresponding to the RS frame; an encoding module configured to encode the RS frame and the event data through an encoding unit of an image processing model to obtain spatio-temporal implicit representation (STR) data corresponding to the RS frame; a data embedding module configured to embed the exposure time of a global shutter (GS) frame corresponding to the RS frame into the STR data through a time embedding unit of the image processing model to obtain a time tensor corresponding to the RS frame; and a decoding module configured to decode the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame pixel by pixel through a decoding unit of the image processing model to generate the GS frame; wherein the data embedding module comprises a time information acquisition module, a mapping module and a time embedding module; the time information acquisition module is configured to acquire the exposure time corresponding to the GS frame and a timestamp corresponding to the RS frame, wherein the timestamp represents the exposure start time of the RS frame, and the exposure time corresponding to the GS frame is determined by the GS frame rate expected by a user; the mapping module is configured to construct, through the time embedding unit, a mapping relation between the exposure time of the GS frame and the timestamp of the RS frame; and the time embedding module is configured to embed the exposure time of the GS frame into the STR data according to the mapping relation to obtain the time tensor corresponding to the RS frame.
11. An electronic device comprising a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the image processing method according to any one of claims 1-9.
12. A computer-readable storage medium, having stored thereon computer program instructions which, when executed by a processor, implement the image processing method according to any of claims 1-9.
13. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the image processing method according to any of claims 1-9.
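
The training steps of claim 9 can be pictured with a small sketch. It rests on assumptions that the claims do not state: "progressive combination" is read here as row-by-row composition, where each row of the second RS frame is copied from the GS frame closest in time to that row's readout, and "feature comparison" is replaced by a plain mean absolute difference on pixel values. The function names `compose_rs_from_gs` and `l1_difference` are hypothetical.

```python
import numpy as np

def compose_rs_from_gs(gs_frames):
    """Compose one RS frame from a stack of consecutive GS frames.

    gs_frames has shape (T, H, W). Assumption: frame k shows the scene at
    the k-th of T evenly spaced instants spanning the RS readout, and the
    top row is read out first.
    """
    t, h, _ = gs_frames.shape
    # Map each row index 0..H-1 to the nearest GS frame index 0..T-1.
    frame_idx = np.round(np.linspace(0, t - 1, h)).astype(int)
    rows = np.arange(h)
    # Row r of the result is row r of GS frame frame_idx[r].
    return gs_frames[frame_idx, rows, :]

def l1_difference(first_rs, second_rs):
    """Stand-in for the feature comparison: mean absolute pixel difference."""
    return float(np.mean(np.abs(first_rs - second_rs)))

# Toy data: four constant GS frames with values 0, 1, 2, 3 (shape 4 x 3 each).
gs_stack = np.stack([np.full((4, 3), float(k)) for k in range(4)])
second_rs = compose_rs_from_gs(gs_stack)

# Hypothetical sharper reference RS frame (the "first RS frame").
first_rs = np.zeros((4, 3))
loss = l1_difference(first_rs, second_rs)
```

In a real training loop the loss would be back-propagated through the model that produced the GS frames; this sketch only shows how a second RS frame and a scalar loss could be formed from them.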

Description

Image processing method, apparatus, device, storage medium, and computer program product

Technical Field

The present application relates to the field of computer vision, and in particular, to an image processing method, apparatus, device, storage medium, and computer program product.

Background

In the related art, consumer cameras based on CMOS (Complementary Metal Oxide Semiconductor) sensors typically employ an RS (Rolling Shutter) mechanism. However, in fast-motion scenes, frames captured by such cameras typically suffer from RS distortion and blurring. To improve the sharpness of the captured images, sharp, high-frame-rate GS (Global Shutter) frames need to be recovered from the blurred RS frame.

Sharp GS frames are typically recovered from blurred RS frames by correcting, deblurring, and interpolating the RS frames. However, in the related art, correction, deblurring, and frame interpolation of an RS frame are generally handled as independent tasks and processed in cascade through existing image enhancement networks. This approach accumulates errors and introduces significant artifacts, thereby reducing the sharpness of the recovered GS frame.

Disclosure of Invention

Embodiments of the present application provide an image processing method, apparatus, device, storage medium, and computer program product, capable of improving the sharpness of a GS frame recovered from an RS frame.
In a first aspect, an embodiment of the present application provides an image processing method. The method includes: acquiring a rolling shutter (RS) frame captured by an event camera and event data corresponding to the RS frame; encoding the RS frame and the event data through an encoding unit of an image processing model to obtain spatio-temporal implicit representation (STR) data corresponding to the RS frame; embedding the exposure time of a global shutter (GS) frame corresponding to the RS frame into the STR data through a time embedding unit of the image processing model to obtain a time tensor corresponding to the RS frame; and performing pixel-by-pixel decoding on the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame through a decoding unit of the image processing model to generate the GS frame.

In a second aspect, an embodiment of the present application provides an image processing apparatus. The apparatus includes: a data acquisition module configured to acquire an RS frame captured by an event camera and event data corresponding to the RS frame; an encoding module configured to encode the RS frame and the event data through an encoding unit of an image processing model to obtain STR data corresponding to the RS frame; a data embedding module configured to embed, through a time embedding unit of the image processing model, the exposure time of a GS frame corresponding to the RS frame into the STR data to obtain a time tensor corresponding to the RS frame; and a decoding module configured to decode, through a decoding unit of the image processing model, the time tensor corresponding to the RS frame and the STR data corresponding to the RS frame pixel by pixel to generate the GS frame.
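
The data flow of the first aspect (encode once, embed an exposure time, decode pixel by pixel) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented model: the event data is assumed to be pre-binned into `B` temporal bins per pixel, the encoder is reduced to a per-pixel MLP, the time tensor is a time embedding broadcast over all pixels, and all weights are random and untrained. Every function name and tensor size here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random, untrained weights for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply the MLP along the last axis of x, with ReLU between layers."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def encode(rs_frame, event_voxels, enc_layers):
    """(H, W) RS frame + (B, H, W) event bins -> (H, W, C) STR features."""
    stacked = np.concatenate([rs_frame[None], event_voxels], axis=0)
    return mlp(np.moveaxis(stacked, 0, -1), enc_layers)

def embed_time(t, str_data, time_layers):
    """Scalar GS exposure time t -> (H, W, D) time tensor (assumed layout)."""
    h, w, _ = str_data.shape
    emb = mlp(np.array([t], dtype=float), time_layers)
    return np.broadcast_to(emb, (h, w, emb.shape[0]))

def decode(time_tensor, str_data, dec_layers):
    """Per-pixel MLP over [STR features, time tensor] -> (H, W) GS frame."""
    x = np.concatenate([str_data, time_tensor], axis=-1)
    return mlp(x, dec_layers)[..., 0]

H, W, B, C, D = 4, 6, 5, 8, 8
enc_layers = init_mlp([1 + B, 16, C])
time_layers = init_mlp([1, 16, D])
dec_layers = init_mlp([C + D, 16, 1])

rs_frame = rng.random((H, W))              # toy blurred RS frame
event_voxels = rng.normal(size=(B, H, W))  # toy binned event data

# The STR data is computed once; each requested time yields one GS frame.
str_data = encode(rs_frame, event_voxels, enc_layers)
gs_frames = [decode(embed_time(t, str_data, time_layers), str_data, dec_layers)
             for t in (0.0, 0.5, 1.0)]
```

In this sketch the encoder runs once per RS frame and only the time embedding and decoder are re-evaluated for each requested exposure time, which is one way a single model could serve different target GS frame rates.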
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor and a memory storing computer program instructions, and the processor implements the image processing method according to the first aspect when executing the computer program instructions.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the image processing method according to the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product, instructions in which, when executed by a processor of an electronic device, cause the electronic device to perform the image processing method according to the first aspect.

As can be seen from the above, in the embodiments of the present application, the image processing model performs spatio-temporal implicit encoding, exposure time embedding and pixel-by-pixel decoding on the blurred RS frame, so that a sharp GS frame is recovered from the blurred RS frame. The image processing model integrates the three tasks of RS frame correction, deblurring and frame interpolation into a single process, which reduces the accumulated errors and artifacts that arise when the three tasks process the RS frame separately, thereby improving the sharpness and quality of the recovered GS frame. In addition, in the embodiment of the application, the space-time implicit expression is carried out on the RS frame, the comprehensive sp