US-20260127721-A1 - METHOD, APPARATUS, DEVICE, AND MEDIUM OF IMAGE PROCESSING BASED ON EVENT CAMERA

US 20260127721 A1

Abstract

A method, an apparatus, a device, and a medium of image processing based on an event camera. The method obtains event data corresponding to a blurred image captured by the event camera; constructs an initial PINN model, and embeds an event generation equation into the initial PINN model; inputs the event data into the PINN model to obtain a predicted luminance change-gradient value; performs self-supervised optimization using a temporal derivative loss based on the predicted luminance change-gradient value and a ground-truth luminance change-gradient value, and introduces a Tikhonov regularization constraint condition to optimize the PINN model; inputs the event data corresponding to a to-be-processed image into the optimized PINN model to obtain luminance values in the logarithmic domain of three color channels of the to-be-processed image; and uses tone mapping to convert the luminance values in the logarithmic domain into reconstructed image frames.

Inventors

  • Hui Xiong
  • Zipeng Wang
  • Yunfan Lu

Assignees

  • The Hong Kong University of Science and Technology (Guangzhou)

Dates

Publication Date
2026-05-07
Application Date
2025-10-23
Priority Date
2024-11-04

Claims (20)

  1. A method of image processing based on an event camera, comprising: obtaining event data corresponding to a blurred image captured by the event camera; constructing an initial Physics-Informed Neural Network (PINN) model, and embedding an event generation equation into the initial PINN model, wherein an input item of the initial PINN model is the event data, and an output item of the initial PINN model is a predicted luminance change-gradient value satisfying the event generation equation; inputting the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, performing self-supervised optimization using a temporal derivative loss based on the predicted luminance change-gradient value and a ground-truth luminance change-gradient value, introducing a Tikhonov regularization constraint condition to optimize the initial PINN model, and determining an optimized PINN model in response to a predictive performance of the PINN model meeting a predefined standard; inputting the event data corresponding to a to-be-processed image into the optimized PINN model to obtain luminance values in the logarithmic domain of three color channels of the to-be-processed image; and using tone mapping to convert the luminance values in the logarithmic domain of the three color channels into reconstructed image frames.
  2. The method of image processing based on the event camera according to claim 1, wherein the using tone mapping to convert the luminance values in the logarithmic domain of the three color channels into reconstructed image frames comprises: converting the luminance values in the logarithmic domain of the three color channels into high dynamic range (HDR) luminance values through an exponential function; adjusting luminance and contrast of the to-be-processed image through a Reinhard tone-mapping function to convert the HDR luminance values into low dynamic range (LDR) luminance values; and generating the reconstructed image frames based on the LDR luminance values.
  3. The method of image processing based on the event camera according to claim 1, wherein the inputting the event data corresponding to a to-be-processed image into the optimized PINN model to obtain luminance values in the logarithmic domain of three color channels of the to-be-processed image comprises: inputting time coordinates of the event data into the optimized PINN model to obtain the luminance values in the logarithmic domain of a red channel, a green channel, and a blue channel.
  4. The method of image processing based on the event camera according to claim 1, wherein hidden layers of the PINN model are multi-layer fully connected neural networks, and parameters of the initial PINN model are randomized.
  5. The method of image processing based on the event camera according to claim 1, wherein the performing self-supervised optimization using a temporal derivative loss based on the predicted luminance change-gradient value and a ground-truth luminance change-gradient value comprises: calculating a luminance change-gradient difference between the predicted luminance change-gradient value output by the PINN model and the ground-truth luminance change-gradient value calculated by the event generation equation; and determining a mean square error loss function based on the luminance change-gradient difference, and performing the self-supervised optimization using the temporal derivative loss of the PINN model based on the mean square error loss function.
  6. The method of image processing based on the event camera according to claim 4, wherein the introducing a Tikhonov regularization constraint condition to optimize the initial PINN model comprises: constraining a spatial gradient of a luminance in the logarithmic domain through the Tikhonov regularization, expressed as: L_reg = (∂F_Θ/∂x)² + (∂F_Θ/∂y)², wherein L_reg represents a regularization constraint condition, ∂F_Θ/∂x represents a derivative of a trained PINN network F_Θ in an x dimension, and ∂F_Θ/∂y represents a derivative of the trained PINN network F_Θ in a y dimension.
  7. The method of image processing based on the event camera according to claim 1, wherein the event generation equation is expressed as: ∂L((t_1 + t_2)/2)/∂t = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Σ_i P_i θ δ(t − t_i) dt, wherein L represents a luminance change-gradient function, P_i represents a direction of luminance change-gradient, θ represents a luminance change-gradient threshold that triggers an event, δ(t − t_i) represents a Dirac delta function, and t_1 and t_2 represent time coordinates.
  8. The method of image processing based on the event camera according to claim 2, wherein the event generation equation is expressed as: ∂L((t_1 + t_2)/2)/∂t = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Σ_i P_i θ δ(t − t_i) dt, wherein L represents a luminance change-gradient function, P_i represents a direction of luminance change-gradient, θ represents a luminance change-gradient threshold that triggers an event, δ(t − t_i) represents a Dirac delta function, and t_1 and t_2 represent time coordinates.
  9. The method of image processing based on the event camera according to claim 3, wherein the event generation equation is expressed as: ∂L((t_1 + t_2)/2)/∂t = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Σ_i P_i θ δ(t − t_i) dt, wherein L represents a luminance change-gradient function, P_i represents a direction of luminance change-gradient, θ represents a luminance change-gradient threshold that triggers an event, δ(t − t_i) represents a Dirac delta function, and t_1 and t_2 represent time coordinates.
  10. The method of image processing based on the event camera according to claim 4, wherein the event generation equation is expressed as: ∂L((t_1 + t_2)/2)/∂t = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Σ_i P_i θ δ(t − t_i) dt, wherein L represents a luminance change-gradient function, P_i represents a direction of luminance change-gradient, θ represents a luminance change-gradient threshold that triggers an event, δ(t − t_i) represents a Dirac delta function, and t_1 and t_2 represent time coordinates.
  11. The method of image processing based on the event camera according to claim 5, wherein the event generation equation is expressed as: ∂L((t_1 + t_2)/2)/∂t = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Σ_i P_i θ δ(t − t_i) dt, wherein L represents a luminance change-gradient function, P_i represents a direction of luminance change-gradient, θ represents a luminance change-gradient threshold that triggers an event, δ(t − t_i) represents a Dirac delta function, and t_1 and t_2 represent time coordinates.
  12. The method of image processing based on the event camera according to claim 6, wherein the event generation equation is expressed as: ∂L((t_1 + t_2)/2)/∂t = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Σ_i P_i θ δ(t − t_i) dt, wherein L represents a luminance change-gradient function, P_i represents a direction of luminance change-gradient, θ represents a luminance change-gradient threshold that triggers an event, δ(t − t_i) represents a Dirac delta function, and t_1 and t_2 represent time coordinates.
  13. An apparatus of image processing based on an event camera, configured to perform the method of image processing based on the event camera described in claim 1, comprising: a data obtaining module, configured to obtain the event data corresponding to the blurred image captured by the event camera; a model construction module, configured to construct the initial PINN model, and embed the event generation equation into the initial PINN model, wherein the input item of the initial PINN model is the event data, and the output item of the initial PINN model is the predicted luminance change-gradient value satisfying the event generation equation; a model optimization module, configured to input the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, perform the self-supervised optimization using the temporal derivative loss based on the predicted luminance change-gradient value and the ground-truth luminance change-gradient value, introduce the Tikhonov regularization constraint condition to optimize the initial PINN model, and determine the optimized PINN model in response to the predictive performance of the PINN model meeting the predefined standard; a luminance value prediction module, configured to input the event data corresponding to the to-be-processed image into the optimized PINN model to obtain the luminance values in the logarithmic domain of the three color channels of the to-be-processed image; and a tone mapping module, configured to use tone mapping to convert the luminance values in the logarithmic domain of the three color channels into the reconstructed image frames.
  14. An apparatus of image processing based on an event camera, configured to perform the method of image processing based on the event camera described in claim 2, comprising: a data obtaining module, configured to obtain the event data corresponding to the blurred image captured by the event camera; a model construction module, configured to construct the initial PINN model, and embed the event generation equation into the initial PINN model, wherein the input item of the initial PINN model is the event data, and the output item of the initial PINN model is the predicted luminance change-gradient value satisfying the event generation equation; a model optimization module, configured to input the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, perform the self-supervised optimization using the temporal derivative loss based on the predicted luminance change-gradient value and the ground-truth luminance change-gradient value, introduce the Tikhonov regularization constraint condition to optimize the initial PINN model, and determine the optimized PINN model in response to the predictive performance of the PINN model meeting the predefined standard; a luminance value prediction module, configured to input the event data corresponding to the to-be-processed image into the optimized PINN model to obtain the luminance values in the logarithmic domain of the three color channels of the to-be-processed image; and a tone mapping module, configured to use tone mapping to convert the luminance values in the logarithmic domain of the three color channels into the reconstructed image frames.
  15. An apparatus of image processing based on an event camera, configured to perform the method of image processing based on the event camera described in claim 3, comprising: a data obtaining module, configured to obtain the event data corresponding to the blurred image captured by the event camera; a model construction module, configured to construct the initial PINN model, and embed the event generation equation into the initial PINN model, wherein the input item of the initial PINN model is the event data, and the output item of the initial PINN model is the predicted luminance change-gradient value satisfying the event generation equation; a model optimization module, configured to input the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, perform the self-supervised optimization using the temporal derivative loss based on the predicted luminance change-gradient value and the ground-truth luminance change-gradient value, introduce the Tikhonov regularization constraint condition to optimize the initial PINN model, and determine the optimized PINN model in response to the predictive performance of the PINN model meeting the predefined standard; a luminance value prediction module, configured to input the event data corresponding to the to-be-processed image into the optimized PINN model to obtain the luminance values in the logarithmic domain of the three color channels of the to-be-processed image; and a tone mapping module, configured to use tone mapping to convert the luminance values in the logarithmic domain of the three color channels into the reconstructed image frames.
  16. An apparatus of image processing based on an event camera, configured to perform the method of image processing based on the event camera described in claim 5, comprising: a data obtaining module, configured to obtain the event data corresponding to the blurred image captured by the event camera; a model construction module, configured to construct the initial PINN model, and embed the event generation equation into the initial PINN model, wherein the input item of the initial PINN model is the event data, and the output item of the initial PINN model is the predicted luminance change-gradient value satisfying the event generation equation; a model optimization module, configured to input the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, perform the self-supervised optimization using the temporal derivative loss based on the predicted luminance change-gradient value and the ground-truth luminance change-gradient value, introduce the Tikhonov regularization constraint condition to optimize the initial PINN model, and determine the optimized PINN model in response to the predictive performance of the PINN model meeting the predefined standard; a luminance value prediction module, configured to input the event data corresponding to the to-be-processed image into the optimized PINN model to obtain the luminance values in the logarithmic domain of the three color channels of the to-be-processed image; and a tone mapping module, configured to use tone mapping to convert the luminance values in the logarithmic domain of the three color channels into the reconstructed image frames.
  17. An apparatus of image processing based on an event camera, configured to perform the method of image processing based on the event camera described in claim 6, comprising: a data obtaining module, configured to obtain the event data corresponding to the blurred image captured by the event camera; a model construction module, configured to construct the initial PINN model, and embed the event generation equation into the initial PINN model, wherein the input item of the initial PINN model is the event data, and the output item of the initial PINN model is the predicted luminance change-gradient value satisfying the event generation equation; a model optimization module, configured to input the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, perform the self-supervised optimization using the temporal derivative loss based on the predicted luminance change-gradient value and the ground-truth luminance change-gradient value, introduce the Tikhonov regularization constraint condition to optimize the initial PINN model, and determine the optimized PINN model in response to the predictive performance of the PINN model meeting the predefined standard; a luminance value prediction module, configured to input the event data corresponding to the to-be-processed image into the optimized PINN model to obtain the luminance values in the logarithmic domain of the three color channels of the to-be-processed image; and a tone mapping module, configured to use tone mapping to convert the luminance values in the logarithmic domain of the three color channels into the reconstructed image frames.
  18. An apparatus of image processing based on an event camera, configured to perform the method of image processing based on the event camera described in claim 7, comprising: a data obtaining module, configured to obtain the event data corresponding to the blurred image captured by the event camera; a model construction module, configured to construct the initial PINN model, and embed the event generation equation into the initial PINN model, wherein the input item of the initial PINN model is the event data, and the output item of the initial PINN model is the predicted luminance change-gradient value satisfying the event generation equation; a model optimization module, configured to input the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, perform the self-supervised optimization using the temporal derivative loss based on the predicted luminance change-gradient value and the ground-truth luminance change-gradient value, introduce the Tikhonov regularization constraint condition to optimize the initial PINN model, and determine the optimized PINN model in response to the predictive performance of the PINN model meeting the predefined standard; a luminance value prediction module, configured to input the event data corresponding to the to-be-processed image into the optimized PINN model to obtain the luminance values in the logarithmic domain of the three color channels of the to-be-processed image; and a tone mapping module, configured to use tone mapping to convert the luminance values in the logarithmic domain of the three color channels into the reconstructed image frames.
  19. A computer device, comprising a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to perform steps of the method according to claim 1.
  20. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, performs steps of the method according to claim 1.
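The temporal derivative loss of claim 5 and the Tikhonov regularization term of claim 6 can be sketched outside the patent's notation. The following is a minimal illustration, not the disclosed implementation: `F` is a toy differentiable stand-in for the trained PINN F_Θ, the weights and sample grids are invented for the example, and the derivatives are taken by central finite differences rather than automatic differentiation.

```python
import numpy as np

def F(x, y, t, w=np.array([0.3, 0.5, 0.1])):
    # Toy smooth "network" output over (x, y, t); stands in for F_Theta.
    return np.sin(w[0] * x + w[1] * y + w[2] * t)

def temporal_derivative_loss(x, y, t, grad_gt, eps=1e-4):
    # Mean square error between the model's dF/dt (central difference)
    # and the ground-truth luminance change-gradient value.
    dF_dt = (F(x, y, t + eps) - F(x, y, t - eps)) / (2 * eps)
    return np.mean((dF_dt - grad_gt) ** 2)

def tikhonov_reg(x, y, t, eps=1e-4):
    # L_reg = (dF/dx)^2 + (dF/dy)^2, averaged over the sampled points,
    # penalizing spatial gradients of the log-domain luminance.
    dF_dx = (F(x + eps, y, t) - F(x - eps, y, t)) / (2 * eps)
    dF_dy = (F(x, y + eps, t) - F(x, y - eps, t)) / (2 * eps)
    return np.mean(dF_dx ** 2 + dF_dy ** 2)

# Sample a small grid of coordinates and combine the two terms.
xs = np.linspace(0.0, 1.0, 8)
ys = np.linspace(0.0, 1.0, 8)
ts = np.linspace(0.0, 1.0, 8)
total = temporal_derivative_loss(xs, ys, ts, grad_gt=0.0) + 0.01 * tikhonov_reg(xs, ys, ts)
```

Both terms are means of squares, so the combined objective is nonnegative; in training, the regularization weight (0.01 here, an assumed value) would trade smoothness against data fit.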

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure claims the benefit of Chinese patent application No. 202411560346.7, filed on Nov. 4, 2024 and entitled "Method, Apparatus, Device, and Medium of Image Processing Based on Event Camera", which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to the field of image processing, and in particular to a method, an apparatus, a device, and a medium of image processing based on an event camera.

BACKGROUND

With the continuous advancement of computer vision technology, traditional frame cameras encounter challenges when processing dynamic scenes and high dynamic range (HDR) environments. To overcome these challenges, event cameras have been developed as a new type of sensor; with their low power consumption, high dynamic range, and high temporal resolution, they have gradually attracted attention and become a new hotspot in computer vision research. Unlike a traditional camera, an event camera captures brightness changes in a scene at microsecond-level temporal resolution, generating sparse and asynchronous event stream data. The format of the event stream data differs significantly from that of traditional frame images, presenting a new challenge for information processing and algorithm design. Existing event stream processing methods primarily depend on supervised learning, which requires a large amount of labeled data to train models. However, synthetic training data often differ from real scenes, resulting in degraded algorithm performance in practical applications. Additionally, existing event stream processing methods and traditional frame-based methods often lack robustness to noise under complex illumination conditions and in high dynamic range scenes, which degrades the quality of reconstruction results.
SUMMARY

To this end, the present disclosure provides a method, an apparatus, a device, and a medium of image processing based on an event camera, which reduce dependence on labeled data and enable high-quality reconstruction of blurred images.

According to a first aspect, the present disclosure provides a method of image processing based on an event camera, implemented through the following technical solution. The method includes: obtaining event data corresponding to a blurred image captured by the event camera; constructing an initial Physics-Informed Neural Network (PINN) model, and embedding an event generation equation into the initial PINN model, where an input item of the initial PINN model is the event data, and an output item of the initial PINN model is a predicted luminance change-gradient value satisfying the event generation equation; inputting the event data corresponding to the blurred image into the initial PINN model to obtain the predicted luminance change-gradient value of the event data, performing self-supervised optimization using a temporal derivative loss based on the predicted luminance change-gradient value and a ground-truth luminance change-gradient value, introducing a Tikhonov regularization constraint condition to optimize the initial PINN model, and determining an optimized PINN model in response to a predictive performance of the PINN model meeting a predefined standard; inputting the event data corresponding to a to-be-processed image into the optimized PINN model to obtain luminance values in the logarithmic domain of three color channels of the to-be-processed image; and using tone mapping to convert the luminance values in the logarithmic domain of the three color channels into reconstructed image frames.
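The event generation equation embedded into the PINN can be illustrated numerically: the Dirac deltas collapse the integral over [t_1, t_2] to a sum over the event timestamps inside the window, so the ground-truth luminance change-gradient is the polarity-weighted event count, scaled by the threshold and the window length. A minimal NumPy sketch, where the event arrays and the threshold value are illustrative rather than taken from the disclosure:

```python
import numpy as np

def luminance_gradient_gt(event_times, polarities, t1, t2, theta):
    """Discretized event generation equation:
    dL/dt at (t1+t2)/2 ~= (1/(t2-t1)) * sum of P_i * theta
    over events with t1 <= t_i <= t2."""
    mask = (event_times >= t1) & (event_times <= t2)
    return polarities[mask].sum() * theta / (t2 - t1)

# Toy event stream: three positive events and one negative in [0, 1].
times = np.array([0.1, 0.4, 0.6, 0.9])
pols = np.array([+1, +1, -1, +1])
grad = luminance_gradient_gt(times, pols, 0.0, 1.0, theta=0.2)
# (1 + 1 - 1 + 1) * 0.2 / 1.0 = 0.4
```

This ground-truth value is what the temporal derivative loss compares against the PINN's predicted dL/dt at the window midpoint.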
In a preferred example of the present disclosure, the using tone mapping to convert the luminance values in the logarithmic domain of the three color channels into reconstructed image frames includes: converting the luminance values in the logarithmic domain of the three color channels into high dynamic range (HDR) luminance values through an exponential function; adjusting luminance and contrast of the to-be-processed image through a Reinhard tone-mapping function to convert the HDR luminance values into low dynamic range (LDR) luminance values; and generating the reconstructed image frames based on the LDR luminance values.

In a preferred example of the present disclosure, the inputting the event data corresponding to a to-be-processed image into the optimized PINN model to obtain luminance values in the logarithmic domain of three color channels of the to-be-processed image includes: inputting time coordinates of the event data into the optimized PINN model to obtain the luminance values in the logarithmic domain of a red channel, a green channel, and a blue channel.

In a preferred example of the present disclosure, hidden layers of the PINN model are multi-layer fully connected neural networks, and parameters of the initial PINN model are randomized.

In a pref
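The tone-mapping step described above can be sketched as follows. This is an illustrative reading, assuming the global Reinhard operator L/(1+L); the disclosure names the Reinhard function but does not specify its exact form or parameters.

```python
import numpy as np

def tone_map(log_luminance):
    """Exponentiate log-domain luminance to HDR, then compress
    with the global Reinhard operator L/(1+L) into LDR [0, 1)."""
    hdr = np.exp(log_luminance)   # logarithmic domain -> HDR luminance
    ldr = hdr / (1.0 + hdr)       # Reinhard global operator -> LDR
    return ldr

# Toy per-channel log luminance for a single pixel (R, G, B).
log_rgb = np.array([[-1.0, 0.0, 1.0]])
ldr = tone_map(log_rgb)
```

The output stays in [0, 1), so scaling by 255 and quantizing would yield displayable reconstructed frames.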