CN-121998864-A - Event camera real-time deblurring method based on dynamic edge guidance

CN 121998864 A

Abstract

The invention provides a real-time deblurring method for an event camera based on dynamic edge guidance. The method comprises: acquiring a blurred image to be restored and event-stream data captured under the same field of view; extracting image-modality features and event-modality features with a dual-stream visual encoder; constructing a dynamic edge guidance module that predicts a pixel-level, multi-level blur probability distribution and generates a dynamic spatial edge prior map; using the edge prior map to remove redundant background events and select the key event Tokens rich in motion information; constructing a cross-modal fusion module based on bidirectional RWKV that performs long-range spatio-temporal modeling of the selected event Tokens and the image features at linear computational complexity; and finally decoding and outputting a sharp image. Through this guide-then-select paradigm, the invention effectively suppresses sensor noise and reduces computational redundancy without relying on conventional hard gating, thereby achieving image deblurring with high signal-to-noise ratio and high structural similarity while meeting real-time requirements.

Inventors

  • SHAO WENZE
  • SHI FENG
  • LI LIUYI
  • ZHU JINJING
  • WANG JIAN

Assignees

  • NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS (南京邮电大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-21

Claims (8)

  1. A real-time deblurring method for an event camera based on dynamic edge guidance, characterized by comprising the following steps: Step 1, acquiring blurred image data and corresponding event data in a dynamic scene, acquiring a sharp image synchronized with the blurred image data as the ground-truth label, and constructing an event deblurring dataset for training and evaluation; Step 2, performing spatio-temporal encoding on the raw event data to convert it into an event feature tensor, and extracting deep semantic features of the blurred image data and of the event data with a dual-stream visual encoder, to serve as the input of subsequent cross-modal interaction; Step 3, constructing a dynamic edge guidance deblurring network based on the RWKV architecture, wherein the network comprises a dynamic edge guidance module, an edge-guided Token selection module, and a bidirectional RWKV cross-modal fusion module; Step 4, dividing the preprocessed dataset into a training set, a validation set, and a test set, and performing end-to-end back-propagation training of the model on the training set, wherein the dynamic edge guidance module predicts a pixel-level blur probability distribution and synthesizes a dynamic edge map, the edge-guided Token selection module removes redundant background events according to the edge map, the bidirectional RWKV cross-modal fusion module performs long-range spatio-temporal modeling of the selected key event Tokens and the image features at linear complexity, and a gradient-descent algorithm minimizes a pixel-level reconstruction loss function to optimize the model parameters; and Step 5, after training is completed, selecting the model weights with the best performance on the validation set, and performing inference on the blurred image to be processed and its event stream in real time to generate a high-fidelity sharp image.
  2. The event camera real-time deblurring method based on dynamic edge guidance as set forth in claim 1, wherein Step 2 comprises: dividing the event data within the exposure time into T equal sub-intervals and performing linear-interpolation embedding; dividing the resulting tensor into Patches of fixed size and flattening them into Token sequences; using a dual-stream Vision-RWKV as the backbone network, capturing long-range dependencies with the linear attention mechanism in its RWKV blocks, and outputting downsampled image latent features $F_I$ and event latent features $F_E$ respectively; then reshaping the two Token sequences of the blurred image data and the event data into spatial feature maps, expressed as $F_I \in \mathbb{R}^{C \times H \times W}$ and $F_E \in \mathbb{R}^{C \times H \times W}$, wherein $C$ is the number of channels and $H \times W$ is the downsampled resolution.
  3. The event camera real-time deblurring method based on dynamic edge guidance as claimed in claim 2, wherein the dynamic edge guidance module is constructed as follows: first, the image features $F_I$ and the event features $F_E$ are concatenated along the channel dimension, and cross-modal information is integrated through a 1×1 convolution layer to obtain the fused features $F$, computed as $F = \mathrm{Conv}_{1\times1}([F_I; F_E])$, wherein $[\cdot\,;\cdot]$ denotes channel concatenation and $\mathrm{Conv}_{1\times1}$ denotes the 1×1 convolution layer; edge estimation is then modeled as a probabilistic blur-level prediction task: a projection layer $\phi$ maps the fused features $F$ to logit responses of five different blur levels, $z = \phi(F)$; for each pixel $(i, j)$ on the feature map, a Softmax operation computes the probability that it belongs to the $k$-th blur level: $p_k(i,j) = \exp(z_k(i,j)) \big/ \sum_{k'=1}^{5} \exp(z_{k'}(i,j))$, thereby obtaining a pixel-level probability distribution map reflecting the different degrees of blur; finally, the predicted blur probabilities of all levels are weighted and summed to generate the final dynamic spatial edge map $M$: $M(i,j) = \sum_{k=1}^{5} w_k \, p_k(i,j)$, wherein $w_k$ are predefined blur-intensity weight coefficients; the resulting $M$ serves as a spatial probability prior: regions where $M$ is low correspond to static background or weak blur, and regions where $M$ is high correspond to severe motion blur and object boundaries.
  4. The event camera real-time deblurring method based on dynamic edge guidance according to claim 3, wherein the edge-guided Token selection module has the following structure: first, the self-similarity matrix $S$ of the event Token sequence $F_E$ is computed, and its row average $s_i$ quantifies the internal redundancy of the $i$-th Token; the dynamic spatial edge prior map $M$ is then flattened to a Token-level prior $m_i$, which modulates the similarity score as $\tilde{s}_i = s_i\,(1 - \lambda m_i)$, wherein $\lambda$ is a spatial suppression factor that reduces the redundancy score in regions of high blur probability; the $K$ positions with the lowest redundancy scores $\tilde{s}_i$ are then selected and their event feature vectors retained, with $K = \lfloor \rho N \rfloor$, wherein $\rho$ is the retention ratio and $N$ is the number of event Tokens; the selected event feature vectors are denoted $F_E'$.
  5. The event camera real-time deblurring method based on dynamic edge guidance of claim 4, wherein the bidirectional RWKV cross-modal fusion module has the following structure: its inputs are the image features $F_I$ and the selected key event features $F_E'$, which are first concatenated along the sequence dimension into the unified sequence $X$ of length $N_I + K$, wherein $N_I$ is the number of image Tokens and $K$ is the number of retained event Tokens; $X$ is then processed by a cross-modal RWKV block: $R = \mathrm{Shift}(X)\,W_R$ and $O = \sigma(R) \odot \mathrm{BiWKV}(X W_K,\, X W_V)$, wherein $\mathrm{Shift}(\cdot)$ is the query token-shift operation, $\mathrm{BiWKV}(\cdot)$ is the bidirectional WKV operator, $\sigma(\cdot)$ is the activation function, and $W_R$, $W_K$, $W_V$ are weight matrices; finally the feature map $F_{out}$ is output for high-quality image reconstruction by the subsequent decoder.
  6. The event camera real-time deblurring method based on dynamic edge guidance according to claim 5, wherein end-to-end back-propagation training of the network model is performed with a gradient-descent-based optimization algorithm, and the training process updates the network parameters by minimizing the pixel-level reconstruction loss function, specifically: $\mathcal{L}_{rec} = \lVert \hat{I} - I_{gt} \rVert_1$, wherein $\hat{I}$ denotes the deblurred image predicted by the network, $I_{gt}$ denotes the corresponding ground-truth sharp image label, and $\lVert\cdot\rVert_1$ denotes the L1 distance, which constrains the consistency of the predicted image with the ground-truth sharp image in pixel space.
  7. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the dynamic edge guidance based event camera real-time deblurring method according to any one of claims 1-6.
  8. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the dynamic edge guidance based event camera real-time deblurring method according to any one of claims 1-6.
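The temporal binning with linear-interpolation embedding described in claim 2 matches the standard event voxel-grid construction. Below is a minimal NumPy sketch under that assumption; the function name and the `(t, x, y, polarity)` event layout are illustrative, not taken from the patent:

```python
import numpy as np

def events_to_voxel_grid(events, T, H, W):
    """Bin events into T temporal channels with linear interpolation in time.

    events: (N, 4) array of rows (t, x, y, polarity in {-1, +1})
    Returns a (T, H, W) event feature tensor.
    """
    grid = np.zeros((T, H, W))
    t = events[:, 0]
    # Normalize timestamps into [0, T-1]; guard against a zero-length window.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (T - 1)
    for tn, (_, x, y, p) in zip(t_norm, events):
        t0 = int(np.floor(tn))
        for tb in (t0, t0 + 1):           # split each event across two bins
            if 0 <= tb < T:
                grid[tb, int(y), int(x)] += p * (1.0 - abs(tn - tb))
    return grid
```

Each event's polarity is distributed between its two neighboring temporal bins, so the per-pixel sum over bins preserves the signed event count.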
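The weighted-softmax edge map of claim 3 can be sketched in a few lines of NumPy. Here the 1×1 fusion convolution is assumed to have already produced the fused feature map, and the projection layer is reduced to a single matrix for illustration:

```python
import numpy as np

def dynamic_edge_map(fused, w_proj, level_weights):
    """Pixel-level blur-probability prediction and weighted dynamic edge map.

    fused:         (C, H, W) fused image/event features (after the 1x1 conv)
    w_proj:        (5, C) projection to the logits of five blur levels
    level_weights: (5,) predefined blur-intensity weights w_k
    """
    logits = np.einsum('kc,chw->khw', w_proj, fused)
    logits -= logits.max(axis=0, keepdims=True)       # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)         # p_k(i, j) per pixel
    return np.einsum('k,khw->hw', level_weights, probs)  # M = sum_k w_k * p_k
```

Per the claim, low values of the returned map mark static background or weak blur, while high values mark severe motion blur and object boundaries.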
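Claim 4's redundancy scoring and edge-modulated Token selection can be sketched as follows (using cosine similarity for the self-similarity matrix is an assumption — the claim does not specify the normalization):

```python
import numpy as np

def select_event_tokens(tokens, edge_prior, rho=0.5, lam=1.0):
    """Keep the rho*N least-redundant event Tokens, protected by the edge prior.

    tokens:     (N, C) event Token features
    edge_prior: (N,)   flattened dynamic edge map M, values in [0, 1]
    rho:        retention ratio; lam: spatial suppression factor lambda
    """
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    s = (t @ t.T).mean(axis=1)               # row-average redundancy score s_i
    s_mod = s * (1.0 - lam * edge_prior)     # suppress scores where blur prob is high
    k = max(1, int(rho * len(tokens)))
    keep = np.sort(np.argsort(s_mod)[:k])    # K positions with the lowest scores
    return tokens[keep], keep
```

Tokens near motion boundaries (high edge prior) have their redundancy score suppressed and are therefore preferentially retained, while near-duplicate background Tokens score high and are dropped.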

Description

Event camera real-time deblurring method based on dynamic edge guidance

Technical Field

The invention relates to the fields of computer vision and computational photography, and in particular to a real-time deblurring method for an event camera based on dynamic edge guidance.

Background

Image deblurring is a core task in computer vision and computational photography: it aims to restore sharp images by inversely modeling the motion-degradation process, providing key low-level perceptual support for applications such as autonomous driving, UAV navigation, and imaging on mobile intelligent terminals. In recent years, deep-learning-based deblurring methods have learned the mapping between blurred and sharp images by constructing encoder-decoder architectures and have made notable progress on complex dynamic scenes. However, a conventional frame camera is limited by its inherent photon-integration mechanism and fixed sampling frequency: if scene objects or the camera itself move during the exposure time, high-frequency details are lost, texture boundaries degrade, and visual artifacts appear, and the absence of physical motion cues in a single blurred frame caps the attainable performance. To break through the perceptual limitations of conventional cameras, event cameras have been introduced into the deblurring task: these sensors offer microsecond temporal resolution and an extremely high dynamic range, and asynchronously record brightness-change events, making it possible to capture high-frequency motion trajectories. Current mainstream methods typically encode the event stream into a voxel grid and employ cross-modal attention mechanisms to fuse event information with image features to achieve restoration. Despite this progress, three key challenges remain.
First, event data are extremely sparse in space and easily corrupted by sensor thermal noise; in particular, the massive redundant events generated in richly textured regions not only increase the computational cost but also introduce non-stationary noise into the fusion process, producing artifacts in the restored result. Second, existing schemes depend on global self-attention or cross-attention mechanisms of very high computational complexity (e.g. Transformer-based architectures), whose cost grows quadratically with the input size, making it difficult to meet the real-time processing requirements of embedded devices on high-resolution or long-sequence data. Third, existing methods generally lack an effective structure-guidance mechanism: all events are processed indiscriminately, the spatial non-uniformity of blur intensity is ignored, the motion-boundary regions critical to deblurring cannot be focused on precisely at the semantic level, and structural blur persists at object contours in the restored image. In short, existing methods have obvious shortcomings in the sparse spatial selection and real-time spatio-temporal modeling of event streams, and struggle with massive redundant-data interference and the real-time bottleneck. These limitations pose the question of how to construct a real-time deblurring framework that can adaptively select key motion cues while performing long-range spatio-temporal modeling at linear computational complexity, achieving a balance between high fidelity and high efficiency.
Disclosure of the Invention

The invention aims to overcome the shortcomings of the prior art by providing a real-time deblurring method for an event camera based on dynamic edge guidance: by constructing a spatial structural prior based on blur probability, the model is guided to adaptively select key motion information in the spatio-temporal dimension, and efficient feature fusion is achieved through a cross-modal linear recurrent scanning mechanism, thereby markedly improving both the real-time performance and the reconstruction accuracy of event-assisted deblurring in complex high-speed motion scenes, so as to solve the problems of the prior art. The technical scheme provided by the invention is as follows: a real-time deblurring method for an event camera based on dynamic edge guidance comprises the following steps: Step 1, acquiring blurred image data and corresponding event data in a dynamic scene, acquiring a sharp image synchronized with the blurred image data as the ground-truth label, and constructing an event deblurring dataset for training and evaluation; Step 2, performing spatio-temporal encoding on the raw event data to convert it into an event feature tensor, and extracting deep semantic features of the blurred image data and of the event data with a dual-stream visual encoder, to serve as the input of subsequent cross-modal interaction; Step 3, constructing a dynamic edge guidance deblurring network based on the RWKV architecture, wherein the network comprises a dynamic edge guidance module, an edge-guided Token selection module, and a bidirectional RWKV cross-modal fusion module.