
CN-121999089-A - Self-calibration lightweight event camera image reconstruction method and system


Abstract

The invention provides a self-calibration lightweight event camera image reconstruction method and system, relating to the technical field of computer vision. The method comprises: acquiring an event stream with an event camera and preprocessing it into a voxel grid; inputting the voxel grid into a lightweight neural network; extracting shallow features with a head layer; feeding the shallow features into an encoder to extract spatio-temporal features; capturing local structural correlations and global spatio-temporal dependencies from the spatio-temporal features with a self-calibration Transformer; upsampling the local structural correlations and global spatio-temporal dependencies through a decoder to obtain a feature map with increased spatial resolution; receiving the shallow features from the head layer in a dual-domain spatial-spectrum modulation module, fusing frequency-domain and spatial-domain information, and outputting spectrum-space enhanced features of the same dimension; fusing the spectrum-space enhanced features with the feature map; and outputting a reconstructed intensity map through a prediction layer. The invention reduces model size and computational complexity while achieving a high level of reconstruction accuracy.

Inventors

  • Zhang Jiachao
  • Gao Yexiang
  • Zhao Xueyi
  • Zhu Yang
  • Wang Chuanjun
  • Shu Xiangbo

Assignees

  • 南京工程学院
  • 中电科联海创智信息科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-02-12

Claims (10)

  1. A self-calibration lightweight event camera image reconstruction method, characterized by comprising the following steps: acquiring an event stream by using an event camera and preprocessing the event stream into a voxel grid; constructing a lightweight neural network comprising a head layer, an encoder, a self-calibration Transformer module, a dual-domain spatial-spectrum modulation module, a decoder and a prediction layer; inputting the voxel grid into the lightweight neural network, extracting shallow features by using the head layer, and inputting the shallow features into the encoder to extract spatio-temporal features; capturing local structural correlations and global spatio-temporal dependencies from the spatio-temporal features by using the self-calibration Transformer module; upsampling the local structural correlations and global spatio-temporal dependencies through the decoder to obtain a feature map with increased spatial resolution; receiving the shallow features obtained by the head layer in the dual-domain spatial-spectrum modulation module, fusing frequency-domain and spatial-domain information, and outputting spectrum-space enhanced features of the same dimension; and fusing the spectrum-space enhanced features with the feature map, and outputting a reconstructed intensity map through the prediction layer.
  2. The self-calibration lightweight event camera image reconstruction method according to claim 1, wherein acquiring the event stream by the event camera and preprocessing it into a voxel grid comprises: converting the event stream E = {(x_i, y_i, t_i, p_i)} into a voxel grid V by bilinear interpolation, satisfying the following formula: V(x, y, b) = Σ_i p_i · max(0, 1 − |t_i* − b|), summed over the events with pixel coordinates (x_i, y_i) = (x, y), with t_i* = (B − 1)(t_i − t_min)/(t_max − t_min); in the formula, (x_i, y_i) represent the pixel coordinates, B represents the number of voxel grid channels, p_i represents the event polarity, t_i* represents the normalized timestamp, and t_min and t_max are the minimum and maximum of the timestamps.
  3. The self-calibration lightweight event camera image reconstruction method according to claim 1, wherein the self-calibration Transformer module comprises a low-frequency branch and a high-frequency branch; the low-frequency branch extracts local structural features through a convolution layer; the high-frequency branch extracts global spatio-temporal dependencies through a lightweight Transformer, extracting from the input depth features rich in global context and high-frequency details, and in parallel generating, through a convolution layer and a Sigmoid activation function, a spatially adaptive weight map matching the spatial size of the input feature map; the depth features are multiplied by the spatially adaptive weight map to enhance the high-frequency details.
  4. The self-calibration lightweight event camera image reconstruction method according to claim 3, wherein local details and spatial structure are extracted by a convolution layer, and a LeakyReLU activation function is used to enhance the nonlinear expression, finally obtaining the low-frequency features, the process satisfying the following formula: F_low = LeakyReLU(Conv(F_l)); in the formula, F_l is the feature information input to the low-frequency branch, and F_low is the obtained low-frequency information.
  5. The self-calibration lightweight event camera image reconstruction method according to claim 3, wherein long-distance spatio-temporal dependencies are captured by a lightweight Transformer comprising a multi-head attention (MDA) and a feed-forward network (FFN); the MDA projects the input features into multiple subspaces, capturing different and complementary information from different locations, and the FFN is used to suppress information flow across channels, the process satisfying the following formulas: F_g = FFN(MDA(F_h)), F_high = F_g ⊙ Sigmoid(Conv(F_h)), where ⊙ denotes element-wise multiplication; in the formulas, F_h is the feature information input to the high-frequency branch, F_g is the global-context-rich information extracted by the Transformer, and F_high is the obtained high-frequency information.
  6. The self-calibration lightweight event camera image reconstruction method according to claim 3, wherein the features of the low-frequency branch and the high-frequency branch are spliced along the channel dimension, and the final feature F_out is obtained by passing the spliced features through a convolution and a PixelShuffle layer and adding the fused feature to the residual feature: F_out = PS(Conv(Cat(F_high, F_low))) + F_0; wherein PS is the PixelShuffle layer, F_0 is the initial feature, Cat is the splicing operation, F_high is the high-frequency information, and F_low is the low-frequency information.
  7. The self-calibration lightweight event camera image reconstruction method according to claim 1, wherein the decoder uses bilinear upsampling and standard convolution layers, and gradually reconstructs texture, edge and structural information by integrating multi-scale features of the encoder through skip connections, obtaining feature maps with increased spatial resolution.
  8. The self-calibration lightweight event camera image reconstruction method according to claim 1, wherein the feature expression of the skip connection is enhanced through the dual-domain spatial-spectrum modulation module: the shallow features obtained by the head layer are received and decomposed into a frequency-domain modulation branch and a spatial-domain feature branch; for the frequency-domain branch, a Fourier Unit is adopted to map the spatial features to the frequency domain, the expression capacity is enhanced through a nonlinear activation function, and the modulated frequency-domain features are then mapped back to the spatial domain by an inverse Fourier transform to obtain the frequency-domain features; for the spatial-domain branch, spatial features are extracted through a 1×1 convolution; feature fusion is performed by element-wise multiplication, and finally the fused features are restored to the original dimensions through a linear projection and output as the spectrum-space enhanced features.
  9. The self-calibration lightweight event camera image reconstruction method according to claim 1, wherein the spectrum-space enhanced features output by the dual-domain spatial-spectrum modulation module are fused with the feature map output by the decoder, and the final predicted intensity image is output through a standard convolution layer that maps the features to a single channel.
  10. A self-calibration lightweight event camera image reconstruction system for automatically performing the self-calibration lightweight event camera image reconstruction method according to any one of claims 1 to 9, the system comprising: the event camera, used for acquiring an event stream and preprocessing the event stream into a voxel grid; the lightweight neural network, comprising a head layer, an encoder, a self-calibration Transformer module, a dual-domain spatial-spectrum modulation module, a decoder and a prediction layer; an input module, used for inputting the voxel grid into the lightweight neural network, extracting shallow features by using the head layer, extracting spatio-temporal features by using the encoder, capturing local structural correlations and global spatio-temporal dependencies from the spatio-temporal features by using the self-calibration Transformer module, and upsampling the local structural correlations and global spatio-temporal dependencies through the decoder to obtain a feature map with increased spatial resolution; and an output module, used for fusing the spectrum-space enhanced features with the feature map and outputting a reconstructed intensity map through the prediction layer.
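The event-to-voxel-grid preprocessing of claim 2 can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the event layout (rows of (t, x, y, p)) and the function name are assumptions, and the bilinear (triangular) weighting is applied along the time axis only.

```python
import numpy as np

def events_to_voxel_grid(events, B, H, W):
    """Accumulate an event stream into a B-channel voxel grid using
    bilinear interpolation along the time axis (cf. claim 2).
    events: array of shape (N, 4) with columns (t, x, y, p), p in {-1, +1}.
    Names and layout are illustrative, not taken from the patent text."""
    grid = np.zeros((B, H, W), dtype=np.float64)
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    t_min, t_max = t.min(), t.max()
    # Normalized timestamp t* = (B - 1)(t - t_min) / (t_max - t_min)
    t_star = (B - 1) * (t - t_min) / max(t_max - t_min, 1e-9)
    for ti, xi, yi, pi in zip(t_star, x, y, p):
        left = int(np.floor(ti))
        for b in (left, left + 1):
            if 0 <= b < B:
                # Triangular weight in time: max(0, 1 - |t* - b|)
                grid[b, yi, xi] += pi * max(0.0, 1.0 - abs(ti - b))
    return grid
```

An event whose normalized timestamp falls exactly on a channel contributes its full polarity to that channel; an event between two channels splits its polarity between them.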

Description

Self-calibration lightweight event camera image reconstruction method and system

Technical Field

The invention relates to the technical field of computer vision, in particular to a self-calibration lightweight event camera image reconstruction method and system.

Background

Event cameras are novel cameras inspired by biological vision systems and have become promising sensors in the field of high-speed, high-dynamic-range imaging. Unlike traditional frame cameras, event cameras capture brightness changes as an asynchronous, sparse stream of "events", offering ultra-low latency, low motion blur, high energy efficiency, and other advantages. These characteristics make them particularly attractive in robotics, autonomous driving, and high-speed motion analysis applications. However, because the event stream is sparse and asynchronous, converting event data into an interpretable intensity image is a core challenge. Existing methods are mainly based on deep learning, such as E2VID and its improved versions (E2VID+, ET-Net, etc.), and can achieve high reconstruction quality, but their models have large parameter counts and high computational complexity and are difficult to deploy in practical applications. On the other hand, although the lightweight model FireNet is computationally efficient, its reconstruction quality is limited, with frequent detail loss, blurring, or artifacts, so efficiency and performance cannot be achieved simultaneously.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provides a self-calibration lightweight event camera image reconstruction method and system that are suitable for practical event camera applications and achieve high-quality, high-detail-fidelity reconstruction of video from event streams while keeping computational cost low.
In order to achieve the above purpose, the present invention adopts the following technical scheme. A self-calibration lightweight event camera image reconstruction method comprises the following steps: acquiring an event stream by using an event camera and preprocessing the event stream into a voxel grid; constructing a lightweight neural network comprising a head layer, an encoder, a self-calibration Transformer module, a dual-domain spatial-spectrum modulation module, a decoder and a prediction layer; inputting the voxel grid into the lightweight neural network, extracting shallow features by using the head layer, and inputting the shallow features into the encoder to extract spatio-temporal features; capturing local structural correlations and global spatio-temporal dependencies from the spatio-temporal features by using the self-calibration Transformer module; upsampling the local structural correlations and global spatio-temporal dependencies through the decoder to obtain a feature map with increased spatial resolution; receiving the shallow features obtained by the head layer in the dual-domain spatial-spectrum modulation module, fusing frequency-domain and spatial-domain information, and outputting spectrum-space enhanced features of the same dimension; and fusing the spectrum-space enhanced features with the feature map, and outputting a reconstructed intensity map through the prediction layer. As a preferred solution, the event camera is used to acquire the event stream and preprocess it into a voxel grid, specifically comprising: converting the event stream E = {(x_i, y_i, t_i, p_i)} into a voxel grid V by bilinear interpolation, satisfying the following formula: V(x, y, b) = Σ_i p_i · max(0, 1 − |t_i* − b|), summed over the events with pixel coordinates (x_i, y_i) = (x, y), with t_i* = (B − 1)(t_i − t_min)/(t_max − t_min); in the formula, (x_i, y_i) represent the pixel coordinates, B represents the number of voxel grid channels, p_i represents the event polarity, t_i* represents the normalized timestamp, and t_min and t_max are the minimum and maximum of the timestamps.
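The dual-domain spatial-spectrum modulation described in the scheme (FFT, nonlinear activation, inverse FFT in the frequency branch; 1×1 convolution in the spatial branch; element-wise fusion) can be illustrated with a deliberately simplified sketch. All names are assumptions, and the 1×1 convolution is reduced to a per-channel scaling, so this shows only the data flow, not the patented module.

```python
import numpy as np

def dual_domain_modulation(feat, w_spatial):
    """Simplified sketch of a dual-domain spatial-spectrum modulation.
    feat: (C, H, W) shallow feature map; w_spatial: (C,) per-channel weights
    standing in for a 1x1 convolution. Names are illustrative."""
    # Frequency-domain branch: map to the spectrum, modulate, map back
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    # Nonlinear activation on the spectrum (ReLU on real and imaginary parts)
    spec = np.maximum(spec.real, 0) + 1j * np.maximum(spec.imag, 0)
    freq_feat = np.fft.irfft2(spec, s=feat.shape[-2:], axes=(-2, -1))
    # Spatial-domain branch: 1x1 convolution reduced to per-channel scaling
    spatial_feat = feat * w_spatial[:, None, None]
    # Element-wise fusion keeps the output at the input's dimensions
    return freq_feat * spatial_feat
```

The inverse transform is given the original spatial size explicitly (`s=feat.shape[-2:]`) so that the enhanced features keep the same dimensions as the input, as the scheme requires.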
As a preferred scheme, the lightweight neural network comprises a head layer, an encoder, a self-calibration Transformer, a dual-domain spatial-spectrum modulation module, a residual block, a decoder and a prediction layer. The head layer performs preliminary feature mapping on the input voxel grid; the encoder extracts spatio-temporal features and downsamples; the self-calibration Transformer module captures long-distance spatio-temporal dependencies; the decoder upsamples and recovers features; the dual-domain spatial-spectrum modulation module strengthens the skip-connection feature representation; the residual block improves the feature expression capability after encoding; and the prediction layer maps the features into the final gray-level intensity image. The spatial size of the input feature map is halved through the convolution layer and ConvLSTM, temporal modeling is performed on the features to capture time dependence, space and channel dimensions of the input and in
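The two-branch self-calibration fusion of claims 3 to 6 (LeakyReLU low-frequency branch, Sigmoid weight map calibrating the Transformer's depth features, channel splicing, fusion convolution, PixelShuffle and residual addition) can be sketched as follows. The convolutions and the lightweight Transformer are abstracted away (the conv is omitted or replaced by an averaging stand-in, and the Transformer's output is passed in precomputed), so every function and argument name here is illustrative.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r), as a PixelShuffle layer does."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

def leaky_relu(x, a=0.1):
    return np.where(x > 0, x, a * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_calibration_fuse(feat, global_feat, f0, r=2):
    """Sketch of the self-calibration fusion. `feat` stands for the branch
    input F_h/F_l, `global_feat` for the depth features the Transformer
    branch would produce, `f0` for the initial (residual) features F_0.
    Shapes: feat, global_feat: (C*r^2, H, W); f0: (C, H*r, W*r)."""
    low = leaky_relu(feat)                       # low-frequency branch (conv omitted)
    weight = sigmoid(feat)                       # spatially adaptive weight map
    high = global_feat * weight                  # calibrate high-frequency details
    fused = np.concatenate([low, high], axis=0)  # splice along the channel dim
    c = feat.shape[0]
    # Stand-in for the fusion convolution: average the two halves back to C channels
    fused = 0.5 * (fused[:c] + fused[c:])
    return pixel_shuffle(fused, r) + f0          # PixelShuffle upsampling + residual
```

With zero branch inputs the Sigmoid gate is 0.5 everywhere but gates a zero feature, so the output reduces to the residual features alone, which makes the residual path easy to verify.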