CN-122027778-A - High-speed stereoscopic imaging method and device based on monocular pulse time domain modulation
Abstract
The invention discloses a high-speed stereoscopic imaging method and device based on monocular pulse time-domain modulation. By introducing a periodic time-domain modulation signal into the optical path, a unique "time stamp" is applied, within the mixed light-intensity signal, to the image from one particular viewing angle. Exploiting the pulse camera's ability to capture the high-frequency modulation signal, mixed images that are difficult to separate spatially can be effectively distinguished in the time domain. High-frame-rate, high-precision stereoscopic video reconstruction is thereby achieved while retaining the advantages of a monocular system, namely compact hardware and freedom from geometric distortion, filling a technical gap in the field of high-speed monocular stereoscopic vision.
Inventors
- Shi Baixin
- Yelidos Xiaokaiti
- CHANG YAKUN
- BAI YANG
- HUANG ZHAOJUN
- DUAN PEIQI
Assignees
- Peking University
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-29
Claims (10)
- 1. A high-speed stereoscopic imaging method based on monocular pulse time-domain modulation, characterized by comprising the following steps: S1, acquiring a time-domain-modulated mixed light signal of a scene with a pulse camera to generate a mixed pulse stream, and, within a preset short time window and under the assumption of constant light intensity, constructing a system of linear equations from a plurality of pulse events in the mixed pulse stream; S2, solving the system of linear equations by least-squares optimization to obtain the left-view light intensity and the right-view light intensity within the time window, and thereby generating baseline video sequences of the left and right views; S3, inputting the baseline video sequences and a mixed intensity map accumulated over a complete modulation period into a pre-trained SMS-Net neural network, performing brightness correction, artifact removal and texture completion, and outputting refined stereoscopic video frames.
- 2. The high-speed stereoscopic imaging method based on monocular pulse time-domain modulation according to claim 1, wherein in step S1 the system of linear equations is constructed as: $Q + \varepsilon_i = \eta \left( I_L \, \Delta t_i + I_R \, M_i \right),\ i = 1, \dots, n$, with $\Delta t_i = t_i - t_{i-1}$ and $M_i = \int_{t_{i-1}}^{t_i} m(t)\, dt$, wherein $Q$ is the charge threshold that triggers a single pulse, $\varepsilon_i$ is a composite residual term, $\eta$ is the photoelectric conversion coefficient, $I_L$ and $I_R$ are the left-view and right-view intensities to be solved for, $\Delta t_i$ is the i-th pulse interval, $t_i$ is the trigger instant of the i-th pulse, $t_{i-1}$ is the trigger instant of the (i-1)-th pulse, $M_i$ is the integral of the modulation function over the i-th pulse interval, $m(t)$ is the time-domain modulation function, and $n$ is the number of pulse events within the time window.
- 3. The high-speed stereoscopic imaging method based on monocular pulse time-domain modulation according to claim 1, wherein in step S3 the SMS-Net neural network comprises: an encoder for extracting multi-scale features from the input baseline left-view and right-view sequences and from the full-period mixed intensity map; an adaptive brightness correction module for performing brightness-consistency correction on the left-view and right-view features respectively through affine transformation, using the features of the full-period mixed intensity map as the brightness reference; a cross-view artifact removal module that applies a three-dimensional cross-attention mechanism along the epipolar-line direction so that the corrected left-view and right-view features interact, thereby identifying and removing residual artifacts produced by decoupling; a recurrent spatio-temporal fusion module that adopts a recurrent neural network structure, aligns the feature state of historical moments using deformable convolution, and fuses the historical information with the current-frame features through an attention mechanism to complete texture details; and a shared decoder for reconstructing and outputting the refined left-view and right-view video frames from the fused multi-scale features.
- 4. The high-speed stereoscopic imaging method based on monocular pulse time-domain modulation according to claim 3, wherein the adaptive brightness correction module performs the following operations: computing global channel statistics of the left-view feature map $F_L$ and of the mixed-intensity-map features $F_M$; predicting a scaling factor $\gamma$ and a bias $\beta$ through a convolutional layer based on the global channel statistics; obtaining the brightness-corrected left-view features through the affine transformation $\hat{F}_L = \gamma \odot F_L + \beta$; and performing the same operations on the right-view features $F_R$ to obtain the brightness-corrected right-view features $\hat{F}_R$.
- 5. The high-speed stereoscopic imaging method based on monocular pulse time-domain modulation according to claim 3, wherein the recurrent spatio-temporal fusion module comprises: a deformable alignment unit for learning a spatial offset Δ and a modulation mask m, and performing a deformable convolution on the historical hidden-state features to align them to the current moment; a gated temporal attention unit for computing attention weights between the current-frame features and the aligned historical features, and performing a weighted summation of the value vectors to achieve feature fusion; and a channel attention enhancement unit for applying global average pooling to the fused features, generating channel attention weights through a one-dimensional convolution and an activation function, and performing channel enhancement.
- 6. A high-speed stereoscopic imaging device based on monocular pulse time-domain modulation, comprising the following modules for implementing the method of any one of claims 1-5: an optical imaging module for collecting and modulating scene light signals and outputting a mixed light signal containing left-view and right-view information; a pulse imaging and acquisition module for integrating and sampling the mixed light signal with a pulse camera to generate a mixed pulse stream; a baseline decoupling module for decoupling high-frame-rate left-view and right-view baseline video sequences from the mixed pulse stream in real time based on a least-squares optimization algorithm; and a refinement and reconstruction module for refining the baseline video sequences with a pre-trained neural network and outputting high-quality stereoscopic video frames.
- 7. The high-speed stereoscopic imaging device based on monocular pulse time-domain modulation according to claim 6, wherein the optical imaging module comprises: a beam splitter for reflecting a first light path from the scene and transmitting a second light path from the scene, so as to couple the two light paths into the same optical path; a plane mirror arranged on the second light path for changing the direction of the second light path; and a time-domain light intensity modulator arranged on the second light path after its reflection by the plane mirror, for performing time-domain modulation on the light intensity of the second light path according to a known periodic modulation function; wherein the modulated light of the second light path is transmitted by the beam splitter and then mixed with the reflected light of the first light path to form the mixed light signal, which enters the pulse imaging and acquisition module.
- 8. The high-speed stereoscopic imaging device based on monocular pulse time-domain modulation according to claim 6, wherein, in the pulse imaging and acquisition module, the charge accumulation model of a pulse camera pixel within the time interval $[t_{i-1}, t_i]$ is: $Q + \varepsilon_i = \eta \int_{t_{i-1}}^{t_i} \left( I_L + m(t)\, I_R \right) dt$, wherein $Q$ is the charge threshold that triggers a single pulse, $\varepsilon_i$ is the charge quantization residual caused by the discrete sampling characteristics of the pulse camera, $\eta$ is the photoelectric conversion coefficient, $I_L$ is the left-view light intensity, $I_R$ is the right-view light intensity, $t_i$ is the trigger instant of the i-th pulse, $t_{i-1}$ is the trigger instant of the (i-1)-th pulse, and $m(t)$ is the time-domain modulation function.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-5 when executing the program.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-5.
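The least-squares decoupling of claims 1 and 2 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes that the right view is the modulated one (the source does not say which view passes through the modulator), that the modulation is a raised cosine, and that the pixel fires each time the accumulated charge grows by one threshold Q. Under constant intensities, each interval between consecutive pulses then gives one equation, Q/η ≈ I_L·Δt_i + I_R·M_i, where Δt_i is the interval length and M_i is the integral of the modulation function over it. The function names and all numeric values are hypothetical.

```python
import numpy as np

def modulation(t, freq):
    """Assumed periodic modulation m(t) in [0, 1] (raised cosine)."""
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * freq * t))

def modulation_integral(t0, t1, freq):
    """Closed-form integral of modulation() over [t0, t1]."""
    w = 2.0 * np.pi * freq
    return 0.5 * (t1 - t0) + 0.5 * (np.sin(w * t1) - np.sin(w * t0)) / w

def simulate_pulses(i_left, i_right, freq, q, eta, window, dt=1e-7):
    """Simplified integrate-and-fire pixel: one pulse per q of accumulated charge."""
    n = int(round(window / dt))
    t_end = (np.arange(n) + 1) * dt          # end time of each simulation step
    t_mid = t_end - 0.5 * dt                 # midpoint rule for the charge integral
    rate = eta * (i_left + modulation(t_mid, freq) * i_right)
    charge = np.cumsum(rate * dt)
    # A pulse fires in the step where the charge crosses another multiple of q.
    fired = np.flatnonzero(np.diff(np.floor(charge / q)) > 0) + 1
    return t_end[fired]

def decouple(t_fire, freq, q, eta):
    """Least-squares estimate of (I_left, I_right) from pulse trigger instants."""
    t0, t1 = t_fire[:-1], t_fire[1:]
    a = np.column_stack([t1 - t0, modulation_integral(t0, t1, freq)])
    b = np.full(len(t0), q / eta)            # residual term is neglected here
    solution, *_ = np.linalg.lstsq(a, b, rcond=None)
    return solution

# Hypothetical values: 1 kHz modulation, 2 ms window, arbitrary intensity units.
t_fire = simulate_pulses(8000.0, 5000.0, freq=1000.0, q=1.0, eta=1.0, window=2e-3)
print(decouple(t_fire, freq=1000.0, q=1.0, eta=1.0))
```

The system is only well conditioned when the pulse intervals sample different phases of the modulation, so that the ratio M_i/Δt_i varies from row to row; if every interval spanned a whole number of modulation periods, the two columns would be proportional and the two views could not be separated.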
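The adaptive brightness correction of claim 4 (global channel statistics, a predicted scale and bias, then an affine transformation) can be sketched in NumPy as follows. The choice of mean and standard deviation as the statistics, the use of a single linear map in place of the convolutional layer (a 1x1 convolution applied to pooled statistics reduces to a matrix product), the "1 + output" parameterization of the scale, and the untrained random weights are all assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def channel_stats(feat):
    """Per-channel global mean and standard deviation of a (C, H, W) feature map."""
    flat = feat.reshape(feat.shape[0], -1)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)])

def brightness_correct(feat_view, feat_mixed, weight, bias):
    """Affine correction gamma * F + beta, with gamma and beta predicted from statistics.

    weight has shape (2C, 4C) and bias has shape (2C,); both are hypothetical
    stand-ins for the learned convolutional layer.
    """
    c = feat_view.shape[0]
    stats = np.concatenate([channel_stats(feat_view), channel_stats(feat_mixed)])
    out = weight @ stats + bias
    gamma = 1.0 + out[:c]                    # scale, identity when out is zero
    beta = out[c:]                           # bias
    return gamma[:, None, None] * feat_view + beta[:, None, None]

rng = np.random.default_rng(0)
c, h, w = 4, 8, 8
f_left = rng.normal(size=(c, h, w))
f_mixed = rng.normal(size=(c, h, w))
weight = 0.01 * rng.normal(size=(2 * c, 4 * c))
f_left_corrected = brightness_correct(f_left, f_mixed, weight, np.zeros(2 * c))
print(f_left_corrected.shape)
```

The same function would be applied to the right-view features with the same mixed-intensity reference, which is what ties the brightness of the two views to a common target.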
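The channel attention enhancement unit of claim 5 (global average pooling, a one-dimensional convolution, an activation function, then channel reweighting) might look like the sketch below. The sigmoid activation and the kernel values are assumptions, since the source only says "an activation function"; note also that `np.convolve` flips the kernel, which makes no difference for the symmetric kernel used here.

```python
import numpy as np

def channel_attention(feat, kernel):
    """Reweight the channels of a (C, H, W) feature map; requires C >= len(kernel)."""
    pooled = feat.mean(axis=(1, 2))                    # global average pooling, shape (C,)
    mixed = np.convolve(pooled, kernel, mode="same")   # 1-D convolution across channels
    weights = 1.0 / (1.0 + np.exp(-mixed))             # sigmoid (assumed activation)
    return feat * weights[:, None, None], weights

rng = np.random.default_rng(1)
fused = rng.normal(size=(8, 4, 4))                     # hypothetical fused features
enhanced, weights = channel_attention(fused, kernel=np.array([0.25, 0.5, 0.25]))
print(enhanced.shape, weights.shape)
```

Because the convolution runs over the pooled channel vector rather than over space, each channel's weight depends only on itself and its nearest neighbouring channels, which keeps the unit very cheap compared with a fully connected channel-attention layer.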
Description
High-speed stereoscopic imaging method and device based on monocular pulse time domain modulation
Technical Field
The invention relates to the technical field of data processing, and in particular to a high-speed stereoscopic imaging method and device based on monocular pulse time-domain modulation.
Background
Stereoscopic vision is a basic technology by which machines perceive and interact with the physical world, and it plays a core role in fields such as automatic driving, robot navigation, precision industrial inspection and augmented reality. Particularly in complex scenes involving high-speed motion, the vision system needs not only high spatial resolution but also extremely high temporal resolution to capture the dynamic details of transient changes. However, current mainstream visual perception schemes face a trade-off that is difficult to reconcile among system volume, hardware cost, data transmission efficiency and imaging speed.
Among existing systems, binocular and multi-view vision systems are mature but show significant limitations in practical applications. First, the binocular architecture inevitably increases hardware complexity: introducing multiple imaging sensors and associated optics multiplies the volume, weight and power consumption of the system, making it difficult to fit into space-constrained miniature devices. Second, multi-camera systems place extremely high demands on mechanical stability, and a relative position offset caused by even slight vibration can force the system to be recalibrated.
More critically, in high-speed vision applications that pursue high frame rates, a binocular system doubles the data throughput, which places a heavy burden on the data transmission bandwidth and the back-end computing units and makes the system's data efficiency very poor.
Monocular stereoscopic techniques have emerged to overcome the hardware redundancy of multi-view systems. Early mirror-based schemes project different viewing angles onto different areas of the same sensor through "spatial optical modulation", but this sacrifices effective resolution, and the epipolar rectification relies on a planar-scene assumption, so severe geometric distortion readily occurs in non-planar scenes and greatly reduces depth estimation accuracy. Another scheme, hybrid projection, avoids the resolution loss and distortion caused by field-of-view segmentation, but superimposing two images on the same sensor creates a very difficult "blind source separation" problem. Lacking effective separation cues, such methods often rely on extremely strong global disparity priors, which makes it difficult for them to handle complex dynamic scenes.
Beyond the decoupling problem in the spatial dimension, the performance shortcomings of conventional vision sensors in the temporal dimension also limit technological breakthroughs. Conventional frame cameras have a limited frame rate and are prone to motion blur when capturing high-speed moving objects, which makes using a high-frequency time-domain signal to assist decoupling impractical.
Disclosure of Invention
Aiming at the decoupling problem of monocular stereoscopic vision in the prior art, the invention provides a high-speed stereoscopic imaging method based on monocular pulse time-domain modulation, in which a periodic time-domain modulation signal is introduced into the optical path so that, within the mixed light-intensity signal, the image from one particular viewing angle is marked with a unique "time stamp".
Exploiting the pulse camera's ability to capture the high-frequency modulation signal, mixed images that are difficult to separate spatially can be effectively distinguished in the time domain. High-frame-rate, high-precision stereoscopic video reconstruction is thereby achieved while retaining the advantages of a monocular system, namely compact hardware and freedom from geometric distortion, filling a technical gap in the field of high-speed monocular stereoscopic vision.
To achieve the above object, the present invention provides the following technical solutions:
In a first aspect, the present invention provides a high-speed stereoscopic imaging method based on monocular pulse time-domain modulation, comprising the following steps: S1, acquiring a time-domain-modulated mixed light signal of a scene with a pulse camera to generate a mixed pulse stream, and, within a preset short time window and under the assumption of constant light intensity, constructing a system of linear equations from a plurality of pulse events in the mixed pulse stream; S2, solving the system of linear equations by least-squares optimization to obtain, within the time window, the left-view light intensity and right view li