CN-121999125-A - Self-calibration surface normal field reconstruction method and device based on asynchronous event stream

Abstract

The invention discloses a self-calibration surface normal field reconstruction method and device based on asynchronous event streams. The method controls an illumination unit to actively scan and generate a time-varying illumination field, and acquires an event stream through a monocular asynchronous vision sensor. A joint state vector is constructed that contains the three-dimensional surface normal vector together with the ratio of ambient light to surface albedo as a fourth component, and an orthogonal constraint between this joint state vector and a four-dimensional differential observation vector is derived, mathematically decoupling the reconstruction from ambient light interference. The time-varying illumination trajectory is characterized as a weighted superposition of discrete anchor points using parameterized basis functions with local support, establishing the correspondence between asynchronous events and the global illumination. The generalized bas-relief transformation ambiguity is resolved using the rigid constraint provided by the preset spatial geometric configuration of the illumination unit, and the surface normal field and the illumination parameters are solved simultaneously through global joint iterative optimization. The invention achieves surface reconstruction with low data bandwidth and high dynamic range in non-darkroom conditions, without precise illumination calibration.
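
The core decoupling idea in the abstract can be checked numerically. The sketch below (Python; all variable names and values are illustrative assumptions, not from the patent) builds the four-dimensional joint state vector from a unit normal and the ambient-to-albedo ratio, generates one event under a Lambertian-plus-ambient brightness model with a logarithmic contrast threshold, and verifies that the resulting differential observation vector is orthogonal to the joint state:

```python
# Minimal sketch (assumed values) of the joint-state orthogonal constraint.
import numpy as np

rho = 0.7                               # surface albedo (assumed)
e_amb = 0.15                            # ambient light intensity (assumed constant)
n = np.array([0.0, 0.6, 0.8])           # unit surface normal (assumed)
C = 0.2                                 # contrast threshold of the sensor (assumed)

l_prev = np.array([0.3, 0.1, 0.9])      # light vector at the previous event time
# Brightness under the Lambertian-plus-ambient model: I = rho * (n . l) + e
I_prev = rho * n.dot(l_prev) + e_amb

# An event of polarity p fires when the log brightness changes by p * C,
# so the new brightness satisfies I_k = exp(p * C) * I_{k-1}.
p = +1
gamma = np.exp(p * C)
I_k = gamma * I_prev

# Pick one light vector l_k consistent with that brightness: scale l_prev
# so that n . l_k = (I_k - e) / rho.
scale = ((I_k - e_amb) / rho) / n.dot(l_prev)
l_k = scale * l_prev

# Four-dimensional joint state vector: normal plus ambient-to-albedo ratio.
x = np.append(n, e_amb / rho)
# Four-dimensional differential observation vector for the adjacent event pair.
d = np.append(l_k - gamma * l_prev, 1.0 - gamma)

residual = abs(float(d.dot(x)))
print(residual)  # ~0: d . x = 0 holds regardless of the ambient intensity
```

Because the constraint d · x = 0 holds for any constant ambient intensity, the normal can be estimated without ever measuring or subtracting the background light, which is the decoupling the abstract refers to.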

Inventors

  • Shi Baixin
  • Liang Jinxiu
  • Yu Bohan
  • Yang Siqi
  • Zhuang Haotian
  • Ren Jieji
  • Duan Peiqi

Assignees

  • Peking University (北京大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-09

Claims (9)

  1. A self-calibration surface normal field reconstruction method based on an asynchronous event stream, characterized by comprising the following steps:
     S1, constructing a dynamic illumination response environment: controlling an illumination unit to generate an illumination field whose illumination direction changes continuously over time and to actively scan and illuminate the target under test, wherein the illumination unit comprises a plurality of light-emitting units whose relative positions in the illumination unit's body coordinate system are known and fixed, forming a preset spatial geometric configuration, while the global pose of the illumination unit relative to the target under test is unknown;
     S2, acquiring asynchronous event stream data: monitoring, through a monocular asynchronous vision sensor, the brightness changes of the target surface caused by the illumination changes and outputting asynchronous event stream data, wherein the asynchronous vision sensor triggers an event whenever the pixel-level logarithmic brightness change exceeds a contrast threshold;
     S3, establishing the joint state orthogonal constraint: deriving, from the sensor's logarithmic-response triggering mechanism and a Lambertian reflection model, the multiplicative brightness relation between adjacent event trigger times; defining the ratio s = e/ρ of the ambient light intensity e to the surface albedo ρ as the ambient light scale factor; combining this scale factor as the fourth component with the three-dimensional surface normal vector n to construct the four-dimensional joint state vector x = [n; s]; constructing, from the multiplicative brightness relation, a four-dimensional differential observation vector d_k for each pair of adjacent events in the event time sequence; and establishing the joint state orthogonal constraint d_k · x = 0 between the differential observation vector and the joint state vector, wherein the constraint holds strictly when the ambient light is constant, so that the solution of the surface normal vector is mathematically decoupled from the ambient light intensity;
     S4, continuously parameterizing the time-varying illumination: modeling the unknown time-varying illumination vector with parameterized basis functions having local support, characterizing the illumination trajectory as a weighted superposition of discrete anchor vectors, computing the basis-function weight coefficients from each event's timestamp, converting the joint state orthogonal constraint into bilinear expressions in the anchor vectors, and thereby establishing the temporal correspondence between asynchronous events and the anchor vectors at different times;
     S5, resolving ambiguity with geometric constraints: using the known relative positions of the plurality of light-emitting units in the illumination unit's body coordinate system as rigid constraint conditions, constructing an ambiguity-resolution model that solves for the generalized bas-relief transformation matrix, rotation matrix, and translation vector minimizing the geometric error between the transformed anchor vectors and the relative positions of the light-emitting units, thereby resolving the generalized bas-relief transformation ambiguity inherent to photometric stereo;
     S6, global joint iterative optimization: constructing an objective function comprising a data-fitting term and an integrability regularization term; using a gradient-based optimization strategy to iteratively update the four-dimensional joint state vector of each pixel and the global anchor vectors, periodically performing a geometric constraint projection during the iterations to maintain the geometric consistency of the solution, until a preset convergence condition is met; and, after convergence, extracting and normalizing the first three components of the four-dimensional joint state vector and outputting the surface normal field of the target under test.
  2. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to claim 1, wherein in step S3 the multiplicative brightness relation is derived as follows: the brightness of a pixel at time t is modeled as I(t) = ρ n·l(t) + e, where ρ is the surface albedo, n is the unit surface normal vector, l(t) is the light-source illumination vector, and e is the ambient light intensity; when the k-th event is triggered at time t_k with polarity p_k, it satisfies log I(t_k) − log I(t_{k−1}) = p_k C, where C is the contrast threshold; exponentiating this expression gives the multiplicative brightness relation I(t_k) = γ_k I(t_{k−1}), where γ_k = exp(p_k C); the four-dimensional joint state vector is defined as x = [n_x, n_y, n_z, s]^T, where s = e/ρ is the ambient light scale factor; the four-dimensional differential observation vector is defined as d_k = [l(t_k) − γ_k l(t_{k−1}); 1 − γ_k]; and the joint state orthogonal constraint is expressed as d_k · x = 0, whose expanded form is n·(l(t_k) − γ_k l(t_{k−1})) + s(1 − γ_k) = 0.
  3. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to claim 1, wherein in step S4 the illumination vector l(t) at any time t is characterized as l(t) = Σ_{j=1}^{M} B_j(t) a_j, where a_j is the j-th anchor vector and B_j(t) is the value of the j-th basis function at time t; the parameterized basis functions are selected from B-spline basis functions, Fourier basis functions, or radial basis functions; and when B-spline basis functions are selected, the basis-function weights are computed through the recursion B_{j,0}(t) = 1 if t_j ≤ t < t_{j+1} and 0 otherwise, B_{j,p}(t) = ((t − t_j)/(t_{j+p} − t_j)) B_{j,p−1}(t) + ((t_{j+p+1} − t)/(t_{j+p+1} − t_{j+1})) B_{j+1,p−1}(t), where {t_j} is the knot sequence.
  4. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to claim 1, wherein in step S5 the spatial geometric configuration of the illumination unit satisfies a non-degeneracy condition: the set of position points of the plurality of light-emitting units contains at least seven points that are neither coplanar nor all located on any quadric surface passing through the coordinate origin; and the spatial geometric configuration takes one of the following forms: a multi-ring coaxial configuration comprising at least two coaxial annular arrays of light-emitting units, the arrays having different radii and being axially spaced by a preset distance; or a spatial curve configuration, in which the arrangement trajectory of the plurality of light-emitting units forms a non-planar curve in three-dimensional space.
  5. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to claim 1, wherein in step S5 the position of the m-th light-emitting unit in the body coordinate system is denoted p_m and the currently estimated anchor vectors are denoted a_m; the geometric error is defined as E(G, R, t) = Σ_m ‖ R G a_m + t − p_m ‖², where G is the generalized bas-relief transformation matrix, R is the rotation matrix, and t is the translation vector; and the generalized bas-relief transformation matrix has the form G = [[1, 0, 0], [0, 1, 0], [μ, ν, λ]], where μ, ν, λ are real parameters and λ ≠ 0.
  6. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to claim 1, wherein in step S6 the objective function is defined as E = E_data + α E_reg, where α is a weight coefficient, E_data is the data-fitting term, and E_reg is the integrability regularization term; the data-fitting term is defined as E_data = Σ_{u∈Ω} Σ_{k=1}^{K_u} (d_{u,k} · x_u)², where Ω is the set of valid pixels, K_u is the number of events occurring at pixel u, x_u is the four-dimensional joint state vector of pixel u, and d_{u,k} is the four-dimensional differential observation vector corresponding to the k-th event at pixel u; and the integrability regularization term is defined as E_reg = Σ_u (∂(n_x/n_z)/∂y − ∂(n_y/n_z)/∂x)², where (n_x, n_y, n_z) are the three components of the surface normal vector and the partial derivatives are computed by finite-difference approximation over neighboring pixels.
  7. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to claim 6, wherein in step S6 an optimization strategy based on gradient descent is adopted, specifically comprising: parameter initialization, in which the four-dimensional joint state vector of each pixel and the global anchor vector sequence are set as optimizable parameters and given initial values, the three-dimensional surface normal component of each joint state vector being initialized to a unit vector pointing along the camera's optical axis and the ambient light scale factor being initialized to zero; gradient computation, in which the gradient of the objective function with respect to the optimizable parameters is computed by automatic differentiation; parameter update, in which the optimizable parameters are updated along the gradient using an adaptive moment estimation optimizer; geometric constraint projection, in which, after each parameter update or after every preset number of iterations, the updated anchor vector sequence is projected onto the spatial geometric configuration, comprising computing the optimal transformation parameters G, R, t between the anchor vectors and the light-emitting unit positions, updating the anchor vectors to a_m ← R G a_m + t, and synchronously updating the normal vectors to n ← G^{-T} n / ‖G^{-T} n‖; and cyclically executing the gradient computation, parameter update, and geometric constraint projection until at least one convergence condition is met: the relative change of the objective function value between two adjacent iterations is smaller than a first preset threshold, or the mean angular change of the surface normal field between two adjacent iterations is smaller than a second preset threshold, or the number of iterations reaches a preset maximum.
  8. The self-calibration surface normal field reconstruction method based on asynchronous event streams according to any one of claims 1 to 7, further comprising an event filtering step after step S2: spatio-temporal correlation filtering, in which, for any event in the event time sequence, the number of neighborhood events is counted within a spatial window of side length L pixels centered on the event's pixel coordinates and a temporal window of length T centered on the event's timestamp, and the event is removed if the number of neighborhood events is less than a threshold N_min; and high-frequency event suppression, in which the time intervals between adjacent events are computed for the event time series of each pixel, an event is marked as a high-frequency abnormal event if its interval is less than a threshold τ, and the joint state orthogonal constraints corresponding to high-frequency abnormal events are down-weighted in the data-fitting term of step S6 or excluded from it.
  9. A self-calibration surface normal field reconstruction apparatus based on asynchronous event streams, comprising: an illumination unit comprising a controller and a plurality of light-emitting units, the light-emitting units being fixedly installed in a preset spatial geometric configuration that satisfies a non-degeneracy condition; an asynchronous vision sensor employing an event-triggering mechanism based on logarithmic brightness change detection, for responding to brightness changes on the surface of the target under test and outputting asynchronous event stream data comprising pixel coordinates, timestamps, and polarity identifiers; and a data processing unit communicatively connected to the asynchronous vision sensor and comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the self-calibration surface normal field reconstruction method based on asynchronous event streams described above, solving the surface normal field of the target under test from the asynchronous event stream data.
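
As an illustrative aid to the B-spline parameterization in claim 3, the following minimal Python sketch (names, knot values, and anchors are assumptions, not from the patent) evaluates basis weights via the Cox–de Boor recursion and forms an illumination vector as a weighted superposition of anchor vectors. The local support of the basis means only a few anchors influence any given event timestamp:

```python
import numpy as np

def bspline_basis(j, p, t, knots):
    """Cox-de Boor recursion for the j-th B-spline basis function of degree p."""
    if p == 0:
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left = 0.0
    if knots[j + p] != knots[j]:
        left = (t - knots[j]) / (knots[j + p] - knots[j]) \
               * bspline_basis(j, p - 1, t, knots)
    right = 0.0
    if knots[j + p + 1] != knots[j + 1]:
        right = (knots[j + p + 1] - t) / (knots[j + p + 1] - knots[j + 1]) \
                * bspline_basis(j + 1, p - 1, t, knots)
    return left + right

# Illumination trajectory as a weighted superposition of anchor vectors.
knots = np.arange(8.0)        # uniform knot sequence (assumed)
degree = 2                    # quadratic B-splines (assumed)
anchors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])   # M = 5 anchors
t = 3.4                       # event timestamp (illustrative)

weights = np.array([bspline_basis(j, degree, t, knots)
                    for j in range(len(anchors))])
l_t = weights @ anchors       # interpolated illumination vector at time t
print(weights)                # only a few entries are nonzero (local support)
```

Inside the valid span of the knot sequence the weights form a partition of unity, so each asynchronous event contributes a constraint that is bilinear in the (few) anchor vectors active at its timestamp, exactly the structure step S4 exploits.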
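
The ambiguity addressed in claim 5 can also be illustrated numerically: photometric observations are invariant when a generalized bas-relief transform is applied jointly to the pseudo-normals and the lights, which is why the known rigid configuration of the light-emitting units is needed to pin the transform down. A minimal sketch (parameter values are illustrative assumptions):

```python
import numpy as np

def gbr(mu, nu, lam):
    """Generalized bas-relief matrix: identity in the first two rows,
    (mu, nu, lam) in the third row, with lam != 0."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [mu,  nu,  lam]])

rng = np.random.default_rng(0)
b = rng.normal(size=3)        # pseudo-normal (albedo-scaled normal), assumed
l = rng.normal(size=3)        # light vector, assumed
G = gbr(0.3, -0.5, 2.0)

# Observations b . l are unchanged under the joint transform
# b -> G^{-T} b, l -> G l, so photometric data alone cannot recover G.
b2 = np.linalg.inv(G).T @ b
l2 = G @ l
print(b.dot(l), b2.dot(l2))   # equal up to floating-point error
```

Minimizing the geometric error between the transformed anchor vectors and the known light-unit positions selects one member of this transformation family, resolving the ambiguity.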
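
The two terms of the objective function in claim 6 can be sketched as follows (a hypothetical minimal implementation; the array layout and the forward-difference scheme are assumptions):

```python
import numpy as np

def data_term(d_list, x):
    """Sum of squared orthogonality residuals (d_k . x)^2 for one pixel."""
    return sum(float(d.dot(x)) ** 2 for d in d_list)

def integrability_term(normals, eps=1e-6):
    """Penalize d/dy (nx/nz) - d/dx (ny/nz), approximated by forward
    differences over the pixel grid; normals has shape (H, W, 3)."""
    nz = normals[..., 2] + eps            # guard against division by zero
    p = normals[..., 0] / nz              # nx / nz
    q = normals[..., 1] / nz              # ny / nz
    dp_dy = np.diff(p, axis=0)[:, :-1]
    dq_dx = np.diff(q, axis=1)[:-1, :]
    return float(np.sum((dp_dy - dq_dx) ** 2))

# A constant normal field (a plane) is integrable, so its penalty is zero.
flat = np.tile(np.array([0.0, 0.0, 1.0]), (4, 4, 1))
print(integrability_term(flat))           # 0.0
```

The full objective would sum the data term over all valid pixels and add the weighted integrability term, then be minimized jointly over the per-pixel joint state vectors and the global anchor vectors.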
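
The two filters of claim 8 can be sketched as follows (window sizes and thresholds are illustrative assumptions; a brute-force neighborhood count is used for clarity rather than efficiency):

```python
import numpy as np

def spatiotemporal_filter(events, L=3, T=10_000, n_min=2):
    """Keep an event only if at least n_min other events fall inside an
    L x L pixel window and a T-microsecond window centered on it.
    Each event is a tuple (x, y, timestamp_us, polarity)."""
    xs = np.array([e[0] for e in events])
    ys = np.array([e[1] for e in events])
    ts = np.array([e[2] for e in events])
    kept = []
    for i, (x, y, t, p) in enumerate(events):
        near = (np.abs(xs - x) <= L // 2) & (np.abs(ys - y) <= L // 2) \
               & (np.abs(ts - t) <= T // 2)
        if near.sum() - 1 >= n_min:       # subtract 1 to exclude the event itself
            kept.append(events[i])
    return kept

def flag_high_frequency(timestamps, tau=100):
    """Mark events whose interval to the previous event at the same pixel
    is below tau microseconds as high-frequency outliers."""
    flags = [False]
    for prev, cur in zip(timestamps, timestamps[1:]):
        flags.append(cur - prev < tau)
    return flags

events = [(10, 10, 0, 1), (10, 11, 50, 1), (11, 10, 80, -1), (90, 90, 60, 1)]
kept = spatiotemporal_filter(events)
print(len(kept))                          # the isolated event at (90, 90) is removed
print(flag_high_frequency([0, 40, 400, 450]))
```

Flagged events would then be down-weighted or excluded from the data-fitting term in step S6 rather than discarded outright, preserving their timestamps for the illumination parameterization.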

Description

Self-calibration surface normal field reconstruction method and device based on asynchronous event stream

Technical Field

The invention relates to the technical field of computer vision and three-dimensional reconstruction, in particular to a self-calibration surface normal field reconstruction method and device based on an asynchronous event stream, applicable to industrial precision inspection, digital preservation of cultural relics, robot visual perception, high-speed dynamic scene capture, and other scenarios.

Background

Three-dimensional surface reconstruction has wide application value in industrial precision inspection, digital preservation of cultural relics, robot visual perception, and related fields. Photometric stereo, an important technique in this area, infers surface normal vectors by analyzing how the reflected brightness of an object's surface changes under different illumination directions, and can recover pixel-level high-frequency geometric detail, including fine structures such as micro-scratches and fabric textures. Traditional photometric stereo methods rely mainly on frame cameras for image acquisition, organizing the pixel brightness values of all images into an observation matrix and performing matrix decomposition. However, in actual industrial or open scenes, such frame-camera-based approaches face several technical bottlenecks. First, data collection efficiency is low: obtaining a robust normal estimate typically requires acquiring tens of images under different illumination directions.
The limited dynamic range of frame cameras forces multiple-exposure bracketing for high-dynamic-range scenes; the whole acquisition process takes a long time, the measured object must remain completely static throughout, the data transmission bandwidth is heavily loaded, and the real-time requirements of online inspection are difficult to meet. Second, dynamic range is limited. The dynamic range of a conventional industrial camera is generally about 60 dB; when photographing highly reflective or dark objects, highlight saturation or loss of shadow detail easily occurs, so normal information cannot be acquired accurately. Third, ambient light interferes. Existing uncalibrated algorithms mostly assume an ideal darkroom environment or rely on acquiring dark frames without illumination to subtract the background light. In uncontrolled lighting environments, however, such simple subtraction strategies are difficult to apply effectively. The event camera, a neuromorphic vision sensor, offers a new technical approach to these problems thanks to its high dynamic range, microsecond temporal resolution, and sparse event output. Rather than outputting images at a fixed frame rate, an event camera asynchronously detects pixel-level logarithmic brightness changes and outputs an event stream; each pixel operates independently, generating an event only when the local logarithmic brightness change exceeds a threshold. Existing event-camera-based three-dimensional reconstruction methods have the following shortcomings: (1) Dependence on stereo matching and complex hardware. Existing stereo-matching-based techniques (such as patent document CN 112365585 B) adopt binocular or multi-view event camera systems and perform depth reconstruction using the stereoscopic vision principle.
Such methods not only increase the hardware cost and bulk of the system but also rely on feature matching between the left and right viewpoints; accurate matches are difficult to obtain in textureless or repetitively textured regions, and minute surface details (e.g., scratches, fabric texture) are hard to recover. (2) Difficulty handling the specific problems of photometric stereo. Existing visual-odometry-based methods (such as patent document CN 117132619 A) focus mainly on tracking sparse feature points and estimating camera pose, and can hardly recover dense surface normal fields. More importantly, such general state-estimation methods do not consider the ambient-light interference and generalized bas-relief transformation ambiguity specific to photometric stereo. Traditional frame-based uncalibrated photometric stereo methods eliminate the generalized bas-relief ambiguity by establishing pixel correspondences and relying on statistical priors on the albedo distribution, but establishing pixel correspondences presupposes synchronous acquisition under identical illumination conditions; the asynchronous nature of events prevents such direct correspondences from being established, and the logarithmic difference mechanism of the event camera naturally eliminates the albedo term, so that the