CN-121982085-A - Observability-driven small-baseline light field depth estimation method
Abstract
The invention discloses an observability-driven small-baseline light field depth estimation method. The method first constructs a three-tier set of equivalent baselines (large, medium, small) from the same light field data by varying the angular sampling step, then computes observability scores tied to the imaging geometry within each tier, selects sub-views according to those scores, and builds a weighted matching volume. Taking the large baseline as teacher and the medium/small baselines as students, curriculum distillation training transfers stable geometric priors from the large baseline to the small baseline. At inference, the small-baseline matching volume is fed into the trained student model, which performs volume aggregation, probability normalization and expected regression to obtain disparity, and the disparity-to-depth mapping is then carried out from the baseline and focal length. Without introducing external data, the method improves accuracy, boundary fidelity and robustness under small-baseline conditions, keeps the view count and computational cost controllable, and is suitable for 3D sensing applications such as light field cameras, multi-camera arrays and AR/VR.
Inventors
- Zhou Zhiheng
- Yang Zhixin
- Tao Xiyuan
- Yan Qisen
- Sheng Tiantao
- Li Binhong
Assignees
- South China University of Technology (华南理工大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-15
Claims (10)
- 1. An observability-driven small-baseline light field depth estimation method, characterized by comprising the following steps: S1, constructing a three-tier baseline set (large, medium and small) based on the sub-image array of the same light field data; S2, for the three-tier baseline set of the light field data, calculating the observability score of each sub-view in every tier; S3, for each baseline set, adaptively selecting views according to the observability scores and constructing a matching volume; S4, training a parallax estimation model by curriculum distillation, with the large-baseline set serving as teacher and the medium/small-baseline sets as students; S5, inputting the matching volume obtained in step S3 into the trained parallax estimation model to obtain the parallax distribution, and outputting the final depth result through parallax-to-depth physical mapping.
- 2. The observability-driven small-baseline light field depth estimation method according to claim 1, wherein step S1 comprises: S1-1, setting the size and angular indices of the sub-image array in the light field data set and unifying the reference view; S1-2, constructing the three-tier baseline set using different angular sampling steps to obtain light field subsets with different equivalent baselines, namely a large baseline, a medium baseline and a small baseline (an index-construction sketch follows the claims); S1-3, generating, for each scene in the light field data set, a list containing the three groups of angular indices (large, medium and small baseline) and labeling the samples with baseline labels; S1-4, enforcing geometric registration and scale-consistency constraints, namely using the same reference view and the same candidate disparity set for all three tiers of the equivalent baseline set, and using a consistent imaging geometry and pixel coordinate system when constructing and aligning the matching volumes.
- 3. The observability-driven small-baseline light field depth estimation method according to claim 1, wherein the observability score is obtained by a weighted combination of the angular gradient information content, the linearity (straightness of the EPI trace) and the signal-to-noise ratio.
- 4. The observability-driven small-baseline light field depth estimation method according to claim 1, wherein step S3 comprises: first, computing a global score for each view in each baseline set based on the observability scores, and selecting the top-K highest-scoring views of each baseline set to form the view set of that baseline set; then, extracting a feature map from each image in each view set with a learnable CNN feature extraction network; and finally, aligning the feature maps of different views within the same baseline set according to the imaging geometry and computing their similarity, so as to obtain the matching volume of each baseline set (a scoring and view-selection sketch follows the claims).
- 5. The observability-driven small-baseline light field depth estimation method according to claim 1, wherein the training of the parallax estimation model comprises: (1) training a large-baseline teacher model on the matching volumes of the large-baseline set obtained in step S3, and freezing the teacher parameters after training; (2) generating and caching teacher signals, namely running forward inference on the training and validation samples under the large-baseline set, generating and caching the disparity maps and heteroscedastic uncertainties as subsequent distillation signals, and ensuring that the reference view and coordinate system are aligned with the student model inputs; (3) distilling the student model sequentially on the medium-baseline set and then on the small-baseline set; (4) ordering the whole training process as large-baseline set, then medium-baseline set, then small-baseline set, wherein the teacher model is trained on the large-baseline data under ground-truth disparity supervision, and the student model, which shares the teacher's architecture, is trained on the medium- and small-baseline data under joint supervision from the teacher outputs and the ground-truth disparity; after convergence of the medium-baseline stage, the same student model continues training on the small-baseline set, with the source of the teacher signals and the reference view kept consistent so that the input and output spaces remain strictly comparable across baselines; (5) parameter updates and gradient constraints, namely keeping the teacher parameters frozen during the student stages, updating the student parameters by minimizing the total loss, and keeping the optimizer, learning rate and batch size configurations unchanged across the student stages to ensure stable curriculum switching; the best student model obtained is taken as the small-baseline parallax estimation model (a training-schedule sketch follows the claims).
- 6. The observability-driven small-baseline light field depth estimation method according to claim 1, wherein step S5 comprises: S5-1, feeding the matching volume of the target scene into the trained small-baseline parallax estimation model and performing volume aggregation and scoring s(p, d) = R(C_small)(p, d), wherein R(·) denotes the regularization and discriminative enhancement applied to the small-baseline matching volume C_small, and s(p, d) denotes the score of candidate disparity d at pixel p; S5-2, taking the disparity scores obtained in step S5-1 as input, applying a softmax normalization over the candidate disparity set at each pixel to obtain a posterior distribution, and obtaining the continuous disparity estimate by expected regression as the final disparity; S5-3, performing the parallax-to-depth physical mapping from the final disparity, the baseline and the focal length (an inference sketch follows the claims).
- 7. The observability-driven small-baseline light field depth estimation method according to claim 1, wherein in step S5-1 the matching volume of the target scene is fed into the trained student model, thereby improving the depth estimation performance under the small baseline.
- 8. A system for implementing the observability-driven small-baseline light field depth estimation method according to claim 1, comprising: a baseline set module for constructing the three-tier (large, medium and small) baseline set; an observability scoring module for calculating the observability score of each sub-view in every subset of the three-tier baseline set; a matching volume module for constructing the weighted matching volume; a light field depth estimation module for inputting the matching volume into the parallax estimation model and outputting the final depth result; and an estimation model training module for training the parallax estimation model.
- 9. A computer device comprising a memory and a processor, the memory being electrically connected to the processor, the memory storing a computer program, wherein the computer program, when executed by the processor, causes the processor to implement the method as claimed in any one of claims 1 to 8.
- 10. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the processor implements the method according to any one of claims 1 to 8.
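The following sketches are illustrative only and are not part of the claims. This first one shows one plausible way to build the three-tier equivalent-baseline index sets of claim 2 by varying the angular sampling step around a shared reference view; the 9x9 grid size, the step values (4, 2, 1) and the symmetric 3x3 neighborhood are assumptions, not the patent's prescribed configuration.

```python
def build_baseline_tiers(num_views=9, steps=(4, 2, 1), center=4):
    """Construct large/medium/small equivalent-baseline view-index sets from one
    (num_views x num_views) sub-aperture grid by varying the angular sampling
    step around a shared reference view (center, center). Grid size, step values
    and the symmetric 3x3 pattern are illustrative assumptions."""
    tiers = {}
    for name, s in zip(("large", "medium", "small"), steps):
        offsets = (-s, 0, s)                      # symmetric 3x3 angular neighborhood
        tiers[name] = [(center + dv, center + du)
                       for dv in offsets for du in offsets
                       if 0 <= center + dv < num_views and 0 <= center + du < num_views]
    return tiers

# Example: tiers["small"] holds the 3x3 angular indices closest to the reference view (4, 4).
print(build_baseline_tiers())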
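Next, a minimal sketch of the observability scoring and top-K view selection of claims 3 and 4. The three per-view measures below (angular gradient information, an EPI-linearity proxy based on the second angular difference, and an SNR proxy from a high-frequency residual), the weights alpha/beta/gamma and all names are illustrative assumptions; in practice each term would be normalized before weighting, and at least three views per tier are assumed.

```python
import torch
import torch.nn.functional as F

def observability_scores(epi_stack, alpha=0.5, beta=0.3, gamma=0.2):
    """Score each sub-view of one baseline tier by a weighted sum of
    (i) angular gradient information, (ii) linearity of the EPI trace and
    (iii) a simple SNR proxy. `epi_stack` is a (V, H, W) tensor of grayscale
    sub-views ordered along the angular dimension; all three measures are
    illustrative proxies, not the patent's exact definitions."""
    # (i) angular gradient information: mean |dI/du| along the view axis
    ang_grad = (epi_stack[1:] - epi_stack[:-1]).abs()           # (V-1, H, W)
    ang_grad = torch.cat([ang_grad, ang_grad[-1:]], dim=0)      # pad back to V views
    info = ang_grad.mean(dim=(1, 2))                            # (V,)

    # (ii) linearity proxy: penalize curvature of the EPI trace (2nd angular difference)
    curvature = (epi_stack[2:] - 2 * epi_stack[1:-1] + epi_stack[:-2]).abs()
    lin = 1.0 / (1.0 + curvature.mean(dim=(1, 2)))              # (V-2,)
    lin = torch.cat([lin[:1], lin, lin[-1:]], dim=0)            # pad back to V views

    # (iii) SNR proxy: spatial signal variance over high-frequency residual energy
    smooth = F.avg_pool2d(epi_stack.unsqueeze(1), kernel_size=3,
                          stride=1, padding=1).squeeze(1)
    noise_var = (epi_stack - smooth).var(dim=(1, 2))
    snr = epi_stack.var(dim=(1, 2)) / (noise_var + 1e-6)        # (V,)

    return alpha * info + beta * lin + gamma * snr              # (V,) per-view scores

def select_top_k_views(scores, k):
    """Return the indices of the K highest-scoring sub-views of a tier."""
    return torch.topk(scores, k=min(k, scores.numel())).indices
```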
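A compact sketch of the curriculum distillation schedule of claim 5, assuming PyTorch-style teacher/student models and data loaders that already yield the cached teacher disparity and heteroscedastic uncertainty. The loss weighting `lambda_kd`, the exp(-log-variance) confidence weighting and all names are assumptions for illustration, not the patent's exact loss formulation.

```python
import torch

def distillation_loss(student_disp, teacher_disp, teacher_logvar, gt_disp, lambda_kd=0.5):
    """Joint supervision: ground-truth L1 term plus a teacher-distillation term
    down-weighted where the cached teacher uncertainty (log-variance) is high."""
    loss_gt = (student_disp - gt_disp).abs().mean()
    conf = torch.exp(-teacher_logvar)                  # low uncertainty -> high weight
    loss_kd = (conf * (student_disp - teacher_disp).abs()).mean()
    return loss_gt + lambda_kd * loss_kd

def train_curriculum(student, teacher, loaders, optimizer, epochs_per_stage=20):
    """Train the student on the medium-baseline set, then continue the same
    student on the small-baseline set, with the frozen large-baseline teacher
    providing cached disparity/uncertainty signals throughout."""
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)                        # teacher stays frozen
    for stage in ("medium", "small"):                  # curriculum order
        for _ in range(epochs_per_stage):
            for volume, gt_disp, teacher_disp, teacher_logvar in loaders[stage]:
                student_disp = student(volume)         # volume aggregation + regression
                loss = distillation_loss(student_disp, teacher_disp,
                                         teacher_logvar, gt_disp)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                       # same optimizer config in both stages
    return student
```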
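Finally, a sketch of the inference step of claim 6: a per-pixel softmax over the candidate disparity set, expected regression to a continuous disparity, and the disparity-to-depth mapping using the thin-lens relation Z = B·f/(d·s) given in the Background. Tensor layouts and names are assumptions; positive disparities are assumed for the depth mapping.

```python
import torch

def regress_disparity(score_volume, candidate_disps):
    """score_volume: (B, D, H, W) scores s(p, d) from the regularized matching
    volume; candidate_disps: (D,) tensor of candidate disparity values.
    Returns the per-pixel expected disparity of shape (B, H, W)."""
    prob = torch.softmax(score_volume, dim=1)          # posterior over candidates
    disps = candidate_disps.view(1, -1, 1, 1)
    return (prob * disps).sum(dim=1)                   # expected regression

def disparity_to_depth(disp, baseline, focal_px=None, focal_mm=None, pixel_pitch=None):
    """Map disparity (pixels) to depth via Z = B*f/(d*s); pass either the focal
    length in pixels, or in mm together with the pixel pitch in mm.
    Assumes positive disparities."""
    f_px = focal_px if focal_px is not None else focal_mm / pixel_pitch
    return baseline * f_px / disp.clamp(min=1e-6)
```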
Description
Observability-driven small-baseline light field depth estimation method
Technical Field
The invention belongs to the technical field of computational imaging and computer vision, and in particular relates to an observability-driven small-baseline light field depth estimation method.
Background
Light field imaging with the two-plane parametrization L(x, y, u, v) records both the angular and spatial information of a scene: (x, y) denotes coordinates on the imaging plane and (u, v) denotes the aperture/viewpoint sampling position. Compared with monocular/binocular imaging, a light field acquires a dense array of sub-aperture views in a single exposure and naturally contains geometric cues such as parallax, occlusion and defocus blur, providing an information basis for high-precision depth estimation. Geometrically, under the thin-lens, small-baseline assumption the disparity d and the scene depth Z approximately satisfy d ≈ B·f / (Z·s) (see FIG. 2), where B is the equivalent baseline, f is the focal length and s is the pixel pitch. The epipolar plane image (EPI) of the light field slices one spatial and one angular dimension; an ideal Lambertian surface point traces an approximately straight line in the EPI whose slope is proportional to the disparity, so the gradient strength and linearity along the angular dimension serve as the physical basis for disparity observability. A typical light field depth estimation pipeline follows the paradigm of feature extraction, candidate alignment, cost volume construction and volume regularization/regression: feature extraction obtains convolutional or attention features from each sub-aperture image; candidate alignment translates/resamples the neighboring-view features according to the geometric relation for each discrete candidate disparity; cost volume construction stacks the similarities between the reference view and the neighboring views (correlation, cosine similarity, or concatenation followed by 3D convolution); volume regularization/regression aggregates the volume with 3D convolutions, recurrent units or attention and outputs the depth or disparity, possibly together with an uncertainty estimate. In engineering practice, the angular sampling density, the actual baseline size, noise/illumination changes, non-Lambertian reflection and occlusion jointly affect the observability of disparity and the reliability of matching, while the cost volume size and the number of selected views determine the computational/storage overhead and deployment feasibility. Key technical concerns in light field depth estimation therefore revolve around quantifying and exploiting disparity observability, adaptively selecting view subsets, and devising training strategies under different equivalent-baseline conditions.
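To make the cost-volume construction paradigm described above concrete, the following is a minimal sketch (not the patented implementation) of how neighbor-view features can be resampled by candidate disparities and compared against the reference view; the function names `shift_view` and `build_cost_volume`, the fronto-parallel shift model (du·d, dv·d) on a regular sub-aperture grid, and the correlation similarity are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def shift_view(feat, du, dv, disp):
    """Resample a neighbor-view feature map (B, C, H, W) toward the reference view,
    assuming a fronto-parallel pixel shift of (du*disp, dv*disp) for a view offset
    (du, dv) on the angular grid (illustrative geometry)."""
    b, c, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    x_src = xs + du * disp                      # sample neighbor at shifted coordinates
    y_src = ys + dv * disp
    grid = torch.stack(
        (2.0 * x_src / (w - 1) - 1.0, 2.0 * y_src / (h - 1) - 1.0), dim=-1
    ).unsqueeze(0).expand(b, -1, -1, -1)
    return F.grid_sample(feat, grid, align_corners=True, padding_mode="border")

def build_cost_volume(ref_feat, nbr_feats, offsets, candidate_disps, weights=None):
    """Stack per-disparity similarities (here: feature correlation) between the
    reference view and the selected neighbor views into a (B, D, H, W) cost volume.
    `weights` are optional per-view observability weights."""
    if weights is None:
        weights = [1.0] * len(nbr_feats)
    volume = []
    for d in candidate_disps:
        sim = 0.0
        for w_k, feat, (du, dv) in zip(weights, nbr_feats, offsets):
            warped = shift_view(feat, du, dv, d)
            sim = sim + w_k * (ref_feat * warped).mean(dim=1)   # correlation score
        volume.append(sim / sum(weights))
    return torch.stack(volume, dim=1)
```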
In the light field depth estimation method and device based on ambiguity and plane prior knowledge (CN 120431147 A), an effective neighborhood of the reference view is first determined in the micro-lens light field according to epipolar geometry and overlap; views in the neighborhood are grouped by sharpness (blurriness) and assigned different weights in the matching cost, and an initial disparity estimate is obtained mainly from the contributions of sharp views. Pixel-wise spatial propagation is then iterated: mismatches are removed and stable disparities are updated; in weak-texture regions a plane prior is introduced by constructing a plane bin and interpolating disparity from the bin and its normal, replacing the original result when the optimized cost is better; finally the disparity results of all reference views are aggregated into the depth map of the target light field. The overall emphasis is on sharpness-weighted multi-view matching and plane-prior geometric consistency to improve estimation quality under blur and weak-texture conditions. Although this prior art improves matching under weak texture and blur through sharpness weighting and a plane prior, it has several limitations. First, view selection and weighting depend mainly on the degree of blur or a preset neighborhood, lacking an observability quantification directly tied to the imaging geometry (such as a joint measure of EPI slope stability, angular gradient information content and noise level); it is therefore difficult to purposefully exclude or continuously down-weight harmful views at the pixel/patch level, and spurious matches and boundary bleeding still arise at repeated textures and occlusion boundaries. Second, the spatial propagation and plane-bin interpolation rely on hand-tuned thresholds and a local planarity assumption, which easily over-smooths and loses detail on non-planar structures or in complex occlusion scenes, and once errors enter the propagation process they are difficult for subsequent steps to correct.