CN-121982084-A - Variable-focus binocular vision distance method and system integrating instance segmentation and self-calibration

CN121982084A

Abstract

The invention belongs to the technical field of computer vision and three-dimensional perception, and in particular relates to a variable-focus binocular vision ranging method and system integrating instance segmentation and self-calibration. The method comprises: performing self-calibration with images acquired by a binocular camera, using a deep-learning-based self-calibration method, to obtain the camera's intrinsic and extrinsic parameters; obtaining the mask region, category label, and confidence of each detected instance in the image; preprocessing the left and right images acquired by the binocular camera and inputting them into a stereo matching network to obtain a dense disparity map, which is resized back to the original image size; computing a dense depth map using the self-calibrated focal length and converting it into a pseudo-color depth map; extracting, for each instance, the depth values of its mask-region pixels from the dense depth map and taking the median depth of the region as the instance's distance estimate, which is overlaid on the original camera image to obtain an instance depth overlay map; and converting the pseudo-color depth map and the instance depth overlay map into image messages, respectively, and publishing them.

Inventors

  • ZHANG LELE
  • FU HAILING
  • ZHANG YUFENG
  • DENG FANG
  • CHEN HAO
  • MA JIANLIANG
  • HUO LIZHI
  • WANG YANPENG
  • LIU WENBIN
  • SHI XIANG

Assignees

  • Beijing Institute of Technology (北京理工大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-09

Claims (10)

  1. A variable-focus binocular vision ranging method integrating instance segmentation and self-calibration, characterized by comprising the following steps: camera self-calibration, namely performing self-calibration with images acquired by a binocular camera, using a deep-learning-based self-calibration method, to obtain the camera's intrinsic and extrinsic parameters; instance segmentation, namely obtaining the mask region, category label, and confidence of each detected instance in the image; depth estimation, comprising preprocessing the left and right images acquired by the binocular camera, inputting the preprocessed images into a stereo matching network to obtain a dense disparity map, resizing the dense disparity map back to the original image size, computing a dense depth map using the self-calibrated focal length, extracting the depth values of the mask-region pixels from the dense depth map, and overlaying the median depth of the region, serving as the instance's distance estimate, on the original camera image to obtain an instance depth overlay map; and image visualization, namely converting the pseudo-color depth map and the instance depth overlay map into image messages, respectively, and publishing them.
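The distance-estimation step of claim 1 (median depth over each instance's mask pixels) can be sketched in a few lines of numpy; function and variable names here are illustrative, not from the patent.

```python
import numpy as np

def instance_distance(depth_map: np.ndarray, mask: np.ndarray) -> float:
    """Estimate an instance's distance as the median depth over its mask.

    The median (rather than the mean) is robust to stereo-matching
    outliers at object boundaries.  Invalid depths (<= 0) are discarded.
    """
    values = depth_map[mask.astype(bool)]
    values = values[values > 0]  # drop invalid / unmatched pixels
    if values.size == 0:
        return float("nan")
    return float(np.median(values))

# Toy example: a 4x4 depth map with one 2x2 instance mask.
depth = np.full((4, 4), 10.0)
depth[1:3, 1:3] = [[2.0, 2.1], [1.9, 100.0]]  # 100.0 is an outlier spike
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
print(instance_distance(depth, mask))  # 2.05, unaffected by the outlier
```

The 100.0 outlier would pull a mean estimate to over 26 m; the median stays at 2.05, which is why a robust statistic is the natural choice here.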
  2. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 1, wherein performing self-calibration with the deep-learning-based self-calibration method to obtain the camera's intrinsic and extrinsic parameters comprises the following steps: step 2.1, extracting multi-scale features through a neural network based on an encoder-decoder architecture, predicting the perspective field pixel by pixel, namely the up vector and latitude angle of each pixel, and generating a corresponding confidence weight for each pixel; step 2.2, constructing a nonlinear weighted least-squares objective function over the camera parameters and gradually updating the camera parameters so as to minimize the residual between the theoretical perspective field generated by the current camera parameters and the network prediction; and step 2.3, outputting the optimized intrinsic and extrinsic camera parameters when the iteration termination condition is met.
  3. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 2, wherein the objective function is a confidence-weighted sum of squared residuals between the theoretical perspective field generated by the camera parameters and the network prediction, with the intrinsic and extrinsic camera parameters as the variables to be optimized.
  4. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 3, characterized in that the camera parameters are updated step by step with a Levenberg-Marquardt-type algorithm to minimize the residual, the update formula being θ(k+1) = θ(k) − (JᵀWJ + λ·diag(JᵀWJ))⁻¹ JᵀW r, wherein J is the Jacobian matrix, JᵀWJ is the (approximate) Hessian matrix, λ is the damping factor, W is the diagonal matrix of confidence weights, and r is the residual vector; the camera parameters θ are iteratively updated until convergence.
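The symbols in claim 4 are garbled in the machine translation; the terms named (Jacobian, damping factor λ, diagonal confidence-weight matrix, residual vector) suggest a confidence-weighted Levenberg-Marquardt update. Below is a minimal numpy sketch of one such damped step, demonstrated on a toy linear least-squares problem; all function names and the toy problem are illustrative, not from the patent.

```python
import numpy as np

def lm_step(theta, residual_fn, jacobian_fn, weights, lam):
    """One damped (Levenberg-Marquardt-style) update of the parameter
    vector theta, minimising the confidence-weighted squared residual."""
    r = residual_fn(theta)              # residual vector
    J = jacobian_fn(theta)              # Jacobian matrix
    W = np.diag(weights)                # diagonal confidence weights
    H = J.T @ W @ J                     # Gauss-Newton Hessian approximation
    A = H + lam * np.diag(np.diag(H))   # damping term
    delta = np.linalg.solve(A, -J.T @ W @ r)
    return theta + delta

# Toy quadratic problem: find theta with A_mat @ theta ~= b.
A_mat = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([4.0, 1.0, 3.0])
theta = np.zeros(2)
for _ in range(20):
    theta = lm_step(theta,
                    residual_fn=lambda t: A_mat @ t - b,
                    jacobian_fn=lambda t: A_mat,
                    weights=np.ones(3),
                    lam=1e-3)
print(theta)  # converges to the exact solution [2, 1]
```

In the patent's setting, the residual would be the difference between the theoretical perspective field and the network prediction, and the weights would be the per-pixel confidences from step 2.1.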
  5. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 1, wherein the instance segmentation comprises the following steps: step 3.1, passing the input image through a backbone convolutional neural network to extract multi-level feature maps, which are output to a target detection branch and a pixel-level segmentation branch, respectively; step 3.2, generating a set of candidate detection boxes in the target detection branch, each detection box comprising an instance category label and a confidence; and step 3.3, fusing the information, wherein each fused instance segmentation result consists of a mask, a category label, and a confidence.
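The fused result of step 3.3 (mask + category label + confidence per instance) can be represented concretely as follows; this is a minimal sketch with illustrative names, and the confidence-threshold filter shown is a common convention rather than a detail stated in the claim.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Instance:
    mask: np.ndarray   # boolean pixel mask, shape (H, W)
    label: str         # category label
    score: float       # confidence

def fuse(instances, score_thresh=0.5):
    """Keep only detections above the confidence threshold; each
    surviving result bundles mask + label + confidence (step 3.3)."""
    return [inst for inst in instances if inst.score >= score_thresh]

h, w = 4, 4
m = np.zeros((h, w), dtype=bool)
m[0, 0] = True
dets = [Instance(m, "car", 0.9), Instance(m, "dog", 0.3)]
kept = fuse(dets)
print([inst.label for inst in kept])  # ['car']
```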
  6. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 1, wherein the dense disparity map is obtained as follows: step 4.1, the preprocessed left and right images are input simultaneously into a weight-sharing two-dimensional convolutional feature extraction network to extract multi-scale feature maps; step 4.2, within the set disparity search range, the right feature map is translated horizontally for each candidate disparity and the inner product with the feature vector at the corresponding position of the left feature map is computed, so as to construct a three-dimensional correlation cost volume; step 4.3, the cost volume is input into a three-dimensional convolutional network for cost aggregation, fusing information from neighboring pixels and different disparity planes along the spatial and disparity dimensions to obtain an initial disparity map; and step 4.4, the initial disparity map, together with context features of the left image, is input into a convolutional gated recurrent unit (GRU) refinement module for iterative updating, yielding the final high-precision disparity map, i.e., the dense disparity map.
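Step 4.2 can be sketched directly in numpy (the patent's own symbols are lost in the translation, so names here are illustrative). Real feature maps would come from the CNN of step 4.1; deterministic one-hot toy features that encode the column index make the example verifiable, and a simple winner-take-all over the cost volume stands in for the 3-D aggregation and GRU refinement of steps 4.3 and 4.4.

```python
import numpy as np

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Build a correlation cost volume (step 4.2): for each candidate
    disparity d, shift the right feature map d pixels to the right and
    take the per-pixel inner product with the left feature map.

    feat_l, feat_r: (C, H, W) feature maps; returns (max_disp + 1, H, W).
    """
    C, H, W = feat_l.shape
    cost = np.zeros((max_disp + 1, H, W), dtype=feat_l.dtype)
    for d in range(max_disp + 1):
        # left-image pixel x corresponds to right-image pixel x - d
        cost[d, :, d:] = np.sum(feat_l[:, :, d:] * feat_r[:, :, :W - d], axis=0)
    return cost

# Deterministic toy: one-hot features encoding the column index, with the
# right image equal to the left image shifted by 2 px (true disparity = 2).
C, H, W = 6, 2, 6
fl = np.zeros((C, H, W), dtype=np.float32)
for x in range(W):
    fl[x, :, x] = 1.0
fr = np.zeros_like(fl)
fr[:, :, :W - 2] = fl[:, :, 2:]
disp = correlation_cost_volume(fl, fr, max_disp=3).argmax(axis=0)
print(disp[:, 2:])  # every valid column recovers disparity 2
```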
  7. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 6, wherein the preprocessing consists of resizing the left and right images by the same factor, converting them to RGB, and normalizing them.
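A dependency-free sketch of the claim-7 preprocessing (same-factor resize, RGB conversion, normalization). The nearest-neighbour resize and the [0, 1] normalization range are assumptions for illustration; a real pipeline would typically use bilinear interpolation (e.g. `cv2.resize`) and whatever normalization the stereo network expects.

```python
import numpy as np

def preprocess(img_left, img_right, scale=0.5):
    """Resize both images by the same factor, ensure 3-channel RGB,
    and normalise pixel values to [0, 1]."""
    def prep(img):
        h, w = img.shape[:2]
        ys = (np.arange(int(h * scale)) / scale).astype(int)
        xs = (np.arange(int(w * scale)) / scale).astype(int)
        img = img[ys][:, xs]                    # nearest-neighbour resize
        if img.ndim == 2:                       # grayscale -> RGB
            img = np.stack([img] * 3, axis=-1)
        return img.astype(np.float32) / 255.0   # normalise to [0, 1]
    return prep(img_left), prep(img_right)

left = np.full((4, 6), 255, dtype=np.uint8)        # grayscale
right = np.zeros((4, 6, 3), dtype=np.uint8)        # already RGB
pl, pr = preprocess(left, right)
print(pl.shape, pr.shape, pl.max(), pr.max())  # (2, 3, 3) (2, 3, 3) 1.0 0.0
```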
  8. The variable-focus binocular vision ranging method integrating instance segmentation and self-calibration according to claim 6, wherein the dense depth map is calculated as follows: step 4.5, converting the high-precision disparity map into a depth map using the geometric relation of binocular imaging, Z = f · b / d, where f is the focal length, b is the baseline length, d is the disparity, and Z is the depth.
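The binocular geometry relation of claim 8, Z = f · b / d, applied element-wise to a disparity map; the invalid-pixel convention (depth 0 where disparity is not positive) is an illustrative assumption, not stated in the claim.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m):
    """Z = f * b / d: depth from disparity, using the self-calibrated
    focal length f (pixels) and the baseline b (metres).  Pixels with
    zero or negative disparity are marked invalid (depth 0)."""
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

# f = 800 px, b = 0.12 m  =>  f * b = 96; e.g. disparity 64 px -> 1.5 m.
disp = np.array([[64.0, 32.0], [0.0, 16.0]])
depth = disparity_to_depth(disp, focal_px=800.0, baseline_m=0.12)
print(depth)  # [[1.5, 3.0], [0.0, 6.0]]
```

Note the inverse relationship: halving the disparity doubles the depth, which is why distant objects (small d) are where disparity errors hurt ranging accuracy most.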
  9. The variable-focus binocular ranging method integrating instance segmentation and self-calibration according to claim 1, characterized in that binocular image topics to subscribe to, and pseudo-color depth map and instance depth overlay map topics to publish, are set; the pseudo-color depth map and the instance depth overlay map are converted into image messages, respectively, and published to the corresponding topics, the image messages comprising instance region, category, confidence, and depth information; the left and right image caches are then cleared, and the system waits for the next frame.
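Before publishing, the dense depth map must be rendered as a pseudo-color image. The patent does not specify a color map, so the red-to-blue (near-to-far) ramp below is purely illustrative; in practice this role is typically filled by a library routine such as OpenCV's `cv2.applyColorMap`.

```python
import numpy as np

def depth_to_pseudocolor(depth):
    """Map a depth map to an 8-bit RGB pseudo-colour image
    (near = red, far = blue); invalid pixels (depth <= 0) stay black."""
    valid = depth > 0
    out = np.zeros(depth.shape + (3,), dtype=np.uint8)
    if not valid.any():
        return out
    d = depth[valid]
    t = (d - d.min()) / max(d.max() - d.min(), 1e-9)    # 0 = near, 1 = far
    out[valid, 0] = ((1.0 - t) * 255).astype(np.uint8)  # red fades with distance
    out[valid, 2] = (t * 255).astype(np.uint8)          # blue grows with distance
    return out

depth = np.array([[1.0, 5.0], [0.0, 3.0]])
img = depth_to_pseudocolor(depth)
print(img[0, 0], img[0, 1], img[1, 0])  # near = red, far = blue, invalid = black
```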
  10. A variable-focus binocular vision ranging system integrating instance segmentation and self-calibration, characterized by comprising a variable-focus binocular camera, a camera self-calibration module, an instance segmentation module, a depth estimation module, and a visualization module; the variable-focus binocular camera is used to acquire synchronized left and right frame images; the camera self-calibration module is used to perform self-calibration with images acquired by the binocular camera, using a deep-learning-based self-calibration method, to obtain the camera's intrinsic and extrinsic parameters; the instance segmentation module is used to obtain the mask region, category label, and confidence of each detected instance in the image; the depth estimation module is used to preprocess the left and right images, input them into the stereo matching network to obtain a dense disparity map, resize the dense disparity map back to the original image size, extract the depth values of the mask-region pixels from the dense depth map, and overlay the median depth of the region, serving as the instance's distance estimate, on the original camera image to obtain an instance depth overlay map; and the visualization module is used to convert the pseudo-color depth map and the instance depth overlay map into image messages, respectively, and publish them.

Description

Variable-focus binocular vision ranging method and system integrating instance segmentation and self-calibration

Technical Field

The invention belongs to the technical field of computer vision and three-dimensional perception, and in particular relates to a variable-focus binocular vision ranging method and system integrating instance segmentation and self-calibration.

Background

Binocular vision ranging technology is widely applied in fields such as robot navigation, three-dimensional reconstruction, intelligent monitoring, and augmented reality, owing to advantages such as requiring no actively emitted signal, high acquisition precision, and low cost. Conventional binocular ranging methods generally rely on fixed-focal-length binocular cameras, acquire intrinsic parameters (e.g., focal length, principal point position, distortion coefficients) and extrinsic parameters (e.g., roll angle, pitch angle) through manual off-line calibration, and then complete depth recovery through disparity computation. Such methods can achieve relatively stable results in static scenes or laboratory environments, but face many challenges in complex real-world environments. First, traditional methods depend heavily on the manual calibration process, and the calibration precision directly determines the reliability of the depth calculation result. In dynamic operating scenarios, however, the camera's intrinsic parameters may change due to external impacts, zoom adjustment, or temperature variation, invalidating the original calibration and requiring recalibration, which greatly reduces the flexibility and practicality of the system. Especially in a binocular system with zooming capability, the focal length, a key intrinsic parameter, changes continuously with zooming, and traditional methods cannot update the calibration parameters in real time, so the ranging accuracy drops sharply.
Second, most conventional ranging systems focus only on pixel-level dense depth map acquisition, lack recognition and segmentation of specific objects in the image, and struggle to meet the practical semantic-perception requirement of knowing "which object is at what distance". Although some systems attempt to acquire object position information by incorporating a target detection model, they generally identify only the approximate position of an object via bounding boxes, cannot extract pixel-level regions, and discriminate poorly among multiple similar targets, which limits their use in control and interaction tasks. In addition, existing systems generally treat image acquisition, parameter estimation, target recognition, and depth calculation as independent modules, lack a unified software architecture, and are ill-suited to real-time operation and embedded deployment. In summary, existing binocular ranging technology still has significant shortcomings in dynamic focal-length adaptation, autonomous calibration capability, instance-level semantic perception, and system integration. Therefore, a novel ranging method and system is needed that achieves automatic calibration without manual intervention, performs independent recognition and accurate ranging of multiple targets by means of instance segmentation, and offers good real-time performance and system integration, so as to meet the practical requirements of intelligent perception systems in complex environments.
Disclosure of Invention

In view of the above, the invention provides a variable-focus binocular vision ranging method and system integrating instance segmentation and self-calibration, in which the binocular camera can focus to adapt to ranging at different distances, the acquired images are used for real-time self-calibration, and instance segmentation is performed on the images so that the image information of each object instance can be displayed. The technical scheme of the invention is as follows. In a first aspect, the invention provides a variable-focus binocular vision ranging method integrating instance segmentation and self-calibration, comprising the following steps: camera self-calibration, namely performing self-calibration with images acquired by a binocular camera, using a deep-learning-based self-calibration method, to obtain the camera's intrinsic and extrinsic parameters; instance segmentation, namely obtaining the mask region, category label, and confidence of each detected instance in the image; depth estimation, comprising preprocessing the left and right images acquired by the binocular camera and inputting the preprocessed images into a stereo matching network