
US-12626378-B2 - Digital image calculation method and system for RGB-D camera multi-view matching based on variable template

US 12626378 B2

Abstract

Disclosed is a digital image calculation method and system for RGB-D camera multi-view matching based on a variable template. The method includes six steps: acquiring data, preprocessing point cloud data, performing feature point matching, re-registering a variable template, calculating point cloud data transformation relationships among large-view images, and performing point cloud fusion. The size of the matching template for non-adjacent images is adjusted based on registration results of adjacent angles of view, so that feature points of images from non-adjacent angles of view are correctly registered. This improves matching accuracy, eliminates cumulative errors in image sets, and provides more accurate initial values for subsequent iterations of point cloud fusion, such that the number of iterations is reduced and three-dimensional reconstruction of the images is implemented.

Inventors

  • Aiguo SONG
  • Huiran HU
  • Yuzhen XIE
  • Linhu WEI
  • Huijun Li
  • Lifeng Zhu

Assignees

  • SOUTHEAST UNIVERSITY

Dates

Publication Date
2026-05-12
Application Date
2024-04-28
Priority Date
2023-06-25

Claims (9)

  1. A digital image calculation method for RGB-D camera multi-view matching based on a variable template, comprising the following steps: S1, acquiring data: image information data are acquired through an RGB-D camera, the image information data comprise color information and depth data, and three-dimensional information and color information of point clouds of a measured object are aligned by an RGB-D camera calibration algorithm or by a method for aligning internal parameters and the depth data of an integrated depth camera of the RGB-D camera with the color information; S2, preprocessing point cloud data: pass-through filtering is performed on the three-dimensional information obtained in the S1 to filter out background regions exceeding a set range, and an optimal ground point cloud is determined and eliminated using a random sample consensus algorithm; a cluster analysis is then performed on the processed point clouds, the point cloud data with similar densities are retained using a density-based clustering algorithm, and a region to be registered is obtained; S3, performing feature point matching: feature points of adjacent images in image sequences are tracked through a correlation calculation, and a relative pose transformation relationship of point cloud coordinates under each of the angles of view is calculated through positions of the feature points; S4, re-registering the variable template: a scale and a direction of the variable template are adjusted according to calculation results of the relative pose transformation relationship of the point cloud coordinates obtained in the S3; S5, calculating point cloud data transformation relationships among large-view images: the feature points of images are registered according to the variable template obtained in the S4 to obtain a coordinate transformation relationship between corresponding points; and S6, performing point cloud fusion: a pose transformation relationship between feature points matched in a set of the point cloud data is calculated and used as an initial value of an iterative closest point algorithm to perform iterative calculation, the point cloud fusion is then completed according to conditions for iteration termination, and three-dimensional reconstruction of the images is implemented.
  2. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 1, wherein a specific calculation method of the three-dimensional information of the measured object in the S1 is: $\begin{cases} X_{iw} = d_i \cdot (x_i - c_x)/f_x \\ Y_{iw} = d_i \cdot (y_i - c_y)/f_y \\ Z_{iw} = d_i \end{cases}$ wherein $d_i$ represents a distance of a target point from a plane of the camera; $c_x$, $c_y$, $f_x$, $f_y$ represent the internal parameters of the depth camera; and $X_{iw}$, $Y_{iw}$, $Z_{iw}$ represent positions of the target point in a world coordinate system, respectively; the color information is acquired through a color information stream of the RGB-D camera, and a specific acquisition method is: $\mathrm{Color} = (R, G, B)$, wherein R, G, and B represent red, green, and blue channel values in the RGB-D camera, respectively.
  3. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 2, wherein performing the pass-through filtering on the three-dimensional information to filter out the background regions exceeding the set range in the S2 is specifically: $\begin{cases} X_{min} \le X_{iw} \le X_{max} \\ Y_{min} \le Y_{iw} \le Y_{max} \\ Z_{min} \le Z_{iw} \le Z_{max} \end{cases}$ wherein $X_{min}$, $X_{max}$, $Y_{min}$, $Y_{max}$, $Z_{min}$, $Z_{max}$ represent detection thresholds in the world coordinate system, respectively; and in the random sample consensus algorithm, 3 points are randomly selected from the filtered point clouds, and after this process is repeated, a fitted plane containing a largest number of points is the optimal ground point cloud.
  4. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 3, wherein the feature points of the adjacent images in an image sequence are tracked through the correlation calculation, the relative pose transformation relationship of the point cloud coordinates under each of the angles of view is calculated through the positions of the feature points in the S3, and a correlation calculation formula is as follows: $C_{u,v} = \dfrac{\sum_{x,y} [r(x,y) - \bar r][d(x+u, y+v) - \bar d]}{\sqrt{\sum_{x,y} [r(x,y) - \bar r]^2 \sum_{x,y} [d(x+u, y+v) - \bar d]^2}}$ wherein $\bar r$ and $\bar d$ represent pixel gray-scale means of a reference subset and a deformed subset, respectively; $u$ and $v$ represent horizontal and vertical offsets of the feature points in a deformed image; and $r(x, y)$, $d(x+u, y+v)$ represent pixel gray-scale values of the reference subset and the deformed subset at image coordinates $(x, y)$ and $(x+u, y+v)$.
  5. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 3, wherein the relative pose transformation relationship of the point cloud coordinates under each of the angles of view is calculated through the positions of the feature points in the S3, specifically a rotation matrix R and a translation matrix T, and the coordinate transformation relationship between the corresponding points is: $P_1 = \begin{bmatrix} R_2^1 & T_2^1 \\ 0 & 1 \end{bmatrix} P_2$ wherein $P_1$, $P_2$ represent the positions of the feature points in the adjacent images before and after rotation and translation; and $R_2^1$, $T_2^1$ represent the rotation matrix and the translation matrix of the feature points in the adjacent images after rotation and translation relative to the feature points in an image before rotation and translation; and according to the rotation matrices $R_n^{n-1}$ and the translation matrices $T_n^{n-1}$ of coordinate transformation relationships among all of the adjacent images in the image sequences, a relative pose transformation of feature points of a $k$th image and an $n$th image can be calculated, which is converted into and expressed in Euler angles as follows: $\begin{bmatrix} \theta_x \\ \theta_y \\ \theta_z \end{bmatrix} = \mathrm{Euler}\!\left(R_{k+1}^{k} \cdot R_{k+2}^{k+1} \cdots R_n^{n-1}\right)$ wherein $\theta_x$, $\theta_y$ and $\theta_z$ represent the scale parameters and direction parameters of the variable template, respectively.
  6. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 5, wherein the scale and direction of the variable template are adjusted according to the calculation results of the relative pose transformation relationship of the point cloud coordinates in the S4, and a sampling formula is specifically: $\begin{bmatrix} x_r \\ y_r \\ 1 \end{bmatrix} = \begin{bmatrix} R \cdot \cos\theta \\ R \cdot \sin\theta \\ 1 \end{bmatrix}$ wherein $R$ represents a sampling radius, and $\theta$ represents a sampling angle; and when a shooting angle changes, a deformed image is rotated by $\theta_x$, $\theta_y$, $\theta_z$ about the x, y and z axes in space relative to a reference image, and the sampling of a deformation template is as follows: $\begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix} = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R' \cdot \cos\theta \cdot \cos\theta_y \\ R' \cdot \sin\theta \cdot \cos\theta_x \\ 1 \end{bmatrix}$ wherein the sampling radius and the sampling angle of the deformation template are set to be consistent with those of the reference template.
  7. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 6, wherein the coordinate transformation relationship between the corresponding points in the S5 is specifically: $P_k = \begin{bmatrix} R_n^k & T_n^k \\ 0 & 1 \end{bmatrix} P_n$ wherein $P_k$, $P_n$ represent positions of the feature points in the $k$th image and the $n$th image, and $R_n^k$, $T_n^k$ represent a rotation matrix and a translation matrix of the feature points in the $n$th image relative to the feature points in the $k$th image.
  8. The digital image calculation method for RGB-D camera multi-view matching based on the variable template according to claim 7, wherein a calculation formula of the iteration termination in the S6 is specifically: $\Delta \mathrm{Loss} = \dfrac{1}{n} \sum_{1}^{n} \lvert A_{current} - B_{current} \rvert - \dfrac{1}{m} \sum_{1}^{m} \lvert A_{last} - B_{last} \rvert < error$ wherein $A_{current}$, $B_{current}$ represent coordinates of a target point cloud and a changed point cloud of a current iteration, respectively; $A_{last}$, $B_{last}$ represent a target point cloud and a changed point cloud of a last iteration, respectively; $n$ and $m$ represent the numbers of corresponding points of the point clouds in the current iteration and the last iteration, respectively; and $error$ represents a threshold for the iteration termination; and after iterative convergence, $R_n^k$ and $T_n^k$ are obtained, registration and fusion of the point cloud data of the $k$th image and the $n$th image are implemented, and multi-view three-dimensional reconstruction is completed.
  9. A digital image calculation system for RGB-D camera multi-view matching based on a variable template, comprising a computer program, wherein when the computer program is executed by a processor, the steps of any one of the above methods are implemented.
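The back-projection of claim 2 and the pass-through filter of claim 3 can be sketched in Python with NumPy; the function names and vectorized form are illustrative, not from the patent:

```python
import numpy as np

def depth_to_world(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D points, per the pinhole model of
    claim 2: X = d*(x-cx)/fx, Y = d*(y-cy)/fy, Z = d."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    X = depth * (xs - cx) / fx
    Y = depth * (ys - cy) / fy
    return np.stack([X, Y, depth], axis=-1)  # shape (h, w, 3)

def pass_through(points, lo, hi):
    """Keep only points whose X/Y/Z each fall within [lo, hi] (claim 3)."""
    mask = np.all((points >= lo) & (points <= hi), axis=-1)
    return points[mask]
```

In the patent pipeline this filtered cloud would then go through RANSAC ground-plane removal and density-based clustering before registration.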
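The correlation score of claim 4 is a zero-normalized cross-correlation between a reference subset and a deformed subset. A minimal sketch, assuming equally sized subsets (the function name is illustrative):

```python
import numpy as np

def zncc(ref, deformed):
    """Zero-normalized cross-correlation between two equally sized image
    subsets: subtract each subset's mean, then correlate and normalize.
    This is the C_{u,v} score of claim 4 for a given offset (u, v)."""
    r = ref - ref.mean()
    d = deformed - deformed.mean()
    denom = np.sqrt((r ** 2).sum() * (d ** 2).sum())
    return float((r * d).sum() / denom) if denom > 0 else 0.0
```

A score of 1 indicates a perfect match; tracking a feature point amounts to finding the offset (u, v) that maximizes this score.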
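The variable-template sampling of claim 6 can be sketched as follows: a circular reference template of radius R is foreshortened by cos(θ_y) and cos(θ_x) and rotated in-plane by θ_z to form the deformed template. Function names are illustrative:

```python
import numpy as np

def reference_sample(R, theta):
    """Reference-template sample point (x_r, y_r, 1) on a circle of
    radius R at sampling angle theta (claim 6, first formula)."""
    return np.array([R * np.cos(theta), R * np.sin(theta), 1.0])

def deformed_sample(Rp, theta, tx, ty, tz):
    """Deformed-template sample point: the circle is foreshortened by
    cos(ty) in x and cos(tx) in y, then rotated in-plane by tz
    (claim 6, second formula)."""
    Rz = np.array([[np.cos(tz), -np.sin(tz), 0.0],
                   [np.sin(tz),  np.cos(tz), 0.0],
                   [0.0,         0.0,        1.0]])
    p = np.array([Rp * np.cos(theta) * np.cos(ty),
                  Rp * np.sin(theta) * np.cos(tx),
                  1.0])
    return Rz @ p
```

With zero rotation the deformed template reduces to the reference template, which is the consistency condition the claim requires.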
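The iteration-termination criterion of claim 8 compares the mean residual between corresponding points across two consecutive ICP iterations. A minimal sketch, taking |A - B| as the Euclidean distance between corresponding 3D points (names and the threshold value are illustrative):

```python
import numpy as np

def delta_loss(A_cur, B_cur, A_last, B_last):
    """Change in mean point-to-point residual between two consecutive
    ICP iterations (the Delta-Loss quantity of claim 8)."""
    cur = np.linalg.norm(A_cur - B_cur, axis=1).mean()
    last = np.linalg.norm(A_last - B_last, axis=1).mean()
    return abs(cur - last)

def converged(A_cur, B_cur, A_last, B_last, error=1e-4):
    """Terminate iteration once the residual change falls below error."""
    return delta_loss(A_cur, B_cur, A_last, B_last) < error
```

Once this test passes, the accumulated rotation and translation give the registration between the kth and nth point clouds, completing the fusion step.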

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of international PCT application serial no. PCT/CN2023/105664, filed on Jul. 4, 2023, which claims the priority benefit of China application no. 202310745655.0, filed on Jun. 25, 2023. The entirety of each of the above-mentioned patent applications is hereby incorporated by reference herein and made a part of this specification.

TECHNICAL FIELD

The present disclosure belongs to the technical field of multi-view three-dimensional imaging and machine vision, and mainly relates to a digital image calculation method and system for RGB-D camera multi-view matching based on a variable template.

BACKGROUND

With the popularization of three-dimensional (3D) printing and virtual reality technology, there is an increasing demand for three-dimensional reconstruction techniques. Since manual modeling using three-dimensional modeling software is extremely expensive, researchers have focused on three-dimensional reconstruction of an object by multi-view shooting of the object through a camera. Three-dimensional reconstruction methods based on an RGB-D camera can be roughly divided into two categories: point cloud-based methods and image-based methods. Although the point cloud-based reconstruction method achieves good results at present, such as the well-known ICP method and its improvements, it still has limitations, particularly in scenes with disordered point clouds, in which case an initial value close to the true value is required to avoid a local optimum; the image-based reconstruction method, by contrast, relies on the matching of adjacent images. Generally, multi-view three-dimensional reconstruction involves the following steps: 1) sequences of multi-view images are captured through a camera, and feature points in the sequences are matched; and 2) a transformation relationship between images is calculated to align point cloud data of the image sequences.
Therefore, the quality of three-dimensional reconstruction imposes high requirements on the quality of feature point pairs and the accuracy of feature point matching. Some problems may arise in the process of digital image correlation matching of the image sequences. For example, excessive rotation or scaling between non-adjacent images leads to an increase in falsely matched points, making direct matching impossible. Moreover, as the number of images from different angles of view increases, the cumulative matching errors between adjacent images cause bifurcation after point cloud alignment. Relevant scholars have put forward some solutions to these problems. For example, epipolar geometrical constraints are used to reduce mismatches in the process of binocular stereo matching. However, this solution cannot be applied to single-camera multi-view scenes because it requires knowing the relative poses of the two cameras' angles of view in advance. For two images incapable of being directly matched, some previous studies have suggested inserting a series of intermediate images to incrementally accumulate results of guided matching, which is effective in most cases; but in the multi-view matching process, introducing too many intermediate images increases computational costs, and it is difficult to avoid cumulative errors. Some feature point matching methods, such as scale-invariant feature transform (SIFT) and its improvements, have also achieved good results. However, these methods still have limitations, because they heavily rely on the number of feature points: insufficient feature point pairs in two images to be matched can make the matching impossible. Related studies have also shown that matching at too large an angle can result in significant errors; therefore, it is necessary to limit the angles of view of adjacent images.
However, in a scene needing a large angle of view, introducing many smaller angles of view will undoubtedly accumulate matching errors.

SUMMARY

In order to address the problems of accumulation of matching errors and mismatches in single RGB-D camera multi-view matching in the prior art, the present disclosure provides a digital image calculation method and system for RGB-D camera multi-view matching based on a variable template. The method includes six steps: acquiring data, preprocessing point cloud data, performing feature point matching, re-registering a variable template, calculating point cloud data transformation relationships among large-view images, and performing point cloud fusion. The size of the matching template for non-adjacent images is adjusted based on registration results of adjacent angles of view, and correct registration of feature points of images of non-adjacent angles of view is accordingly achieved, which improves matching accuracy, eliminates cumulative errors in image sets, and provides more accurate initial values for subsequent iterations of point cloud fusion, such that the number of iterations is reduced, and three-dimensional reconstruction of the images is implemented.