
CN-121999139-A - Three-dimensional reconstruction method, apparatus, computer device, readable storage medium, and program product

CN121999139A

Abstract

Embodiments of this application disclose a three-dimensional reconstruction method, an apparatus, a computer device, a readable storage medium, and a program product. The method comprises: projecting N identification codes onto a display screen, and acquiring M image frames captured of the display screen by an image capture device, where N and M are positive integers; acquiring positioning feature points corresponding to the M image frames respectively, where the positioning feature points in the M image frames comprise at least feature points corresponding to corner points of the N identification codes; acquiring at least two initial image frames from the M image frames, and constructing an initial three-dimensional point cloud of the display screen based on the correspondence between the positioning feature points respectively included in the at least two initial image frames; and performing three-dimensional reconstruction, on the basis of the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames, to obtain a target three-dimensional point cloud of the display screen. The application can improve the accuracy of three-dimensional reconstruction of a display screen.
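The abstract's first step projects N identification codes onto the screen so that their corners serve as positioning feature points. The patent does not specify the marker layout; as a minimal, hypothetical sketch, the helper below tiles square markers in a regular grid on the screen plane (z = 0) and returns each marker's four corner points — the function name and grid geometry are assumptions for illustration only:

```python
def marker_corners(rows, cols, marker_size, gap):
    """Corner layout for a rows x cols grid of square identification
    codes tiled on the screen plane (z = 0); each marker contributes
    four corner points, which act as the positioning feature points."""
    corners = []
    for r in range(rows):
        for c in range(cols):
            x0 = c * (marker_size + gap)   # left edge of this marker
            y0 = r * (marker_size + gap)   # bottom edge of this marker
            corners.append([(x0, y0, 0.0),
                            (x0 + marker_size, y0, 0.0),
                            (x0 + marker_size, y0 + marker_size, 0.0),
                            (x0, y0 + marker_size, 0.0)])
    return corners
```

A 2 x 3 grid yields N = 6 markers and 24 corner points; in practice the markers would be rendered on the LED screen and re-detected in each captured image frame.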

Inventors

  • HE YUANJIAN
  • CHEN FASHENG
  • ZHANG CHEN
  • LI ZHI

Assignees

  • 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2026-02-12

Claims (19)

  1. A method of three-dimensional reconstruction, the method comprising: projecting N identification codes onto a display screen, and acquiring M image frames captured of the display screen by an image capture device, where N and M are positive integers; acquiring positioning feature points corresponding to the M image frames respectively, where the positioning feature points in the M image frames comprise at least feature points corresponding to corner points of the N identification codes; acquiring at least two initial image frames from the M image frames, and constructing an initial three-dimensional point cloud of the display screen based on the correspondence between the positioning feature points respectively included in the at least two initial image frames; and performing three-dimensional reconstruction, frame by frame on the basis of the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames, to obtain a target three-dimensional point cloud of the display screen.
  2. The method according to claim 1, wherein the acquiring positioning feature points corresponding to the M image frames respectively comprises: performing identification-code recognition on the M image frames respectively to obtain the code features included in each of the M image frames; parsing the code features included in each of the M image frames to obtain the identification code indicated by each code feature and the corner points of the identification codes in each image frame; and determining the feature points of the corner points of the identification codes corresponding to the M image frames as the positioning feature points corresponding to the M image frames.
  3. The method of claim 1, wherein the positioning feature points in the M image frames further comprise natural feature points in the M image frames; and the acquiring positioning feature points corresponding to the M image frames respectively comprises: recognizing the identification codes included in each of the M image frames, and identifying occlusion regions covering the N identification codes in the M image frames; detecting pixel difference points in the first-type image frames to which the occlusion regions belong to obtain natural feature points in the first-type image frames, where a pixel difference point is a pixel that differs from its adjacent pixels; and determining the feature points of the corner points of the identification codes included in the first-type image frames, together with the natural feature points, as the positioning feature points corresponding to the first-type image frames, and determining the feature points corresponding to the corner points of the identification codes included in the second-type image frames as the positioning feature points corresponding to the second-type image frames, where the second-type image frames are the image frames among the M image frames other than the first-type image frames.
  4. The method according to claim 3, wherein the detecting pixel difference points in the first-type image frames to which the occlusion regions belong to obtain the natural feature points in the first-type image frames comprises: detecting pixel difference points in the first-type image frames to which the occlusion regions belong to obtain initial feature points in the first-type image frames; and determining the initial feature points as the natural feature points in the first-type image frames, or performing quality evaluation on the initial feature points and screening the initial feature points based on the quality evaluation result to obtain the natural feature points in the first-type image frames.
  5. The method of claim 1, wherein the acquiring at least two initial image frames from the M image frames comprises: dividing the M image frames into at least two image groups, where each image group comprises at least two mutually different image frames among the M image frames; obtaining the common count of the common feature point groups included in the at least two image frames of each image group, and obtaining the first view-angle difference between the at least two image frames of each image group, where a common feature point group in any image group comprises common feature points corresponding respectively to the at least two image frames of that group, and those common feature points all correspond to the same screen point on the display screen; acquiring the distribution data of the common feature point groups of each image group across the at least two image frames of that group; and performing view screening on the at least two image groups according to the common counts, the first view-angle differences, and the distribution data respectively corresponding to the at least two image groups to obtain a target image group, and determining the at least two image frames of the target image group as the at least two initial image frames.
  6. The method according to claim 1, wherein the constructing an initial three-dimensional point cloud of the display screen based on the correspondence between the positioning feature points respectively included in the at least two initial image frames comprises: acquiring the initial common feature point groups included in the at least two initial image frames, and constructing a fundamental matrix based on the initial common feature point groups, where an initial common feature point group comprises at least two common feature points corresponding to the same position on the display screen; decomposing the fundamental matrix together with the device intrinsic parameters of the image capture device to obtain the relative view pose between the at least two initial image frames; determining the coordinate system of the image capture device at the reference image frame among the at least two initial image frames as the multidimensional coordinate system, and determining the relative view pose as the frame view pose corresponding to the starting view frame among the at least two initial image frames; and in the multidimensional coordinate system, performing projection analysis on the initial common feature point groups using the frame view pose to obtain an initial three-dimensional point cloud composed of the three-dimensional coordinates of the initial screen points corresponding to the initial common feature point groups.
  7. The method of claim 6, wherein the constructing a fundamental matrix based on the initial common feature point groups comprises: during the i-th matrix construction, randomly sampling the initial common feature point groups to obtain an i-th first common feature point group, and performing model fitting on the i-th first common feature point group to obtain an i-th fitting matrix, where i is a positive integer; acquiring the matrix errors from the initial common feature point groups to the i-th fitting matrix, and determining the initial common feature point groups whose matrix error is less than or equal to a matrix validity threshold as the inliers corresponding to the i-th fitting matrix; if the i-th matrix construction does not satisfy the construction-completion condition, performing the (i+1)-th matrix construction to obtain an (i+1)-th fitting matrix and the inliers corresponding to the (i+1)-th fitting matrix; and if the i-th matrix construction satisfies the construction-completion condition, determining the fitting matrix with the largest number of inliers among the at least two fitting matrices as the fundamental matrix.
  8. The method of claim 6, wherein the initial common feature point groups comprise reference common feature points in the reference image frame and starting common feature points in the starting view frame; and the performing, in the multidimensional coordinate system, projection analysis on the initial common feature point groups using the frame view pose to obtain an initial three-dimensional point cloud composed of the three-dimensional coordinates of the initial screen points corresponding to the initial common feature point groups comprises: acquiring the first plane coordinates of the reference common feature points in the reference image frame, and acquiring the second plane coordinates of the starting common feature points in the starting view frame; constructing a first projection model based on the first plane coordinates, and constructing a second projection model based on the second plane coordinates and the frame view pose; and jointly analyzing the first projection model and the second projection model to obtain the three-dimensional coordinates, on the display screen, of the initial screen points corresponding to the initial common feature point groups, and composing the three-dimensional coordinates of the initial screen points into the initial three-dimensional point cloud.
  9. The method according to claim 1, wherein the performing three-dimensional reconstruction, frame by frame based on the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames to obtain a target three-dimensional point cloud of the display screen comprises: obtaining a k-th incremental image frame from the unreconstructed image frames among the M image frames, where k is a positive integer; constructing a pose-optimization model for the k-th incremental image frame based on the positioning feature points in the k-th incremental image frame and the initial three-dimensional point cloud, and solving the pose-optimization model to obtain the k-th frame view pose corresponding to the k-th incremental image frame; obtaining, from the reconstructed image frames among the M image frames and the k-th incremental image frame, the k-th incremental screen points that have not yet been three-dimensionally reconstructed, where a k-th incremental screen point is a screen point projected into at least two image frames among the reconstructed image frames and the k-th incremental image frame; performing three-dimensional reconstruction on the k-th incremental screen points using the k-th frame view pose to obtain the three-dimensional coordinates corresponding to the k-th incremental screen points; adding the three-dimensional coordinates corresponding to the k-th incremental screen points to the (k-1)-th intermediate point cloud to obtain the k-th intermediate point cloud; if unreconstructed image frames remain among the M image frames, acquiring the (k+1)-th incremental image frame from the unreconstructed image frames and performing three-dimensional reconstruction on the (k+1)-th incremental image frame; and if no unreconstructed image frame remains among the M image frames, determining the k-th intermediate point cloud as the target three-dimensional point cloud of the display screen.
  10. The method of claim 9, wherein the acquiring a k-th incremental image frame from the unreconstructed image frames among the M image frames comprises: performing feature-point detection on the unreconstructed image frames among the M image frames based on the (k-1)-th intermediate point cloud to obtain the reconstructed count of reconstructed feature points in each unreconstructed image frame, where a reconstructed feature point is a positioning feature point that has corresponding three-dimensional coordinates in the (k-1)-th intermediate point cloud; acquiring the second view-angle difference between each unreconstructed image frame and the reconstructed image frames among the M image frames; and determining the unreconstructed image frame whose second view-angle difference falls within the theoretical parallax range and whose reconstructed count is the largest as the k-th incremental image frame.
  11. The method according to claim 9, wherein the performing three-dimensional reconstruction, frame by frame based on the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames to obtain a target three-dimensional point cloud of the display screen comprises: performing three-dimensional reconstruction, frame by frame on the basis of the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames to obtain a reconstructed three-dimensional point cloud of the display screen; acquiring the plane coordinates of the positioning feature points in the M image frames, and constructing a three-dimensional reconstruction error for the reconstructed three-dimensional point cloud according to the plane coordinates, the reconstructed three-dimensional point cloud, and the frame view poses respectively corresponding to the M image frames; and constraining the reconstructed three-dimensional point cloud according to the three-dimensional reconstruction error to obtain the target three-dimensional point cloud of the display screen.
  12. The method of claim 11, wherein the three-dimensional reconstruction error comprises at least a re-projection error, and the constructing a three-dimensional reconstruction error for the reconstructed three-dimensional point cloud according to the plane coordinates, the reconstructed three-dimensional point cloud, and the frame view poses respectively corresponding to the M image frames comprises: acquiring the three-dimensional coordinates of the j-th positioning screen point in the reconstructed three-dimensional point cloud, and acquiring the plane coordinates of the j-th positioning screen point in the k-th image frame, where j is a positive integer, k is a positive integer less than or equal to M, and a positioning screen point is a screen point on the display screen corresponding to a positioning feature point in the M image frames; projecting the three-dimensional coordinates using the device intrinsic parameters and the frame view pose corresponding to the k-th image frame to obtain the projection coordinates A_j of the j-th positioning screen point; constructing the point projection error of the j-th positioning screen point in the k-th image frame from the projection coordinates A_j and the plane coordinates; and combining the point projection errors once the point projection errors of the positioning feature points corresponding to all M image frames have been obtained, to obtain the re-projection error.
  13. The method of claim 11, wherein the three-dimensional reconstruction error comprises a size-constraint error; and the constructing a three-dimensional reconstruction error for the reconstructed three-dimensional point cloud according to the plane coordinates, the reconstructed three-dimensional point cloud, and the frame view poses respectively corresponding to the M image frames comprises: acquiring B reconstructed side lengths from the reconstructed three-dimensional point cloud, where B is a positive integer, each reconstructed side length is the distance between the three-dimensional coordinates of the two reconstructed corner points it indicates, and the two reconstructed corner points indicated by each reconstructed side length belong to the same identification code; and performing size constraint on the B reconstructed side lengths based on the standard code side lengths to obtain the size-constraint error.
  14. The method of claim 11, wherein the three-dimensional reconstruction error comprises a plane-constraint error; and the constructing a three-dimensional reconstruction error for the reconstructed three-dimensional point cloud according to the plane coordinates, the reconstructed three-dimensional point cloud, and the frame view poses respectively corresponding to the M image frames comprises: constructing plane-constraint parameters for each of the N identification codes, and, using the plane-constraint parameters respectively corresponding to the N identification codes, building a plane model over the three-dimensional coordinates, in the reconstructed three-dimensional point cloud, of the corner points respectively corresponding to the N identification codes, to obtain the code-plane constraint models respectively corresponding to the N identification codes; and combining the code-plane constraint models respectively corresponding to the N identification codes to obtain the plane-constraint error.
  15. The method of claim 1, wherein the acquiring M image frames captured of the display screen by the image capture device comprises: acquiring a k-th image frame captured of the display screen by the image capture device, and performing identification-code recognition on the k-th image frame to obtain a k-th captured-code set, where k is a positive integer and the k-th captured-code set comprises the identification codes recognized in the k-th image frame; updating the code capture counts respectively corresponding to the N identification codes based on the k-th captured-code set; if any of the N identification codes has a code capture count smaller than the reconstruction capture threshold, acquiring, on the basis of the identification codes whose code capture counts are smaller than the reconstruction capture threshold, a (k+1)-th image frame captured of the display screen by the image capture device; and if the updated code capture counts respectively corresponding to the N identification codes are all greater than or equal to the reconstruction capture threshold, executing the process of acquiring the positioning feature points respectively corresponding to the M image frames.
  16. A three-dimensional reconstruction apparatus, the apparatus comprising: a screen processing module configured to project N identification codes onto a display screen and acquire M image frames captured of the display screen by an image capture device, where N and M are positive integers; a positioning processing module configured to acquire positioning feature points respectively corresponding to the M image frames, where the positioning feature points in the M image frames comprise at least feature points corresponding to corner points of the N identification codes; a primary reconstruction module configured to acquire at least two initial image frames from the M image frames and construct an initial three-dimensional point cloud of the display screen based on the correspondence between the positioning feature points respectively included in the at least two initial image frames; and a three-dimensional reconstruction module configured to perform three-dimensional reconstruction, frame by frame on the basis of the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames to obtain a target three-dimensional point cloud of the display screen.
  17. A computer device, comprising a processor, a memory, and an input-output interface; the processor is connected to the memory and to the input-output interface respectively, where the input-output interface is configured to receive and output data, the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any one of claims 1-15.
  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any one of claims 1-15.
  19. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-15.
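Claim 7 describes the classic RANSAC loop for building the fundamental matrix: repeatedly sample a minimal feature point group, fit a model, count the inliers whose error falls under a validity threshold, and keep the fit with the most inliers. As a minimal, hypothetical sketch of that loop, the code below uses a two-point 2-D line as a stand-in for the eight-point fundamental-matrix model (the loop structure, not the model, is the point; all function names are illustrative):

```python
import random

def ransac_fit(points, fit_model, point_error, sample_size,
               error_threshold, num_iterations, seed=0):
    """Generic RANSAC loop (claim 7): sample, fit, count inliers,
    and keep the fitting model with the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(num_iterations):
        sample = rng.sample(points, sample_size)   # i-th sampled point group
        model = fit_model(sample)                  # i-th fitting model
        if model is None:                          # degenerate sample, skip
            continue
        inliers = [p for p in points
                   if point_error(model, p) <= error_threshold]
        if len(inliers) > len(best_inliers):       # best fit so far
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Stand-in model: a 2-D line y = a*x + b fitted from two sampled points.
def fit_line(sample):
    (x1, y1), (x2, y2) = sample
    if x1 == x2:                                   # vertical pair: no slope
        return None
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def line_error(model, point):
    a, b = model
    x, y = point
    return abs(y - (a * x + b))
```

With ten points on y = 2x + 1 plus two gross outliers, the loop recovers the line and flags exactly the ten inliers; the patent's version instead fits a fundamental matrix from sampled point correspondences and stops when the construction-completion condition is met.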
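Claim 8 triangulates each common feature point by jointly analyzing two projection models, one per initial frame. A hedged sketch of the underlying geometry: each observation back-projects to a ray from its camera centre, and the screen point is recovered as the midpoint of the closest approach of the two rays (plain-Python vector math; the midpoint method is one standard way to realize the joint analysis, not necessarily the patent's exact formulation):

```python
def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def scale(u, s): return tuple(a * s for a in u)

def normalize(u):
    n = dot(u, u) ** 0.5
    return scale(u, 1.0 / n)

def triangulate_midpoint(c1, d1, c2, d2):
    """Recover a 3-D screen point observed along ray c1 + t1*d1 (from
    the reference image frame) and ray c2 + t2*d2 (from the starting
    view frame): find the closest points on the two rays and return
    their midpoint."""
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b          # zero only for parallel rays
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = add(c1, scale(d1, t1))    # closest point on ray 1
    p2 = add(c2, scale(d2, t2))    # closest point on ray 2
    return scale(add(p1, p2), 0.5)
```

For two camera centres that both observe the same screen point, the midpoint returns that point up to floating-point error; with noisy observations the rays no longer intersect exactly and the midpoint gives a least-squares-style compromise.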
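Claims 12-14 assemble three error terms for the final constraint step: a re-projection error, a marker-size constraint, and a per-marker planarity constraint. The following is a compact, hypothetical sketch of the three residuals — a simple pinhole camera with identity pose stands in for the per-frame view poses, the sum-of-squares forms are illustrative, and all names are assumptions rather than the patent's notation:

```python
def project(point3d, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3-D point to pixel coordinates."""
    x, y, z = point3d
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(points3d, observations, fx, fy, cx, cy):
    """Claim 12: sum of squared distances between each projected
    screen point and its observed plane coordinates."""
    total = 0.0
    for p, obs in zip(points3d, observations):
        u, v = project(p, fx, fy, cx, cy)
        total += (u - obs[0]) ** 2 + (v - obs[1]) ** 2
    return total

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def size_constraint_error(corners3d, code_side_length):
    """Claim 13: penalize reconstructed marker side lengths that
    deviate from the standard code side length (one term per side)."""
    n = len(corners3d)
    return sum((dist(corners3d[i], corners3d[(i + 1) % n])
                - code_side_length) ** 2
               for i in range(n))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_constraint_error(corners3d):
    """Claim 14: fit a plane to the first three marker corners and
    sum the squared out-of-plane distances of the remaining corners."""
    p0, p1, p2 = corners3d[:3]
    n = cross(sub3(p1, p0), sub3(p2, p0))
    norm = sum(c * c for c in n) ** 0.5
    n = tuple(c / norm for c in n)
    return sum(sum((a - b) * c for a, b, c in zip(p, p0, n)) ** 2
               for p in corners3d[3:])

def sub3(u, v):
    return tuple(a - b for a, b in zip(u, v))
```

In the patent these residuals are built per image frame using the estimated frame view poses and then jointly minimized to constrain the reconstructed point cloud (a bundle-adjustment-style step); the sketch only shows the shape of each term.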

Description

Three-dimensional reconstruction method, apparatus, computer device, readable storage medium, and program product

Technical Field

The present application relates to the field of computer technologies, and in particular to a three-dimensional reconstruction method, apparatus, computer device, readable storage medium, and program product.

Background

In recent years, large Light-Emitting Diode (LED) screens have become one of the mainstream solutions for virtual video production. By displaying the output of a game engine or real-time rendering engine on an LED screen tens of meters long and several meters high, and capturing the scene in real time with an image capture device, footage close to the final composite can be shot directly on set, greatly reducing green-screen keying and post-compositing costs. In actual engineering, an LED screen used for virtual production is usually assembled from hundreds or even thousands of standard modules, with a total size of tens of meters by several meters. Although a theoretical geometric model of the LED screen, such as a Computer-Aided Design (CAD) drawing or a screen model in three-dimensional software, is provided at the design stage, the LED screen as actually built deviates from the ideal model due to factors such as steel-structure errors, installation tolerances, sagging of aging modules, and site leveling precision. This deviation appears as a misalignment between the virtual screen boundary and the real screen edge on the image plane, often reaching tens of pixels, and noticeably degrades high-precision applications such as infinite extension in Extended Reality (XR) and edge transitions between virtual and real content.
Therefore, to use the geometric information of the LED screen in a virtual production system, the LED screen is currently represented either by a geometric description based on the design model or by manual modeling. This approach, however, cannot reflect on-site construction errors or the local deformation that arises during later operation and maintenance, and provides only the approximate geometry of the LED screen, so a deviation exists between the virtual LED screen (the theoretical screen) and the real LED screen (the screen as actually built), reducing the accuracy of applications that depend on it. Alternatively, an image of the LED screen is captured, natural feature points are identified in the image, and the screen is reconstructed from those points to obtain its three-dimensional point cloud. However, the content displayed on an LED screen is often a high-contrast, strongly periodic pattern or a dynamic picture, so natural feature points are highly repetitive and easily confused, while the screen's self-emission and reflections introduce strong highlights and noise. These factors make natural feature matching unreliable and unrepeatable, so parts of the reconstructed point cloud are sparse and noisy, and the reconstruction accuracy of the LED screen is low.

Disclosure of the Invention

Embodiments of this application provide a three-dimensional reconstruction method, an apparatus, a computer device, a readable storage medium, and a program product, which can improve the accuracy of three-dimensional reconstruction of a display screen.
In one aspect, an embodiment of this application provides a three-dimensional reconstruction method, comprising: projecting N identification codes onto a display screen, and acquiring M image frames captured of the display screen by an image capture device, where N and M are positive integers; acquiring positioning feature points corresponding to the M image frames respectively, where the positioning feature points in the M image frames comprise at least feature points corresponding to corner points of the N identification codes; acquiring at least two initial image frames from the M image frames, and constructing an initial three-dimensional point cloud of the display screen based on the correspondence between the positioning feature points respectively included in the at least two initial image frames; and performing three-dimensional reconstruction, frame by frame on the basis of the initial three-dimensional point cloud, on the positioning feature points of the unreconstructed image frames among the M image frames, to obtain a target three-dimensional point cloud of the display screen. In one aspect, an embodiment of the present application provides a three-dimensional reconstruction apparatus, comprising: a screen processing module configured to project N id