
CN-115552469-B - Method for estimating pose graphs and transformation matrices between cameras by identifying markers on the ground in panoramic images

CN 115552469 B

Abstract

The application provides a method, a computer program, and a computer system for reconstructing a 3D pose graph. 3D reconstruction of indoor buildings for VR/AR applications (e.g., virtual tourism, digital museums, and virtual house sales) may be performed based on estimating a pose graph and a transformation matrix between cameras by identifying markers on the ground in a panoramic image. Image data corresponding to one or more views of a first camera is received. One or more markers corresponding to a second camera are identified in the received image data. Based on the identified one or more markers, a pose graph corresponding to the one or more views of the first camera is constructed, the pose graph including at least edges.

Inventors

  • ZHANG XIANG
  • JIAN BING
  • HE LU
  • ZHU HAICHAO
  • LIU SHAN
  • LIU KELIN
  • FENG WEIWEI

Assignees

  • TENCENT AMERICA LLC

Dates

Publication Date
2026-05-12
Application Date
2021-10-13
Priority Date
2021-10-08

Claims (14)

  1. A method of constructing a pose graph for a camera, performed in a processor, comprising:
     receiving image data corresponding to one or more views of a first camera;
     identifying one or more markers corresponding to a second camera within the received image data;
     in response to the one or more markers corresponding to the second camera being identified in the image data corresponding to the one or more views of the first camera, determining that the second camera is visible from the first camera and the first camera is visible from the second camera;
     in response to the second camera being visible from the first camera and the first camera being visible from the second camera, calculating a transformation matrix between the first camera and the second camera based on the identified coordinates of each of the markers in the image data, wherein the transformation matrix comprises a rotation matrix between the first camera and the second camera and a translation vector between the first camera and the second camera, the rotation matrix being obtained by the following formula:
     R = [[cos θ, −sin θ, 0], [sin θ, cos θ, 0], [0, 0, 1]];
     wherein R is the rotation matrix and θ is a yaw angle of a three-dimensional coordinate system associated with the first camera, θ being obtained from the following formula:
     θ = f(y21, x21, y12, x12);
     wherein f is a predefined function, y21 is the relative ordinate of the second camera relative to the first camera in the three-dimensional coordinate system, x21 is the relative abscissa of the second camera relative to the first camera in the three-dimensional coordinate system, y12 is the relative ordinate of the first camera relative to the second camera in the three-dimensional coordinate system, and x12 is the relative abscissa of the first camera relative to the second camera in the three-dimensional coordinate system; and
     constructing, based on the movement relationship between the first camera and the second camera indicated by the transformation matrix, a pose graph corresponding to the one or more views of the first camera, the pose graph including at least edges.
  2. The method of claim 1, wherein a node of the pose graph corresponds to each of the one or more views.
  3. The method of claim 1, wherein the edges of the pose graph are constructed based on the one or more identified markers.
  4. The method of claim 1, wherein an edge is constructed between the first camera and the second camera in the one or more views based on a marker of the one or more markers corresponding to the second camera being present in the one or more views corresponding to the first camera.
  5. The method of any of claims 1-4, wherein the transformation matrix between the first camera and the second camera is estimated from the one or more views based on a transformation, determined in the one or more views corresponding to the first camera, associated with a marker of the one or more markers corresponding to the second camera.
  6. The method of claim 5, wherein the transformation matrix is associated with a camera movement between the first camera and the second camera.
  7. A computer system for constructing a camera pose graph, the computer system comprising:
     one or more computer-readable non-transitory storage media configured to store computer program code; and
     one or more computer processors configured to access the computer program code and operate as directed by the computer program code, the computer program code comprising:
     receiving code configured to cause the one or more computer processors to receive image data of one or more views corresponding to a first camera;
     identifying code configured to cause the one or more computer processors to identify one or more markers corresponding to a second camera within the received image data; and
     constructing code configured to cause the one or more computer processors to: determine, in response to the one or more markers corresponding to the second camera being identified in the image data corresponding to the one or more views of the first camera, that the second camera is visible from the first camera and the first camera is visible from the second camera; calculate a transformation matrix between the first camera and the second camera based on the identified coordinates of each of the markers in the image data, wherein the transformation matrix comprises a rotation matrix between the first camera and the second camera and a translation vector between the first camera and the second camera, the rotation matrix being obtained by the following formula:
     R = [[cos θ, −sin θ, 0], [sin θ, cos θ, 0], [0, 0, 1]];
     wherein R is the rotation matrix and θ is a yaw angle of a three-dimensional coordinate system associated with the first camera, θ being obtained from the following formula:
     θ = f(y21, x21, y12, x12);
     wherein f is a predefined function, y21 is the relative ordinate of the second camera relative to the first camera in the three-dimensional coordinate system, x21 is the relative abscissa of the second camera relative to the first camera in the three-dimensional coordinate system, y12 is the relative ordinate of the first camera relative to the second camera in the three-dimensional coordinate system, and x12 is the relative abscissa of the first camera relative to the second camera in the three-dimensional coordinate system; and construct a pose graph of the one or more views corresponding to the first camera based on the movement relationship between the first camera and the second camera indicated by the transformation matrix, wherein the pose graph at least comprises edges.
  8. The computer system of claim 7, further comprising first determination code configured to cause the one or more computer processors to determine visibility between the locations of the first camera and the second camera, wherein the second camera is determined to be visible from the first camera in response to a marker of the one or more markers corresponding to the second camera being present in the one or more views corresponding to the first camera.
  9. The computer system of claim 7, wherein a node of the pose graph corresponds to each of the one or more views.
  10. The computer system of claim 7, wherein the edges of the pose graph are constructed based on the one or more identified markers.
  11. The computer system of claim 7, wherein an edge is constructed between the first camera and the second camera in the one or more views based on a marker of the one or more markers corresponding to the second camera being present in the one or more views corresponding to the first camera.
  12. The computer system of any of claims 7-11, further comprising estimation code configured to cause the one or more computer processors to estimate a transformation matrix between the first camera and the second camera from the one or more views based on a transformation, determined in the one or more views corresponding to the first camera, associated with a marker of the one or more markers corresponding to the second camera.
  13. The computer system of claim 12, wherein the transformation matrix is associated with a camera movement between the first camera and the second camera.
  14. A non-transitory computer-readable medium storing a computer program for constructing a pose graph of a camera, the computer program configured to cause one or more computer processors to implement the method of any of claims 1-6.
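The yaw-angle and rotation-matrix computation recited in claims 1 and 7 can be sketched as follows. The claims leave the predefined function f unspecified, so the atan2-based angle difference below, the +π offset, and the choice of the vertical z-axis for the yaw rotation are illustrative assumptions rather than the patented formula.

```python
import math

def relative_yaw(y21, x21, y12, x12):
    # One plausible choice for the claims' "predefined function" f:
    # the bearing of camera 2 as seen from camera 1, minus the bearing
    # of camera 1 as seen from camera 2, offset by pi so both bearings
    # describe the same line in a common frame. This is an assumption;
    # the claims do not fix f.
    return math.atan2(y21, x21) - math.atan2(y12, x12) + math.pi

def yaw_rotation(theta):
    # Rotation by the yaw angle theta about the vertical (z) axis,
    # returned as a 3x3 row-major matrix.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def planar_translation(x21, y21):
    # The claimed transformation also carries a translation vector;
    # for cameras on a common ground plane it can be taken directly
    # from the relative coordinates (again an illustrative choice).
    return [x21, y21, 0.0]
```

Together, `yaw_rotation(relative_yaw(...))` and `planar_translation(...)` give the rotation and translation parts of the claimed transformation matrix under these assumptions.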

Description

Method for estimating pose graphs and transformation matrices between cameras by identifying markers on the ground in panoramic images

Cross Reference to Related Applications

The present application claims priority from U.S. provisional patent application No. 63/185,945 (filed May 7, 2021) and U.S. patent application No. 17/497,025 (filed October 2021), which are incorporated herein by reference in their entireties.

Technical Field

The present disclosure relates generally to the field of data processing, and more particularly to image processing.

Background

The 3D reconstruction of indoor buildings is an active research topic and has been used in various industries such as real estate, building construction, building repair, and entertainment. 3D reconstruction uses computer vision and machine learning techniques to generate a 3D geometric representation of a building in a scene, taking as input a single RGB image or a set of images from different perspectives. The development of depth sensors has made it more convenient and accurate to measure depth information directly from a scene; widely used depth cameras include lidar, structured light, and the like.

The 3D geometric representation is typically in the form of a so-called point cloud, which contains a set of 3D points in space, each point carrying 3D position information and additional attributes such as color and reflectivity. Another popular 3D format is the textured mesh. In addition to individual 3D points, it contains connectivity information between adjacent points, forming a set of facets (e.g., triangles), and texture information may be attached to each facet.

To capture a large scene, multiple images are acquired from different viewpoints. In this case, the pose graph is important: it defines connectivity and visibility between different viewpoints.
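The point-cloud and textured-mesh formats described in the background can be captured by a minimal data structure; the field names below are illustrative and do not follow any particular file-format standard.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Point3D:
    # One entry of a point cloud: 3D position plus per-point
    # attributes such as color and reflectivity.
    x: float
    y: float
    z: float
    color: Tuple[int, int, int] = (0, 0, 0)
    reflectivity: float = 0.0

@dataclass
class TexturedMesh:
    # A textured mesh adds connectivity: each facet is a triangle
    # given by three indices into the vertex list, and texture
    # information may be attached per facet (omitted here).
    vertices: List[Point3D] = field(default_factory=list)
    faces: List[Tuple[int, int, int]] = field(default_factory=list)
```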
Disclosure of Invention

Embodiments relate to a method, system, and computer-readable medium for constructing a camera pose graph.

According to one aspect, a method for pose graph construction is provided. The method may include receiving image data for one or more views corresponding to a first camera. One or more markers corresponding to a second camera are identified in the received image data. Based on the identified one or more markers, a pose graph corresponding to the one or more views of the first camera is constructed, the pose graph including at least edges.

According to another aspect, a computer system for pose graph construction is provided. The computer system may include one or more computer-readable non-transitory storage media configured to store computer program code, and one or more computer processors configured to access the computer program code and operate as directed by the computer program code. The computer program code includes receiving code configured to cause the one or more computer processors to receive image data of one or more views corresponding to a first camera; identifying code configured to cause the one or more computer processors to identify one or more markers corresponding to a second camera within the received image data; and constructing code configured to cause the one or more computer processors to construct a pose graph of the one or more views corresponding to the first camera based on the identified one or more markers, the pose graph including at least edges.

According to yet another aspect, a non-transitory computer-readable medium is provided, storing a computer program for constructing a camera pose graph, the computer program being configured to cause one or more computer processors to implement the above-described method.
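The marker-based construction described above can be roughly sketched as follows: each view is a node, and an edge is added only when the two cameras identify each other's ground markers. The input format and names here are hypothetical, not from the disclosure.

```python
def build_pose_graph(marker_detections):
    """Build a pose graph from marker detections.

    marker_detections: dict mapping a view/camera id to the set of
    other cameras whose ground markers were identified in that view's
    panorama (a hypothetical input format).
    Returns (nodes, edges) with undirected edges as frozensets.
    """
    nodes = set(marker_detections)
    edges = set()
    for view_a, seen in marker_detections.items():
        for view_b in seen:
            # Add an undirected edge only on mutual marker visibility:
            # A sees B's marker and B sees A's marker.
            if view_a in marker_detections.get(view_b, set()):
                edges.add(frozenset((view_a, view_b)))
    return nodes, edges
```

A one-sided detection (A sees B's marker but not vice versa) yields no edge, matching the mutual-visibility condition in the claims.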
Drawings

These and other objects, features, and advantages will become apparent from the following detailed description of illustrative embodiments, which is to be read in connection with the accompanying drawings. The various features of the drawings are not to scale, as the illustrations are intended for clarity of understanding by those skilled in the art in conjunction with the detailed description. In the drawings:

FIG. 1 illustrates a networked computer environment, according to at least one embodiment;
FIG. 2 is a set of markers for pose graph reconstruction, according to at least one embodiment;
FIG. 3 is an operational flow diagram showing steps performed by a program for pose graph reconstruction, according to at least one embodiment;
FIG. 4 is a block diagram of the internal and external components of the server and computer depicted in FIG. 1, according to at least one embodiment;
FIG. 5 is a block diagram of an illustrative cloud computing environment including the computer system depicted in FIG. 1, according to at least one embodiment; and
FIG. 6 is a block diagram of functional layers of the illustrative cloud computing environment of FIG. 5, according to at least one embodiment.