
CN-115861537-B - Three-dimensional reconstruction method, system and medium based on contrastive learning and dense comparison


Abstract

The invention discloses a three-dimensional reconstruction method, system, and medium based on contrastive learning and dense comparison. The method comprises: collecting two-dimensional query pictures of objects and their corresponding three-dimensional support shapes to form an original data set; constructing a three-dimensional reconstruction model; pre-training a contrastive learning module and fixing its parameters once pre-training is complete; iteratively training the three-dimensional reconstruction model on the original data set by gradient descent to obtain a trained three-dimensional reconstruction model; and inputting a query object picture into the trained model to obtain the predicted three-dimensional shape. The three-dimensional reconstruction model integrates the features extracted from the two-dimensional picture with those extracted from the three-dimensional shape, extracts image features closely related to the three-dimensional information, and thereby achieves higher-precision three-dimensional reconstruction.

Inventors

  • Wu Qingyao
  • Chen Jian
  • Lai Lvlong

Assignees

  • South China University of Technology (华南理工大学)

Dates

Publication Date
2026-05-05
Application Date
2022-12-07

Claims (10)

  1. A three-dimensional reconstruction method based on contrastive learning and dense comparison, comprising the following steps: acquiring two-dimensional query pictures of objects and their corresponding three-dimensional support shapes to form an original data set; constructing a three-dimensional reconstruction model, wherein the three-dimensional reconstruction model comprises a contrastive learning module and a feature fusion module; pre-training the contrastive learning module and fixing its parameters after pre-training is complete, comprising: constructing a contrastive learning sample set on the original data set, encoding the samples in the contrastive learning sample set with a two-dimensional encoder and a three-dimensional encoder respectively to obtain two-dimensional feature vectors and three-dimensional feature vectors of the samples, and projecting them into the same embedding space; constructing a loss function of the contrastive learning module, calculating the loss value, back-propagating to update the parameters of the contrastive learning module until convergence, and fixing the parameters of the two-dimensional encoder and the three-dimensional encoder to obtain the pre-trained contrastive learning module; iteratively training the three-dimensional reconstruction model on the original data set using gradient descent, comprising: inputting the original data set into the pre-trained contrastive learning module with fixed parameters to obtain the two-dimensional feature vector and the three-dimensional feature vector of each object; performing, in the feature fusion module, a cross-attention operation on the two-dimensional and three-dimensional feature vectors to generate the Q, K, and V matrices, re-representing the two-dimensional and three-dimensional features of the object with the attention operation, and fusing the features with a dense comparison method to obtain the fused features of the object; constructing a binary cross-entropy loss function of the three-dimensional reconstruction model, calculating the binary cross-entropy loss value, and updating the parameters of the three-dimensional reconstruction model until the loss function converges, obtaining the trained three-dimensional reconstruction model; and inputting a query object picture into the trained three-dimensional reconstruction model to obtain the predicted three-dimensional shape.
  2. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 1, wherein the original data set is represented as a set of picture–shape pairs, wherein k represents the object category, I_ik is the two-dimensional query picture of the i-th object of the k-th category in the original data set, S_ik is the three-dimensional support shape of the i-th object of the k-th category in the original data set, N_k is the number of objects of the k-th category in the original data set, and N = Σ_k N_k is the total number of objects in the original data set; the two-dimensional query pictures are captured with high-resolution photographic equipment, and the shooting viewing angles are kept uniform during acquisition.
  3. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 2, wherein constructing the contrastive learning sample set on the original data set, encoding the samples in the contrastive learning sample set with a two-dimensional encoder and a three-dimensional encoder, and obtaining and projecting the two-dimensional feature vectors and three-dimensional feature vectors of the samples into the same embedding space specifically comprises: constructing the contrastive learning sample set on the original data set by randomly selecting N_c categories from the categories of the original data set and, from each selected category, randomly selecting one pair consisting of a two-dimensional query picture and its corresponding three-dimensional support shape, so that the i-th sample of the contrastive learning sample set comprises a two-dimensional query picture and a three-dimensional support shape; inputting the two-dimensional query picture of each sample into the two-dimensional encoder E_2d to obtain the two-dimensional feature vector of the sample; inputting the three-dimensional support shape of each sample into the three-dimensional encoder E_3d to obtain the three-dimensional feature vector of the sample; and projecting the two-dimensional feature vector and the three-dimensional feature vector of each sample into the same embedding space to obtain the two-dimensional embedded features and the three-dimensional embedded features of the sample, wherein the projection networks P_2d and P_3d are single-layer fully connected networks.
  4. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 3, wherein the loss function of the contrastive learning module sums, over all samples, a two-dimensional term and a three-dimensional term: the loss of the two-dimensional query picture of the i-th sample and the loss of the three-dimensional support shape of the i-th sample, wherein sim is the cosine distance and τ is the temperature coefficient.
  5. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 3, wherein re-representing the two-dimensional features and the three-dimensional features of the object specifically comprises: performing the cross-attention operation on the two-dimensional feature vector and the three-dimensional feature vector of the object to generate the Q, K, and V matrices, once for the two-dimensional feature vector and once for the three-dimensional feature vector, wherein W_Q, W_K, W_V ∈ R^(d×d) are trainable parameters, the two-dimensional feature vector and the three-dimensional feature vector of the object have shape r_3 × d, r_3 is the feature resolution, and d is the feature dimension; and re-representing the two-dimensional features and the three-dimensional features of the object with the attention operation to obtain the re-represented two-dimensional features and the re-represented three-dimensional features, wherein FC denotes a fully connected layer.
  6. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 5, wherein fusing the features with the dense comparison method to obtain the fused features of the object specifically comprises: deforming the re-represented three-dimensional shape feature to the shape 1 × r_3 × d; passing the deformed feature through a fully connected layer to output a global descriptor of shape 1 × d; repeating the output descriptor r_3 times to obtain F'''_S of shape r_3 × d; splicing F'''_S with the re-represented two-dimensional feature to obtain the spliced feature F_qS of shape r_3 × 2d; and inputting the spliced feature F_qS into the three-dimensional convolutional layer Conv3D(2d, 3) to output the fused feature F'''_q of shape r_3 × d.
  7. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 6, wherein the binary cross-entropy loss function of the three-dimensional reconstruction model is expressed as L = -(1/n) Σ_{i=1..n} [y_i log(p_i) + (1 − y_i) log(1 − p_i)], where n is the number of pixels in the fused feature, y_i ∈ {0, 1} represents the actual occupancy of the i-th pixel (1 means the pixel is occupied, 0 means it is empty), and p_i represents the predicted probability that the i-th pixel is occupied.
  8. The three-dimensional reconstruction method based on contrastive learning and dense comparison according to claim 7, wherein during pre-training of the contrastive learning module, the temperature coefficient τ is set to 0.1, the training runs for 20 epochs, and the learning rate is 0.001; during iterative training of the three-dimensional reconstruction model, the number of iterations is set to 100000, the batch size is 32, the learning rate is 0.0001 and is halved every 20000 iterations, and the i-th pixel is predicted to be occupied by the object when p_i > 0.3.
  9. A three-dimensional reconstruction system based on contrastive learning and dense comparison, applied to the three-dimensional reconstruction method based on contrastive learning and dense comparison according to any one of claims 1-8, comprising a data acquisition module, a model construction module, a pre-training module, an iterative training module, and a result prediction module; the data acquisition module is configured to acquire two-dimensional query pictures of objects and their corresponding three-dimensional support shapes to form an original data set; the model construction module is configured to construct a three-dimensional reconstruction model comprising a contrastive learning module and a feature fusion module; the pre-training module is configured to pre-train the contrastive learning module and fix its parameters after pre-training is complete, comprising: constructing a contrastive learning sample set on the original data set, encoding the samples in the contrastive learning sample set with a two-dimensional encoder and a three-dimensional encoder respectively to obtain two-dimensional feature vectors and three-dimensional feature vectors of the samples, and projecting them into the same embedding space; constructing a loss function of the contrastive learning module, calculating the loss value, back-propagating to update the parameters of the contrastive learning module until convergence, and fixing the parameters of the two-dimensional encoder and the three-dimensional encoder to obtain the pre-trained contrastive learning module; the iterative training module is configured to iteratively train the three-dimensional reconstruction model on the original data set using gradient descent, comprising: inputting the original data set into the pre-trained contrastive learning module with fixed parameters to obtain the two-dimensional feature vector and the three-dimensional feature vector of each object; performing, in the feature fusion module, a cross-attention operation on the two-dimensional and three-dimensional feature vectors to generate the Q, K, and V matrices, re-representing the two-dimensional and three-dimensional features of the object with the attention operation, and fusing the features with the dense comparison method to obtain the fused features of the object; constructing a binary cross-entropy loss function of the three-dimensional reconstruction model, calculating the binary cross-entropy loss value, and updating the parameters of the three-dimensional reconstruction model until the loss function converges, obtaining the trained three-dimensional reconstruction model; and the result prediction module is configured to input a query object picture into the trained three-dimensional reconstruction model to obtain the predicted three-dimensional shape.
  10. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the three-dimensional reconstruction method based on contrastive learning and dense comparison according to any one of claims 1-8.
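The data-set convention of claim 2 can be sketched as a mapping from category to picture–shape pairs. The category names and file names below are purely illustrative; only the counting convention (N_k objects per category, N total objects) comes from the claim.

```python
# Hypothetical layout of the original data set of claim 2: for each object
# category k, a list of (two-dimensional query picture, three-dimensional
# support shape) pairs. File names are illustrative only.
dataset = {
    "chair": [("I_1_chair.png", "S_1_chair.binvox"),
              ("I_2_chair.png", "S_2_chair.binvox")],
    "table": [("I_1_table.png", "S_1_table.binvox")],
}

N_k = {k: len(pairs) for k, pairs in dataset.items()}  # objects per category
N = sum(N_k.values())                                  # total number of objects
assert N_k == {"chair": 2, "table": 1} and N == 3
```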
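The encoding-and-projection step of claim 3 can be sketched as follows. The encoders E_2d and E_3d are stood in for by random feature matrices, and all dimensions are hypothetical, since the claim fixes none of them; only the structure — two encoders followed by two single-layer fully connected projections P_2d and P_3d into one shared embedding space — follows the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the claim does not fix them.
d2, d3, d_emb = 128, 256, 64

# Stand-ins for the outputs of the encoders E_2d and E_3d on 5 samples.
f2d = rng.normal(size=(5, d2))
f3d = rng.normal(size=(5, d3))

# P_2d and P_3d: single-layer fully connected networks projecting both
# kinds of feature vectors into the same d_emb-dimensional embedding space.
W2, b2 = rng.normal(size=(d2, d_emb)), np.zeros(d_emb)
W3, b3 = rng.normal(size=(d3, d_emb)), np.zeros(d_emb)

z2d = f2d @ W2 + b2   # two-dimensional embedded features
z3d = f3d @ W3 + b3   # three-dimensional embedded features
assert z2d.shape == z3d.shape == (5, d_emb)  # same embedding space
```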
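The exact formula of the contrastive loss in claim 4 is omitted in the translated text; below is a sketch assuming the standard symmetric InfoNCE form, which matches the quantities the claim does name: a per-sample loss for the two-dimensional query picture, a per-sample loss for the three-dimensional support shape, the cosine similarity sim, and the temperature coefficient τ.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def _info_nce(sim):
    """Per-row -log softmax probability of the diagonal (positive) entry."""
    m = sim.max(axis=1, keepdims=True)
    log_z = (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))[:, 0]
    return log_z - np.diag(sim)

def contrastive_loss(z2d, z3d, tau=0.1):
    """Symmetric InfoNCE: the i-th 2-D query and i-th 3-D support are positives."""
    sim = cosine_sim(z2d, z3d) / tau
    l_2d = _info_nce(sim)      # loss of each sample's 2-D query picture
    l_3d = _info_nce(sim.T)    # loss of each sample's 3-D support shape
    return 0.5 * (l_2d + l_3d).mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Correctly paired embeddings score a lower loss than mismatched pairings.
assert contrastive_loss(z, z) < contrastive_loss(z, z[::-1])
```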
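A minimal sketch of the cross-attention step in claim 5, shown for the direction that re-represents the two-dimensional features by attending over the three-dimensional features (the opposite direction is symmetric). The 1/sqrt(d) scaling is the usual attention convention, assumed here because the claim's formulas are omitted in the translation; W_Q, W_K, W_V are the d × d trainable matrices the claim names.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(F_q, F_s, W_Q, W_K, W_V):
    """Re-represent 2-D feature tokens F_q (r x d) against 3-D tokens F_s (r3 x d)."""
    Q, K, V = F_q @ W_Q, F_s @ W_K, F_s @ W_V
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (r, r3), rows sum to 1
    return weights @ V                                # (r, d) re-represented features

rng = np.random.default_rng(0)
r, r3, d = 6, 10, 32
out = cross_attention(rng.normal(size=(r, d)), rng.normal(size=(r3, d)),
                      rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                      rng.normal(size=(d, d)))
assert out.shape == (r, d)
```

With a single 3-D token and identity weight matrices, each output row collapses to that token, which is a quick sanity check on the softmax normalization.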
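The dense-comparison fusion of claim 6 reduces the three-dimensional feature to a single global descriptor, tiles it across all r_3 positions, and concatenates it with the two-dimensional feature before a final convolution. A simplified numpy sketch follows: the fully connected descriptor head is approximated by mean pooling plus a linear map, and the Conv3D(2d, 3) layer is stood in for by a pointwise projection, since only the shapes are fixed by the claim.

```python
import numpy as np

rng = np.random.default_rng(0)
r3, d = 64, 32   # hypothetical feature resolution and feature dimension

F_S = rng.normal(size=(r3, d))   # re-represented 3-D feature (claim 5)
F_q = rng.normal(size=(r3, d))   # re-represented 2-D feature (claim 5)

# 1) Collapse the 3-D feature to one global descriptor of shape (1, d).
#    (Mean pooling + linear map stands in for the claim's FC layer.)
W_fc = rng.normal(size=(d, d)) / np.sqrt(d)
g = (F_S.mean(axis=0) @ W_fc)[None, :]

# 2) Repeat the descriptor r3 times and splice it with the 2-D feature.
F_rep = np.repeat(g, r3, axis=0)              # (r3, d)
F_qS = np.concatenate([F_q, F_rep], axis=1)   # (r3, 2d) spliced feature

# 3) Pointwise projection standing in for Conv3D(2d, 3): back to (r3, d).
W_out = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
F_fused = F_qS @ W_out
assert F_fused.shape == (r3, d)
```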
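The binary cross-entropy in claim 7 over per-pixel occupancy is the standard form and can be written directly:

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Mean binary cross-entropy: y_i is the true occupancy (0 or 1) of the
    i-th pixel, p_i the predicted probability that the pixel is occupied."""
    p = np.clip(p, eps, 1.0 - eps)   # guard against log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Confident, correct predictions give a small loss.
loss = bce_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
assert abs(loss - (-np.log(0.9))) < 1e-9
```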
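The training schedule in claim 8 — base learning rate 0.0001 halved every 20000 iterations, and occupancy predicted when p_i > 0.3 — can be expressed directly:

```python
def learning_rate(step, base_lr=1e-4, halve_every=20000):
    """Reconstruction-phase learning rate: halved every 20000 iterations (claim 8)."""
    return base_lr * 0.5 ** (step // halve_every)

def is_occupied(p_i, threshold=0.3):
    """Claim 8: the i-th pixel is predicted occupied when p_i > 0.3."""
    return p_i > threshold

assert learning_rate(0) == 1e-4
assert learning_rate(20000) == 5e-5
assert learning_rate(99999) == 1e-4 * 0.5 ** 4
assert is_occupied(0.31) and not is_occupied(0.3)
```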

Description

Three-dimensional reconstruction method, system and medium based on contrastive learning and dense comparison

Technical Field

The invention belongs to the technical field of three-dimensional reconstruction, and particularly relates to a three-dimensional reconstruction method, system, and medium based on contrastive learning and dense comparison.

Background

Three-dimensional reconstruction is one of the important subjects in the field of computer vision: a 3D model is obtained from a picture, so that the 3D position/coordinates corresponding to each point in the picture can be known. Three-dimensional reconstruction is used in many fields, such as computer-aided geometric design, computer graphics, computer animation, computer vision, medicine, virtual reality, and digital media. Most existing three-dimensional reconstruction techniques require multi-view pictures, complex camera equipment, and dedicated sites, and are therefore costly, while existing single-view three-dimensional reconstruction techniques either fail to achieve good results or require large training data sets, generalize poorly, and reconstruct new categories of objects weakly. Therefore, many existing three-dimensional reconstruction methods cannot meet practical application standards, and improving reconstruction accuracy, strengthening generalization ability, and reducing the cost of the technology remain problems to be solved.
Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art and provides a three-dimensional reconstruction method, system, and medium based on contrastive learning and dense comparison: a three-dimensional reconstruction model is constructed, its contrastive learning module is pre-trained and its parameters are fixed, features are extracted from the two-dimensional picture and the three-dimensional shape, the extracted features are fused by a feature fusion module, and the three-dimensional reconstruction result is output. In order to achieve the above purpose, the present invention adopts the following technical scheme. In one aspect, the invention provides a three-dimensional reconstruction method based on contrastive learning and dense comparison, comprising the following steps: acquiring two-dimensional query pictures of objects and their corresponding three-dimensional support shapes to form an original data set; constructing a three-dimensional reconstruction model, wherein the three-dimensional reconstruction model comprises a contrastive learning module and a feature fusion module; pre-training the contrastive learning module and fixing its parameters after pre-training is complete, comprising: constructing a contrastive learning sample set on the original data set, encoding the samples in the contrastive learning sample set with a two-dimensional encoder and a three-dimensional encoder respectively to obtain two-dimensional feature vectors and three-dimensional feature vectors of the samples, and projecting them into the same embedding space; constructing a loss function of the contrastive learning module, calculating the loss value, back-propagating to update the parameters of the contrastive learning module until convergence, and fixing the parameters of the two-dimensional encoder and the three-dimensional encoder to obtain the pre-trained contrastive learning module; iteratively training the three-dimensional reconstruction model on the original data set using gradient descent, comprising: inputting the original data set into the pre-trained contrastive learning module with fixed parameters to obtain the two-dimensional feature vector and the three-dimensional feature vector of each object; performing, in the feature fusion module, a cross-attention operation on the two-dimensional and three-dimensional feature vectors to generate the Q, K, and V matrices, re-representing the two-dimensional and three-dimensional features of the object with the attention operation, and fusing the features with a dense comparison method to obtain the fused features of the object; constructing a binary cross-entropy loss function of the three-dimensional reconstruction model, calculating the binary cross-entropy loss value, and updating the parameters of the three-dimensional reconstruction model until the loss function converges, obtaining the trained three-dimensional reconstruction model; and inputting a query object picture into the trained three-dimensional reconstruction model to obtain the predicted three-dimensional shape. As a preferred embodiment, the original data set is a set of picture–shape pairs, wherein k represents the object category, I_ik is the two-dimensional query picture of the i-th object of the k-th category in the original data set, S_ik is the three-dimensional support shape of the i-th object of the k