CN-121562717-B - Reinforcement learning training method and system for space-time interaction dynamic three-dimensional reconstruction
Abstract
The application provides a reinforcement learning training method and system for space-time interaction dynamic three-dimensional reconstruction, belonging to the technical field of three-dimensional reconstruction. The method comprises: constructing a space-time interaction dynamic three-dimensional reconstruction model from a video sequence of a dynamic environment, endowing each three-dimensional point with attribute parameters that change dynamically over time, and building a dynamic three-dimensional scene representation from these time-varying attribute parameters; packaging and deploying the space-time-consistency-optimized model as a dynamic environment simulator; converting a behavior policy into executable actions according to the observation representation of an agent, and having the agent carry out these actions in the real environment; and generating, via the space-time-consistency-optimized model, a new space-time interaction dynamic three-dimensional reconstruction model based on the updated environment. The method addresses the high training cost, high risk, and low sample efficiency of traditional reinforcement learning in real environments.
Inventors
- TONG GUOFENG
- CHEN HAO
- MENG XIANGZHENG
- SUN JIARU
Assignees
- Northeastern University (东北大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260121
Claims (10)
- 1. A reinforcement learning training method for space-time interaction dynamic three-dimensional reconstruction, characterized by comprising the following steps: S1, collecting time-aligned video sequences of a dynamic environment, wherein the video sequences of the dynamic environment comprise actions of an agent; S2, constructing a space-time interaction dynamic three-dimensional reconstruction model from the video sequences of the dynamic environment, wherein the model endows each three-dimensional point with attribute parameters that change dynamically over time, a dynamic three-dimensional scene representation is constructed from these time-varying attribute parameters, and the dynamic three-dimensional scene representation at the current moment together with the actions of the agent are taken as inputs to a state transfer function to obtain the dynamic three-dimensional scene representation at the next moment, the state transfer function being a neural network whose number of neurons is optimized by minimizing the difference between the prediction result and the real observation result; S3, performing space-time consistency optimization on the space-time interaction dynamic three-dimensional reconstruction model; S4, packaging and deploying the space-time-consistency-optimized model as a dynamic environment simulator; S5, taking compact scene features as the observation representation of the agent at the current moment in the dynamic environment simulator; S6, according to the observation representation of the agent, obtaining a behavior policy executed by the agent in the dynamic environment using a reinforcement learning policy network architecture; S7, converting the behavior policy into executable actions and having the agent carry them out in the real environment; S8, after the agent acts in the real environment and causes the environment state to change, returning to step S5; after the dynamic environment simulator receives the agent's action in the real environment, a new space-time interaction dynamic three-dimensional reconstruction model is generated for the updated environment by the space-time-consistency-optimized model, and the observation representation of the agent at the next moment is obtained; a dynamically evolving three-dimensional scene representation is output from the dynamic environment simulator, and hierarchical features with explicit geometric and semantic meaning are extracted from it; the dimension of the hierarchical features is reduced by combining the PCA method with an encoder and a decoder, and the mean output by the encoder is taken as the compact scene feature, which comprises the following steps: projecting the hierarchical features into a task-related low-dimensional subspace and computing their covariance matrix; performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues and the corresponding eigenvectors; computing the variance contribution rate from the eigenvalues; selecting the minimum number of principal components such that the cumulative variance contribution rate is greater than or equal to a variance threshold; constructing a projection matrix from the eigenvectors and the selected principal components to obtain the dimension-reduced features; mapping the dimension-reduced features to posterior distribution parameters of a latent space with an encoder; sampling latent vectors from the posterior distribution parameters by the reparameterization trick; reconstructing the dimension-reduced features input to the encoder from the latent vectors with a decoder; and taking the mean output by the encoder as the compact scene feature.
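The PCA-plus-encoder dimension-reduction pipeline described in claim 1 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the patented implementation: the linear encoder weights (`w_mu`, `w_logvar`) and all shapes are hypothetical placeholders.

```python
import numpy as np

def pca_reduce(features, var_threshold=0.95):
    """Project features (N, D) onto the fewest principal components
    whose cumulative variance contribution rate >= var_threshold."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)         # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalue decomposition
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = np.cumsum(eigvals) / eigvals.sum()  # cumulative contribution rate
    k = int(np.searchsorted(ratios, var_threshold) + 1)
    proj = eigvecs[:, :k]                        # projection matrix
    return centered @ proj, proj

def encode_compact(reduced, w_mu, w_logvar, rng):
    """Toy linear encoder: map reduced features to posterior (mu, logvar),
    sample a latent vector via the reparameterization trick, and return
    mu as the compact scene feature. Weights are illustrative."""
    mu = reduced @ w_mu
    logvar = reduced @ w_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    return mu, z                                 # mu = compact scene feature
```

A real system would train the encoder/decoder pair with a reconstruction loss; here only the forward pass is shown.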
- 2. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 1, wherein the state transfer function comprises: a point feature lifting layer, which adopts a fully connected layer structure and maps the original attributes of each three-dimensional point in the dynamic three-dimensional scene representation at moment t to a high-dimensional feature space; a local geometric encoding layer, which adopts a graph convolution layer or a lightweight PointNet++ module and extracts the local geometric structure features of each three-dimensional point from the high-dimensional feature space; a global context aggregation layer, which adopts an encoder structure comprising 2 Transformer blocks and models the dependency relationships between three-dimensional points from the local geometric structure features; an action condition fusion layer, which adopts a cross-attention mechanism and fuses the actions of the agent into the point features according to the dependency relationships between three-dimensional points and the actions of the agent, obtaining fused features; a space-time evolution prediction layer, which adopts a long short-term memory (LSTM) network and predicts the motion trend from the fused features based on the historical state sequence; an attribute decoding head layer, which comprises a plurality of parallel lightweight multi-layer perceptrons, each predicting the incremental attribute parameters of the i-th three-dimensional point from the motion trend; and an increment application layer, which applies the incremental attribute parameters of the i-th three-dimensional point to the three-dimensional point state at moment t by a non-parametric mathematical operation to obtain the dynamic three-dimensional scene representation at moment t+1.
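The increment-based update at the end of claim 2 (predict per-point attribute increments, then apply them additively) can be illustrated with a toy per-point step. The lifting/graph-convolution/Transformer/LSTM stack is elided here; every weight and shape is an illustrative placeholder, not the claimed architecture.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Lightweight two-layer perceptron (stand-in for a decoding head)."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def transfer_step(points, action, params):
    """One toy state-transfer step: fuse the agent's action into each
    point's feature vector, decode per-point increment attributes with a
    small MLP, and apply them additively — the non-parametric increment
    application layer. `params` holds hypothetical MLP weights."""
    n = points.shape[0]
    act = np.broadcast_to(action, (n, action.shape[-1]))
    fused = np.concatenate([points, act], axis=1)  # naive action fusion
    delta = mlp(fused, *params)                    # predicted increments
    return points + delta                          # x_{t+1} = x_t + delta
```

The additive update keeps the increment layer itself parameter-free, as the claim specifies; all learning happens in the decoding head.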
- 3. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 1, wherein constructing the space-time interaction dynamic three-dimensional reconstruction model from the video sequences of the dynamic environment comprises the following steps: recovering an initial sparse three-dimensional point cloud from the video sequences of the dynamic environment by a structure-from-motion (SfM) method to obtain the static three-dimensional scene at the initial moment; endowing each three-dimensional point in the static three-dimensional scene at the initial moment with attribute parameters that change dynamically over time, thereby expanding the static three-dimensional scene representation into a dynamic three-dimensional scene representation; and predicting the dynamic three-dimensional scene representation at the next moment based on the state transfer function, the dynamic three-dimensional scene representation at the current moment, and the actions of the agent.
- 4. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 1, wherein optimizing the number of neurons in the neural network by minimizing the difference between the prediction result and the real observation result comprises: rendering the space-time interaction dynamic three-dimensional reconstruction model into a predicted image and a corresponding predicted depth map at an arbitrary view angle using a differentiable renderer, and taking the predicted image and the corresponding predicted depth map as the prediction result; and optimizing the space-time interaction dynamic three-dimensional reconstruction model according to the error between the predicted image at the arbitrary view angle and the image at the corresponding moment of the video sequence of the dynamic environment in step S1, and the error between the predicted depth map and the depth map at the corresponding moment of the video sequence of the dynamic environment in step S1, to obtain the modeling-optimized space-time interaction dynamic three-dimensional reconstruction model, wherein the image and the depth map at the corresponding moment of the video sequence of the dynamic environment in step S1 are the real observation results.
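The modeling-optimization objective of claim 4 — an image error plus a depth error against the real observations — might be written as the following minimal sketch. The differentiable renderer itself is elided, and the depth weight and choice of L2/L1 errors are assumed for illustration.

```python
import numpy as np

def reconstruction_loss(pred_img, gt_img, pred_depth, gt_depth, w_depth=0.5):
    """Combine a photometric error on the rendered image with a depth
    error on the rendered depth map, both against the real observations
    from step S1. `w_depth` is an illustrative weighting."""
    photo = np.mean((pred_img - gt_img) ** 2)       # photometric error (L2)
    depth = np.mean(np.abs(pred_depth - gt_depth))  # depth error (L1)
    return photo + w_depth * depth
```

In the claimed method this scalar would drive gradient-based optimization of the reconstruction model through the differentiable renderer.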
- 5. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 1, wherein the space-time consistency optimization of the space-time interaction dynamic three-dimensional reconstruction model comprises: applying smoothness constraints to the motion of the three-dimensional points in adjacent time frames, wherein the smoothness constraints comprise a motion trajectory smoothness constraint, a velocity smoothness constraint, a rotation and scale smoothness constraint, an appearance attribute temporal consistency, and a total temporal consistency loss; the motion trajectory smoothness constraint is the second-order difference of the positions of the three-dimensional points between adjacent frames, the velocity smoothness constraint is the second-order difference of the velocities of the three-dimensional points between adjacent frames, the rotation and scale smoothness constraint enforces smooth rotation change and continuous scaling of the three-dimensional ellipsoids via Lie-algebra computation, the appearance attribute temporal consistency enforces consistent temporal evolution of the spherical harmonic coefficients and the opacity, and the total temporal consistency loss is a weighted summation of the motion trajectory smoothness constraint, the velocity smoothness constraint, the rotation and scale smoothness constraint, and the appearance attribute temporal consistency; and applying geometric constraints to the three-dimensional points based on multi-view vision principles, wherein the geometric constraints comprise a reprojection photometric error, a depth reprojection error, a surface normal consistency constraint, a dynamic scene flow consistency constraint, and a neighborhood structure preservation constraint.
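The trajectory and velocity smoothness terms of claim 5 — second-order differences across adjacent frames, combined by weighted summation — can be sketched as follows; the weights are illustrative, and the rotation/scale and appearance terms are omitted.

```python
import numpy as np

def temporal_consistency_loss(positions, weights=(1.0, 0.5)):
    """Trajectory + velocity smoothness for point positions of shape
    (T, N, 3): penalize the second-order difference of positions
    (acceleration) and of velocities, then combine by weighted sum."""
    accel = positions[2:] - 2 * positions[1:-1] + positions[:-2]
    vel = positions[1:] - positions[:-1]
    jerk = vel[2:] - 2 * vel[1:-1] + vel[:-2]
    w_pos, w_vel = weights
    return w_pos * np.mean(accel ** 2) + w_vel * np.mean(jerk ** 2)
```

A constant-velocity trajectory incurs zero loss under both terms, which is exactly the behavior a smoothness prior should have.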
- 6. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 1, wherein taking the compact scene features as the observation representation of the agent at the current moment in the dynamic environment simulator comprises the following steps: standardizing the compact scene features; and organizing the standardized features into a normalized observation vector.
- 7. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 6, wherein standardizing the compact scene features comprises: normalizing each continuous dimension of the compact scene features to obtain standardized features; performing one-hot encoding on the existing categorical features to obtain encoded categorical features; and concatenating the standardized features with the encoded categorical features to obtain the observation representation of the agent.
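The standardization-and-encoding pipeline of claims 6 and 7 can be sketched as below; the split into continuous and categorical features and all shapes are illustrative assumptions.

```python
import numpy as np

def build_observation(cont_feats, cat_ids, n_classes):
    """Z-score normalize the continuous dimensions, one-hot encode the
    categorical features, and concatenate both into the agent's
    observation vector. Feature layout is hypothetical."""
    mu = cont_feats.mean(axis=0)
    sigma = cont_feats.std(axis=0) + 1e-8        # guard against zero variance
    normed = (cont_feats - mu) / sigma
    onehot = np.eye(n_classes)[cat_ids]          # one-hot encoding
    return np.concatenate([normed, onehot], axis=1)
```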
- 8. The reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction according to claim 1, wherein obtaining the behavior policy executed by the agent in the dynamic environment using a reinforcement learning policy network architecture according to the observation representation of the agent comprises: the input of the policy network is the observation representation of the agent, and for a discrete action space the output layer of the policy network adopts a Softmax activation function to generate the probability distribution over the agent's actions; the input of the value network is the observation representation of the agent, and the output of the value network is a scalar state value function representing the expected cumulative discounted return of the state under the current policy; and executing the dynamic interaction loop with the policy network and the value network, and optimizing both networks with the proximal policy optimization (PPO) algorithm.
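The Softmax action distribution and the PPO update mentioned in claim 8 rest on the clipped surrogate objective. A minimal NumPy sketch follows, with the standard PPO clip range as an assumed hyperparameter; the full actor-critic training loop is elided.

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis, numerically stabilized."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective of proximal policy optimization:
    take the minimum of the unclipped and clipped ratio terms so that
    large policy updates are discouraged. `eps` is the usual clip range."""
    ratio = np.exp(new_logp - old_logp)          # probability ratio
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new and old policies coincide, the ratio is 1 everywhere and the objective reduces to the mean advantage, which is the expected sanity check.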
- 9. A reinforcement learning training system for space-time interaction dynamic three-dimensional reconstruction, comprising: a data acquisition module for collecting time-aligned video sequences of a dynamic environment, wherein the video sequences of the dynamic environment comprise actions of an agent; a three-dimensional reconstruction module for constructing a space-time interaction dynamic three-dimensional reconstruction model from the video sequences of the dynamic environment, endowing each three-dimensional point with attribute parameters that change dynamically over time, and predicting the state of each three-dimensional point at the next moment from its state at the current moment and the action of the agent; a space-time optimization module for performing space-time consistency optimization on the space-time interaction dynamic three-dimensional reconstruction model; a dynamic simulator module for packaging and deploying the space-time-consistency-optimized model as a dynamic environment simulator; an agent observation module for taking the compact scene features as the observation representation of the agent at the current moment in the dynamic environment simulator; a reinforcement learning module for obtaining the behavior policy executed by the agent in the dynamic environment using a reinforcement learning policy network architecture according to the observation representation of the agent; an action execution module for converting the behavior policy into executable actions and having the agent carry them out in the real environment; and a dynamic interaction module for, after the agent acts in the real environment and causes the environment state to change, returning to the agent observation module; after the dynamic environment simulator receives the agent's action in the real environment, a new space-time interaction dynamic three-dimensional reconstruction model is generated for the updated environment by the space-time-consistency-optimized model, and the observation representation of the agent at the next moment is obtained; a dynamically evolving three-dimensional scene representation is output from the dynamic environment simulator, and hierarchical features with explicit geometric and semantic meaning are extracted from it; the dimension of the hierarchical features is reduced by combining the PCA method with an encoder and a decoder, and the mean output by the encoder is taken as the compact scene feature, which comprises the following steps: projecting the hierarchical features into a task-related low-dimensional subspace and computing their covariance matrix; performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues and the corresponding eigenvectors; computing the variance contribution rate from the eigenvalues; selecting the minimum number of principal components such that the cumulative variance contribution rate is greater than or equal to a variance threshold; constructing a projection matrix from the eigenvectors and the selected principal components to obtain the dimension-reduced features; mapping the dimension-reduced features to posterior distribution parameters of a latent space with an encoder; sampling latent vectors from the posterior distribution parameters by the reparameterization trick; reconstructing the dimension-reduced features input to the encoder from the latent vectors with a decoder; and taking the mean output by the encoder as the compact scene feature.
- 10. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the reinforcement learning training method of space-time interaction dynamic three-dimensional reconstruction as claimed in any one of claims 1 to 8.
Description
Reinforcement learning training method and system for space-time interaction dynamic three-dimensional reconstruction

Technical Field
The invention belongs to the technical field of three-dimensional reconstruction, and in particular relates to a reinforcement learning training method and system for space-time interaction dynamic three-dimensional reconstruction.

Background
Reinforcement learning is one of the core fields of artificial intelligence. It learns an optimal decision policy through continuous interaction between an agent and an environment, and has achieved remarkable success in complex sequential decision tasks such as robot control, game AI, and autonomous driving. However, in vision-driven reinforcement learning, the quality of the environment representation is critical, and the training process of conventional reinforcement learning methods often depends heavily on interaction with the real physical environment or on a high-fidelity simulation environment. Training in a real environment is not only costly and inefficient, but random behavior during exploration can also introduce safety risks; meanwhile, constructing a simulation environment that accurately reflects physical laws and contains rich scene variation faces complex modeling, high computational cost, and a hard-to-avoid "reality gap", all of which directly affect the efficiency and final performance of policy learning. To overcome these limitations, introducing three-dimensional reconstruction techniques into the reinforcement learning training process has become an emerging research direction: the core idea is to rapidly build an interactable three-dimensional scene model from images or sensor data collected in the real world, providing a realistic and reusable training ground for reinforcement learning agents.
The paper Learning to Navigate in Complex (arXiv, 2016) constructs a static simulation environment from three-dimensional reconstruction results: it recovers a dense three-dimensional point cloud and mesh model of a scene from a real city-street image sequence via structure from motion (SfM) and multi-view stereo (MVS), and on this basis generates a texture-mapped mesh scene. However, the environment reconstructed by this method is static and frozen in nature: the reconstruction result is a one-time, immutable snapshot of the scene at data-acquisition time. During training, environmental elements cannot be dynamically adjusted or edited, nor can objects in the scene be moved or replaced to create a new layout. This fixed environmental character severely limits the diversity of training scenes and the generalization of agents: an agent trained in such a particular environment easily overfits to the scene's fixed visual appearance and spatial structure, and when the policy is transferred to the real world or to a slightly changed scene it struggles to adapt to variations in lighting, occlusion, and layout, exhibiting dramatic performance degradation. The paper Reinforcement Learning with Generalizable Gaussian Splatting (arXiv, 2024) proposes an innovative framework named Gaussian Splatting Reinforcement Learning (GSRL), the first attempt to use a generalizable three-dimensional model as the core environment representation for reinforcement learning. The method uses a pretrained generalizable three-dimensional model to rapidly infer an explicit three-dimensional representation of the scene from a small number of images captured by a front-facing camera.
However, the GSRL method can only build and rely on a static, transient three-dimensional snapshot of the scene and cannot model how scene elements change over time. The static environment snapshots provided by GSRL therefore lack the spatio-temporal continuity information needed for an agent to learn decision policies based on motion cues. Moreover, since the environment is static, obtaining training data for dynamic tasks requires a static-snapshot dataset covering all possible object motion trajectories. This static-snapshot data paradigm is inefficient and limited, forcing the model to memorize a series of discrete environmental states rather than learn continuous laws of motion, leading to significant shortfalls in the agent's generalization when facing new dynamic scenarios.

Disclosure of Invention
Aiming at the defects of the prior art, the application provides a reinforcement learning training method and system for space-time interaction dynamic three-dimensional reconstruction, and the problem of characterization failure of static environment representation in dynamic tasks is fundamentally solved by providing a