
CN-121985143-A - Dynamic point cloud compression method and system based on implicit modeling and detail reconstruction

CN 121985143 A

Abstract

The invention discloses a dynamic point cloud compression method and system based on implicit modeling and detail reconstruction, belonging to the technical fields of computer vision and three-dimensional graphics. The method comprises: acquiring a dynamic point cloud sequence; performing multiscale downsampling; inputting the lowest-resolution coordinates and a historical reference frame into an FMT module, which outputs context features; encoding the fused features with a context encoder; processing the code stream with a differentiated compression strategy; decoding and recovering the stream and building a high-level latent representation; inputting the recovered coordinates and the historical reference frame into the FMT module to reconstruct the context; feeding a context decoder to generate temporally aligned features; inputting a CTR module to obtain refined features and generate a latent point cloud; and generating the full-resolution reconstructed point cloud through multiscale upsampling. The method addresses the inaccurate motion modeling, high bit-rate overhead, and heavy loss of high-frequency geometric detail during reconstruction that arise because existing dynamic point cloud compression depends on explicit motion vectors.

Inventors

  • Man Hangyu
  • Deng Xuan
  • Fan Xiaopeng
  • Zhao Debin

Assignees

  • Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. A dynamic point cloud compression method based on implicit modeling and detail reconstruction, characterized by comprising the following steps:
     S1, acquiring a dynamic point cloud sequence, where each frame at time t consists of three-dimensional coordinates and voxel occupancy features, N is the number of points, and d is the feature dimension;
     S2, downsampling the current frame point cloud at multiple scales to obtain latent representations at k scales, each comprising the coordinates and the features of the kth scale, with k not less than 3;
     S3, inputting the lowest-resolution coordinates of the current frame, together with the reconstructed historical reference frame stored in a reference buffer, into the FMT module, which implicitly aligns the features of the historical reference frame to the current frame and outputs motion-aware context features;
     S4, concatenating the lowest-resolution features of the current frame with the context features and encoding them with a context encoder to generate a higher-level latent representation;
     S5, processing the code stream with a differentiated compression strategy and outputting the compressed code stream;
     S6, receiving the compressed code stream, decoding and recovering the transmitted coordinates and features, and building the high-level latent representation;
     S7, inputting the recovered coordinates and the reconstructed historical reference frame stored in the reference buffer into the FMT module to reconstruct the context representation;
     S8, fusing the high-level latent features with the context representation in a context decoder to generate temporally aligned features;
     S9, inputting the temporally aligned features into the CTR module, which adaptively aggregates local information from the historical reference frame to obtain refined features, and combining these with the coordinates to generate a latent point cloud;
     S10, generating a full-resolution reconstructed point cloud via multiscale upsampling.
  2. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 1, wherein in S3 and S7 the FMT module is implemented as follows:
     S301, constructing, in the lowest-resolution coordinate space of the current frame, a fixed adjacency matrix between the current frame and the historical reference frame;
     S302, extracting, based on the adjacency relation, the relative position information between neighboring points and the corresponding reference-frame features, and aggregating them;
     S303, feeding the aggregated features into a Softmax function to generate a soft mask with values in [0, 1];
     S304, applying the soft mask as a weighted modulation of the neighborhood features in the reference frame to obtain weighted context features, concatenating these with the anchor coordinates of the current frame along the channel dimension, remapping the modulated features through a lightweight multi-layer perceptron, and finally outputting the motion-aligned feature representation of the current frame.
  3. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 2, wherein in S301 the adjacency matrix comprises a KNN adjacency matrix, a ball-query adjacency matrix, a voxel-neighbor matrix, or a learned graph structure matrix.
  4. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 1, wherein in S9 the CTR module is implemented as follows:
     S901, the query features come from the aligned point cloud at time t, while the key features and value features come from the historical reference point cloud;
     S902, the query, key, and value features are obtained by applying a query transform function, a key transform function, and a value transform function, respectively, to the ith point of the aligned point cloud and the jth point of the reference point cloud, where i indexes points in the current point cloud and j indexes points in the reference point cloud;
     S903, for each query point i, first determining a local neighborhood within the reference point cloud and then computing the attention output, in which a learnable positional encoding function describes the geometric relationship between the query point and the reference points using the coordinates of the ith refined point and the jth reference point at the kth refinement stage, a multi-layer perceptron outputs a vector attention weight for each feature channel, Softmax normalization is applied over the neighborhood points, and channel-level multiplication realizes channel-by-channel modulation of the value features;
     S904, fusing the aggregated features with the original features through a residual connection and layer normalization to obtain the refined feature representation.
  5. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 4, wherein in S902 the transform functions are implemented by linear projection or a lightweight multi-layer perceptron.
  6. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 4, wherein in S903 the attention comprises cross attention, scalar attention, linear attention, or lightweight attention.
  7. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 1, wherein in S5 the compression strategy comprises: the features are compressed through a conditional entropy model; part of the coordinates are compressed with an octree-based G-PCC encoder; and the remaining coordinates are compressed with an end-to-end learnable lossless codec.
  8. The dynamic point cloud compression method based on implicit modeling and detail reconstruction of claim 1, wherein in S6 the compressed features are decoded through the conditional entropy model, the coordinates compressed by G-PCC are decoded by a G-PCC decoder, and the remaining compressed coordinates are decoded by the lossless decoder.
  9. A dynamic point cloud compression system based on implicit modeling and detail reconstruction, adopting the dynamic point cloud compression method based on implicit modeling and detail reconstruction of any one of claims 1 to 8, characterized by comprising: an FMT module for receiving the lowest-resolution coordinates and the historical reference frame, implicitly aligning the features of the historical reference frame, and outputting motion-aware context features; a CTR module for adaptively aggregating local information from the historical reference frame and outputting refined features; a downsampling module for downsampling the current frame point cloud at multiple scales to generate multiscale latent representations; an upsampling module for upsampling the latent point cloud at multiple scales to generate a full-resolution reconstructed point cloud; a context encoder for concatenating the lowest-resolution features of the current frame with the motion-aware context features and encoding them to generate a higher-level latent representation; a context decoder for fusing the high-level latent features with the context representation to generate temporally aligned features; and a reference buffer for storing the reconstructed point cloud and providing historical reference frames for subsequent frames.
  10. The dynamic point cloud compression system based on implicit modeling and detail reconstruction of claim 9, wherein the upsampling module and the downsampling module are constructed from sparse convolutions.
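The FMT alignment steps S301-S304 of claims 2-3 can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the point counts, feature dimension, neighborhood size K, the two-layer MLP, and the random inputs are all illustrative assumptions; only the order of operations (KNN adjacency, aggregation, Softmax soft mask, weighted modulation, concatenation with anchor coordinates, MLP remap) follows the claim.

```python
# Hypothetical sketch of the FMT module (claims 2-3). All sizes, the toy
# scoring function, the 2-layer MLP, and the random data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_cur, N_ref, d, K = 64, 80, 8, 4           # points, feature dim, neighbors (assumed)

cur_xyz = rng.normal(size=(N_cur, 3))       # lowest-resolution coords of current frame
ref_xyz = rng.normal(size=(N_ref, 3))       # reconstructed historical reference frame
ref_feat = rng.normal(size=(N_ref, d))      # reference-frame features

# S301: fixed KNN adjacency from each current-frame point to the reference frame
dist = np.linalg.norm(cur_xyz[:, None, :] - ref_xyz[None, :, :], axis=-1)
knn = np.argsort(dist, axis=1)[:, :K]                        # (N_cur, K)

# S302: relative positions and corresponding reference features, aggregated
rel_pos = ref_xyz[knn] - cur_xyz[:, None, :]                 # (N_cur, K, 3)
agg = np.concatenate([rel_pos, ref_feat[knn]], axis=-1)      # (N_cur, K, 3 + d)

# S303: Softmax over the neighborhood yields a soft mask with values in [0, 1]
logits = agg.sum(axis=-1)                                    # (N_cur, K) toy score
mask = np.exp(logits - logits.max(axis=1, keepdims=True))
mask /= mask.sum(axis=1, keepdims=True)

# S304: soft-mask-weighted modulation of the neighborhood features, channel-wise
# concatenation with the anchor coordinates, and a lightweight MLP remap
ctx = (mask[..., None] * ref_feat[knn]).sum(axis=1)          # (N_cur, d)
h = np.concatenate([ctx, cur_xyz], axis=-1)                  # (N_cur, d + 3)
W1, W2 = rng.normal(size=(d + 3, 16)), rng.normal(size=(16, d))
aligned = np.maximum(h @ W1, 0.0) @ W2                       # motion-aligned features

print(aligned.shape)   # (64, 8)
```

A real implementation would learn the mask-scoring function and the MLP end to end and run on sparse voxel tensors; the sketch only shows how the soft mask replaces an explicit motion vector.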
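The CTR refinement of claim 4 (S901-S904) resembles vector attention over a local neighborhood of the reference cloud. The following NumPy sketch stands in plain linear maps for the query/key/value transforms and a fixed linear map for the learnable positional encoding; all sizes and data are assumptions, not the patented design.

```python
# Hypothetical sketch of the CTR module (claim 4): per-channel (vector)
# attention over K reference-cloud neighbors, then residual + layer norm.
import numpy as np

rng = np.random.default_rng(1)
N, M, d, K = 32, 48, 8, 4                      # current/reference points, channels, neighbors

cur_xyz, ref_xyz = rng.normal(size=(N, 3)), rng.normal(size=(M, 3))
cur_feat, ref_feat = rng.normal(size=(N, d)), rng.normal(size=(M, d))

# S902: query/key/value transform functions (linear projections here)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = cur_feat @ Wq, ref_feat @ Wk, ref_feat @ Wv

# S903: local neighborhood of each query point within the reference cloud
nbr = np.argsort(np.linalg.norm(cur_xyz[:, None] - ref_xyz[None], axis=-1), 1)[:, :K]

# positional encoding of the query/reference geometric relationship
# (a learnable function in the patent; a fixed linear map here)
Wp = rng.normal(size=(3, d))
delta = (cur_xyz[:, None, :] - ref_xyz[nbr]) @ Wp            # (N, K, d)

# vector attention: per-channel weights, Softmax over the K neighbors,
# channel-by-channel modulation of the value features
score = q[:, None, :] - k[nbr] + delta                       # (N, K, d)
w = np.exp(score - score.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
out = (w * (v[nbr] + delta)).sum(axis=1)                     # (N, d)

# S904: residual connection and layer normalization
y = cur_feat + out
y = (y - y.mean(-1, keepdims=True)) / (y.std(-1, keepdims=True) + 1e-6)
print(y.shape)   # (32, 8)
```

The per-channel Softmax is what the claim calls "a vector attention weight for each feature channel"; a scalar-attention variant (claim 6) would collapse `score` to one weight per neighbor before the Softmax.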

Description

Dynamic point cloud compression method and system based on implicit modeling and detail reconstruction

Technical Field

The invention relates to the technical fields of computer vision and three-dimensional graphics, and in particular to a dynamic point cloud compression method and system based on implicit modeling and detail reconstruction.

Background

A dynamic point cloud (Dynamic Point Cloud, DPC) is time-series data composed of a sequence of point cloud frames sparsely and irregularly distributed in three-dimensional space. As an emerging form of three-dimensional content representation, it has broad application prospects in immersive media, motion capture, autonomous driving, AR/VR, and other fields. However, its huge data volume poses a serious challenge for storage and real-time transmission, and efficient compression technology is a key prerequisite for practical deployment.

Current dynamic point cloud compression methods mainly eliminate temporal redundancy through inter-frame prediction: the current frame is predicted from reference-frame information, and only the prediction residual is encoded. Mainstream techniques fall into two categories, standardized schemes and deep-learning-driven schemes, but both have notable limitations:

1. Standardized methods rely on explicit motion vector coding. G-PCC and V-PCC, established by MPEG, are the most widely adopted compression standards. G-PCC performs predictive coding of geometry based on an octree structure, while V-PCC compresses via 2D projection combined with a video encoder (e.g., HEVC). Both require explicit estimation and coding of motion vectors (MVs) to indicate the point correspondence between the reference frame and the current frame.

2.
Neural compression methods follow the "motion + residual" paradigm. In recent years, deep-learning-based Dynamic Point Cloud Compression (DPCC) methods (e.g., D-DPCC, AdaDPCC) have introduced variational autoencoders (VAEs) to compress spatial redundancy, but they still rely heavily on explicit motion modeling in the time dimension. Typically, motion vectors are estimated in the latent feature space using KNN or 3DAWI (3D Adaptively Weighted Interpolation) algorithms, and the prediction residuals are entropy coded.

Conventional dynamic point cloud compression thus has fundamental bottlenecks in both its motion modeling and its detail-fidelity capability: on one hand, explicit motion vectors struggle to adapt to the irregular dynamics of point clouds; on the other hand, even when implicit modeling is adopted, an effective cross-frame detail recovery mechanism for reconstructing high-frequency geometric structure is lacking. A new compression method is therefore needed that adaptively fuses cross-frame local information at the decoding end to recover fine details without explicit motion vectors, achieving both high efficiency and high quality.

Disclosure of Invention

The invention aims to provide a dynamic point cloud compression method and system based on implicit modeling and detail reconstruction, solving the problems of inaccurate motion modeling, high bit-rate overhead, and heavy loss of high-frequency geometric detail during reconstruction caused by the dependence of existing dynamic point cloud compression on explicit motion vectors.
In order to achieve the above purpose, the invention provides a dynamic point cloud compression method based on implicit modeling and detail reconstruction, comprising the following steps: S1, acquiring a dynamic point cloud sequence, where each frame at time t consists of three-dimensional coordinates and voxel occupancy features, N is the number of points, and d is the feature dimension; S2, downsampling the current frame point cloud at multiple scales to obtain latent representations at k scales, each comprising the coordinates and the features of the kth scale, with k not less than 3; S3, inputting the lowest-resolution coordinates of the current frame, together with the reconstructed historical reference frame stored in a reference buffer, into the FMT module, which implicitly aligns the features of the historical reference frame to the current frame and outputs motion-aware context features; S4, concatenating the lowest-resolution features of the current frame with the context features and encoding them with a context encoder to generate a higher-level latent representation; S5, processing the code stream with a differentiated compression strategy and outputting the compressed code stream; S6, receiving the compressed code stream, decoding and recovering the transmitted coordinates and features, and building a high-level latent