CN-121999476-A - Point cloud data processing method and device, electronic equipment and storage medium

Abstract

An embodiment of the application discloses a point cloud data processing method and device, electronic equipment and a storage medium. The method acquires initial features of point cloud data; uses an i-th layer coding block to perform feature extraction on the output features of the (i-1)-th layer coding block, obtaining the output features of the i-th layer coding block; uses a j-th layer decoding block to perform feature fusion on the output features of the (K-j+1)-th layer coding block and the output features of the (j-1)-th layer decoding block, obtaining the output features of the j-th layer decoding block; and performs point cloud recognition based on the output features of the last layer decoding block to obtain a recognition result for the point cloud data. Because a geometric dependence exists between the local shape and the overall shape of point cloud data, the application captures this dependence through layer-by-layer refinement and fusion, and fuses features of different scales during decoding so that both local and global features are propagated effectively. The accuracy of the feature representation of the point cloud data can thereby be improved.
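As a non-limiting illustration, the layer wiring described above (coding block i consumes coding block i-1; decoding block j fuses coding block K-j+1 with decoding block j-1) can be sketched as follows. The function name `run_encoder_decoder`, the `Feature` type and the caller-supplied `fuse` operation are hypothetical and chosen only for this sketch; the blocks themselves are passed in as plain callables because their internals are not specified here. The rule that the 1st layer decoding block processes only the output of the K-th layer coding block is taken from claim 1.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # stand-in for a real feature tensor (assumption)


def run_encoder_decoder(
    initial: Feature,
    encoders: Sequence[Callable[[Feature], Feature]],
    decoders: Sequence[Callable[[Feature], Feature]],
    fuse: Callable[[Feature, Feature], Feature],
) -> Feature:
    """Wire K coding blocks and K decoding blocks with mirrored skip links."""
    K = len(encoders)
    if K == 0 or len(decoders) != K:
        raise ValueError("sketch assumes K >= 1 coding and K decoding blocks")

    # Coding block i consumes the output of coding block i-1
    # (block 1 consumes the initial features).
    enc_out: List[Feature] = []
    x = initial
    for block in encoders:
        x = block(x)
        enc_out.append(x)  # enc_out[i - 1] is the output of coding block i

    # Decoding block 1 only processes the output of coding block K.
    y = decoders[0](enc_out[K - 1])

    # Decoding block j (j > 1) fuses coding block K-j+1 with decoding block j-1,
    # then extracts features from the fused result.
    for j in range(2, K + 1):
        skip = enc_out[(K - j + 1) - 1]
        y = decoders[j - 1](fuse(skip, y))
    return y
```

With K = 3, for example, decoding block 2 is paired with coding block 2 and decoding block 3 with coding block 1, so the shallowest coding features are fused last.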

Inventors

HE QINGDONG
PENG JINLONG
WANG YABIAO
WANG CHENGJIE

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2024-11-07

Claims (15)

1. A point cloud data processing method, characterized by comprising the following steps: acquiring point cloud data at a plurality of mask scales from initial point cloud data; acquiring embedded representations of a plurality of sub-point cloud data, wherein each sub-point cloud data of the plurality of sub-point cloud data comprises partial data of the point cloud data of one mask scale of the plurality of mask scales; performing, with an i-th layer coding block, feature extraction processing on output features of an (i-1)-th layer coding block to obtain output features of the i-th layer coding block, wherein the output features of the i-th layer coding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer coding block are obtained by the 1st layer coding block performing feature extraction processing on the embedded representations of the plurality of sub-point cloud data, and i is a positive integer greater than 1 and not greater than K; and performing, with a j-th layer decoding block, feature fusion processing on the output features of the (K-j+1)-th layer coding block and output features of a (j-1)-th layer decoding block to obtain fusion features, and performing feature extraction processing on the fusion features to obtain output features of the j-th layer decoding block, wherein the output features of the j-th layer decoding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer decoding block are obtained by the 1st layer decoding block performing feature extraction processing on the output features of the K-th layer coding block, j is a positive integer greater than 1 and not greater than K, and the output features of the last layer decoding block are the feature representation of the initial point cloud data.
2. The point cloud data processing method according to claim 1, wherein the coding block comprises a global feature branch and a local feature branch, and the performing, with the i-th layer coding block, feature extraction processing on the output features of the (i-1)-th layer coding block to obtain the output features of the i-th layer coding block comprises: performing, with the global feature branch of the i-th layer coding block, global feature extraction processing on the output features of the (i-1)-th layer coding block to obtain output features of the global feature branch of the i-th layer coding block, wherein the global feature extraction processing processes the features of the plurality of sub-point cloud data in the output features of the (i-1)-th layer coding block as a whole; performing, with the local feature branch of the i-th layer coding block, local feature extraction processing on the output features of the (i-1)-th layer coding block to obtain output features of the local feature branch of the i-th layer coding block, wherein the local feature extraction processing processes the feature of each sub-point cloud data in the output features of the (i-1)-th layer coding block independently; and fusing the output features of the global feature branch of the i-th layer coding block with the output features of the local feature branch of the i-th layer coding block to obtain the output features of the i-th layer coding block.
3. The point cloud data processing method according to claim 2, wherein the global feature branch comprises a spatial mixing module and a channel mixing module, and the performing, with the global feature branch of the i-th layer coding block, global feature extraction processing on the output features of the (i-1)-th layer coding block to obtain the output features of the global feature branch of the i-th layer coding block comprises: performing, with the spatial mixing module of the i-th layer coding block, global attention calculation on the output features of the (i-1)-th layer coding block to obtain a global attention feature of the i-th layer coding block; mixing the global attention feature of the i-th layer coding block with the output features of the (i-1)-th layer coding block to obtain an intermediate feature of the i-th layer coding block; performing, with the channel mixing module of the i-th layer coding block, channel mixing on the intermediate feature of the i-th layer coding block to obtain a channel integration feature of the i-th layer coding block; and acquiring the output of the global feature branch of the i-th layer coding block, wherein the output of the global feature branch of the i-th layer coding block comprises the intermediate feature of the i-th layer coding block and the channel integration feature of the i-th layer coding block.
4. The point cloud data processing method according to claim 3, wherein the performing, with the spatial mixing module of the i-th layer coding block, global attention calculation on the output features of the (i-1)-th layer coding block to obtain the global attention feature of the i-th layer coding block comprises: performing feature enhancement processing on the output features of the (i-1)-th layer coding block to obtain a position relation feature; determining multi-head attention of the position relation feature, wherein the multi-head attention comprises a gating vector, a position vector and a key value vector; generating time-varying attenuation data of the position relation feature; performing, using the time-varying attenuation data, global attention calculation based on the key value vector of the position relation feature to obtain a global attention result; and obtaining the global attention feature of the i-th layer coding block according to the global attention result and the gating vector and the position vector of the position relation feature.
5. The point cloud data processing method according to claim 4, wherein the generating time-varying attenuation data of the position relation feature comprises: generating a position relation parameter according to the output features of the (i-1)-th layer coding block; and calculating the time-varying attenuation data according to the position relation feature and the position relation parameter.
6. The point cloud data processing method according to claim 3, wherein the performing, with the channel mixing module of the i-th layer coding block, channel mixing on the intermediate feature of the i-th layer coding block to obtain the channel integration feature of the i-th layer coding block comprises: performing bidirectional secondary expansion on the intermediate feature of the i-th layer coding block to obtain a channel relation feature; determining multi-head attention of the channel relation feature, wherein the multi-head attention comprises a gating vector, a position vector and a key value vector; and obtaining the channel integration feature of the i-th layer coding block according to the multi-head attention of the channel relation feature.
7. The point cloud data processing method according to claim 6, wherein the channel relation feature is represented as a matrix having a plurality of channels, and the obtaining the channel integration feature of the i-th layer coding block according to the multi-head attention of the channel relation feature comprises: for each channel of the channel relation feature, weighting the channel through an activation function to obtain a plurality of weighted channels; and fusing the plurality of weighted channels to obtain the channel integration feature of the i-th layer coding block.
8. The point cloud data processing method according to claim 2, wherein the performing local feature extraction processing on the output features of the (i-1)-th layer coding block to obtain the output features of the local feature branch of the i-th layer coding block comprises: for the feature of each sub-point cloud data in the output features of the (i-1)-th layer coding block, constructing an initial point cloud graph of a target point in the sub-point cloud data, wherein the initial point cloud graph comprises vertices and edges, the vertices represent the target point and neighbor points of the target point in the sub-point cloud data, and the target point is any point in the sub-point cloud data; and updating the feature of the target point in each sub-point cloud data based on the initial point cloud graph to obtain the output features of the local feature branch of the i-th layer coding block.
9. The point cloud data processing method according to claim 8, wherein the updating the feature of the target point in each sub-point cloud data based on the initial point cloud graph to obtain the output features of the local feature branch of the i-th layer coding block comprises: generating edge features of the edges between the vertex of the target point and the vertices of the neighbor points according to the relative positions between the vertex of the target point and the vertices of the neighbor points in the initial point cloud graph; updating the vertex feature of the target point using all the edge features of the vertex of the target point; and obtaining the output features of the local feature branch of the i-th layer coding block, wherein the output features of the local feature branch of the i-th layer coding block comprise the vertex feature of the target point.
10. The point cloud data processing method according to claim 9, wherein the generating edge features of the edges between the vertex of the target point and the vertices of the neighbor points according to the relative positions between the vertex of the target point and the vertices of the neighbor points in the initial point cloud graph comprises: determining a coordinate offset of the vertex of the target point; adjusting the vertex of the target point according to the coordinate offset to obtain an aligned vertex of the target point; and generating the edge features of the edges between the vertex of the target point and the vertices of the neighbor points according to the relative positions between the aligned vertex of the target point and the vertices of the neighbor points.
11. The point cloud data processing method according to claim 1, wherein the acquiring point cloud data at a plurality of mask scales from the initial point cloud data comprises: acquiring the initial point cloud data; and performing multi-scale masking processing on the initial point cloud data to obtain point cloud data at a plurality of different mask scales.
12. The point cloud data processing method according to claim 11, wherein the point cloud data at the plurality of different mask scales comprises a mask point cloud of each downsampling iteration, and the performing multi-scale masking processing on the initial point cloud data to obtain the point cloud data at the plurality of different mask scales comprises: performing a plurality of downsampling iterations on the initial point cloud data to obtain a sampled point cloud of each downsampling iteration; masking the sampled point cloud of the last downsampling iteration to obtain the mask point cloud of the last downsampling iteration; and performing back-projection processing on the sampled point cloud of each downsampling iteration based on the mask point cloud of the last downsampling iteration to obtain the mask point cloud of each downsampling iteration.
13. A point cloud data processing apparatus, comprising: an acquisition unit, configured to acquire point cloud data at a plurality of mask scales from initial point cloud data; an embedding unit, configured to acquire embedded representations of a plurality of sub-point cloud data, wherein each sub-point cloud data of the plurality of sub-point cloud data comprises partial data of the point cloud data of one mask scale of the plurality of mask scales; a coding unit, configured to perform, with an i-th layer coding block, feature extraction processing on output features of an (i-1)-th layer coding block to obtain output features of the i-th layer coding block, wherein the output features of the i-th layer coding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer coding block are obtained by the 1st layer coding block performing feature extraction processing on the embedded representations of the plurality of sub-point cloud data, and i is a positive integer greater than 1 and not greater than K; and a decoding unit, configured to perform, with a j-th layer decoding block, feature fusion processing on the output features of the (K-j+1)-th layer coding block and output features of a (j-1)-th layer decoding block to obtain fusion features, and to perform feature extraction processing on the fusion features to obtain output features of the j-th layer decoding block, wherein the output features of the j-th layer decoding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer decoding block are obtained by the 1st layer decoding block performing feature extraction processing on the output features of the K-th layer coding block, j is a positive integer greater than 1 and not greater than K, and the output features of the last layer decoding block are the feature representation of the initial point cloud data.
14. An electronic device, comprising a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions from the memory to perform the steps in the point cloud data processing method according to any one of claims 1 to 12.
15. A computer readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor for executing the steps of the point cloud data processing method according to any of claims 1 to 12.
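As a non-limiting illustration of the local feature branch of claims 8 and 9, the following sketch builds, for every target point of one sub-point cloud, a small graph over its nearest neighbor points, forms edge features from relative positions, and updates the vertex feature from all of its edge features. Several details are assumptions made only for this sketch and are not stated in the claims: neighbors are chosen as the k nearest points by Euclidean distance, the edge feature is the raw coordinate difference, the edge features are aggregated by their mean, and the updated vertex feature is the old feature concatenated with that mean. The coordinate-offset alignment of claim 10 is omitted. The names `knn_indices` and `update_local_features` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: Sequence[Point], target: int, k: int) -> List[int]:
    """Indices of the k points closest to points[target], excluding itself."""
    order = sorted(
        (i for i in range(len(points)) if i != target),
        key=lambda i: math.dist(points[i], points[target]),
    )
    return order[:k]


def update_local_features(
    points: Sequence[Point],
    features: Sequence[Sequence[float]],
    k: int,
) -> List[List[float]]:
    """One graph-style feature update for every point of a sub-point cloud."""
    updated: List[List[float]] = []
    for t, (p_t, f_t) in enumerate(zip(points, features)):
        neighbours = knn_indices(points, t, k)
        # Edge feature: position of each neighbour relative to the target point.
        edges = [[a - b for a, b in zip(points[n], p_t)] for n in neighbours]
        dim = len(p_t)
        if edges:
            # Aggregate all edge features of the target vertex (mean, by assumption).
            agg = [sum(e[d] for e in edges) / len(edges) for d in range(dim)]
        else:
            agg = [0.0] * dim  # isolated point: no neighbours available
        # Updated vertex feature: old feature followed by the aggregated edges.
        updated.append(list(f_t) + agg)
    return updated
```

Each sub-point cloud is processed on its own, which matches the independent, per-sub-point-cloud character of the local feature extraction in claim 2.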

Description

Point cloud data processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computers, and in particular to a point cloud data processing method and device, electronic equipment and a storage medium.

Background

A point cloud is a data form for describing objects or scenes in three-dimensional space. It consists of a large number of points distributed on the surface of an object or in space, each point containing position information. Point cloud data is typically generated by a lidar or a 3D scanning device. Point cloud recognition refers to performing tasks such as classification, segmentation and detection on a point cloud according to its features.

Unlike the regular pixel arrangement of images, point cloud data has no fixed structure, so learning features from a point cloud is complex. Traditional feature extraction methods have difficulty capturing details when processing point cloud data and can only analyze the data from a single angle, so they are not flexible enough and perform poorly, especially on complex three-dimensional structures. As a result, the feature representation of point cloud data obtained with current point cloud data processing methods has low accuracy.

Disclosure of Invention

Embodiments of the application provide a point cloud data processing method and device, electronic equipment and a storage medium. The extracted point cloud features capture local details and also characterize the overall structure, so that the method performs well in classification, segmentation and detection in 3D point cloud recognition tasks, and the accuracy of the feature representation of the point cloud data can therefore be improved.
An embodiment of the application provides a point cloud data processing method, comprising: acquiring point cloud data at a plurality of mask scales from initial point cloud data; acquiring embedded representations of a plurality of sub-point cloud data, wherein each sub-point cloud data of the plurality of sub-point cloud data comprises partial data of the point cloud data of one mask scale of the plurality of mask scales; performing, with an i-th layer coding block, feature extraction processing on output features of an (i-1)-th layer coding block to obtain output features of the i-th layer coding block, wherein the output features of the i-th layer coding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer coding block are obtained by the 1st layer coding block performing feature extraction processing on the embedded representations of the plurality of sub-point cloud data, and i is a positive integer greater than 1 and not greater than K; and performing, with a j-th layer decoding block, feature fusion processing on the output features of the (K-j+1)-th layer coding block and output features of a (j-1)-th layer decoding block to obtain fusion features, and performing feature extraction processing on the fusion features to obtain output features of the j-th layer decoding block, wherein the output features of the j-th layer decoding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer decoding block are obtained by the 1st layer decoding block performing feature extraction processing on the output features of the K-th layer coding block, j is a positive integer greater than 1 and not greater than K, and the output features of the last layer decoding block are the feature representation of the initial point cloud data.
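The first step of the method, acquiring point cloud data at several mask scales, is elaborated in claim 12 as repeated downsampling, masking of the last (coarsest) sampled point cloud, and back projection onto every sampled point cloud. A minimal sketch of that flow is given below under stated assumptions: the stride-2 downsampling stands in for whatever sampler is actually used, the coarsest points are masked at random, and back projection is implemented by letting each finer point inherit the visibility flag of its nearest coarsest point. None of these choices is specified by the application, and the names `multi_scale_masks`, `mask_ratio` and `seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def multi_scale_masks(
    points: Sequence[Point],
    num_scales: int,
    mask_ratio: float,
    seed: int = 0,
) -> List[Tuple[List[Point], List[bool]]]:
    """Return (sampled cloud, visibility flags) for every downsampling iteration."""
    if not points or num_scales < 1:
        raise ValueError("need at least one point and one scale")
    rng = random.Random(seed)

    # Downsampling iterations: keep every second point of the previous scale
    # (an assumed stand-in for the real sampler).
    scales: List[List[Point]] = []
    current = list(points)
    for _ in range(num_scales):
        current = current[::2]
        scales.append(current)

    # Mask the sampled cloud of the last (coarsest) iteration at random.
    coarsest = scales[-1]
    n_masked = int(len(coarsest) * mask_ratio)
    masked_ids = set(rng.sample(range(len(coarsest)), n_masked))
    coarse_visible = [i not in masked_ids for i in range(len(coarsest))]

    # Back projection: each point of every scale inherits the flag of its
    # nearest coarsest point (True = visible, False = masked).
    result: List[Tuple[List[Point], List[bool]]] = []
    for cloud in scales:
        flags = []
        for p in cloud:
            nearest = min(
                range(len(coarsest)),
                key=lambda i: math.dist(p, coarsest[i]),
            )
            flags.append(coarse_visible[nearest])
        result.append((cloud, flags))
    return result
```

Deriving every scale's mask from the coarsest one keeps the masked regions spatially consistent across scales, which is the apparent purpose of the back-projection step.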
An embodiment of the application also provides a point cloud data processing device, comprising: an acquisition unit, configured to acquire point cloud data at a plurality of mask scales from initial point cloud data; an embedding unit, configured to acquire embedded representations of a plurality of sub-point cloud data, wherein each sub-point cloud data of the plurality of sub-point cloud data comprises partial data of the point cloud data of one mask scale of the plurality of mask scales; a coding unit, configured to perform, with an i-th layer coding block, feature extraction processing on output features of an (i-1)-th layer coding block to obtain output features of the i-th layer coding block, wherein the output features of the i-th layer coding block comprise features of the plurality of sub-point cloud data, the output features of the 1st layer coding block are obtained by the 1st layer coding block performing feature extraction processing on the embedded representations of the plurality of sub-point cloud data, and i is a positive integer greater than 1 and not greater than K; a decoding unit, configured to perform feature fusion processing on the ou