CN-121999209-A - Three-dimensional point cloud semantic segmentation method based on frequency domain analysis
Abstract
The invention discloses a three-dimensional point cloud semantic segmentation method based on frequency domain analysis. The method introduces frequency domain analysis: point cloud features are decomposed into low-frequency, medium-frequency and high-frequency sub-bands, frequency transformation and filtering are applied to them, multi-band features are extracted, and adaptive enhancement is achieved with a learnable filter, which improves the simultaneous perception of global shape and local detail. To address the reliance of existing methods on a single network architecture, the method adopts a fused CNN+Transformer neural network framework that covers both local geometric feature extraction and global semantic modeling, improving segmentation accuracy and enabling efficient interaction of multi-domain features. The decoder yields higher point-wise prediction accuracy and clearer boundaries. Finally, the loss function balances class equalization and boundary refinement, so that the model is more robust and generalizes better in complex point cloud scenes.
Inventors
- ZHANG JUNQIANG
- WANG YUCHEN
- WANG LIHUI
- WANG XIANQUAN
- ZHANG KE
- WANG YUAN
- LIU YUAN
- ZHAO XIN
Assignees
- 陕西送变电工程有限公司
- 国网陕西省电力有限公司经济技术研究院
Dates
- Publication Date: 20260508
- Application Date: 20251107
Claims (11)
- 1. A three-dimensional point cloud semantic segmentation method based on frequency domain analysis, characterized by comprising the following steps:
  S1, acquiring three-dimensional point cloud data sets of different examples, and centering and normalizing the three-dimensional point cloud data sets to obtain a three-dimensional point cloud data mapping set after the processing is completed;
  S2, mapping the three-dimensional point cloud data mapping set onto a voxel grid, and dividing the resulting voxel grid into a plurality of local windows which are equal in size and uniformly arranged in three-dimensional space;
  S3, carrying out a Fourier transform on each local window to obtain a low-frequency part $F_{\text{low}}$, an intermediate-frequency part $F_{\text{band}}$ and a high-frequency part $F_{\text{high}}$, filtering the low-frequency part $F_{\text{low}}$, the intermediate-frequency part $F_{\text{band}}$ and the high-frequency part $F_{\text{high}}$ respectively, and, after the filtering is finished, restoring the frequency band enhancement signal to the spatial domain to obtain frequency band enhancement features;
  S4, feeding the frequency band enhancement features into a lightweight sparse three-dimensional convolutional neural network to obtain multi-scale frequency domain features;
  S5, extracting local geometric features of the three-dimensional point cloud data mapping set within the point cloud neighborhood range through a CNN branch, and capturing the neighborhood relations and local topological structure of the three-dimensional point cloud data mapping set;
  S6, processing the local geometric features through a Transformer fusion module to obtain spatial domain features;
  S7, concatenating the multi-scale frequency domain features with the spatial domain features and realizing nonlinear fusion through a multi-layer perceptron, optionally introducing a cross-channel attention mechanism to further enhance the complementary relation between the frequency domain and the spatial domain, to obtain fused point cloud features;
  S8, inputting the fused point cloud features into an up-sampling decoder, wherein the up-sampling decoder gradually restores the spatial resolution, passes low-level detail features through skip connections, and outputs the class probability of each point.
- 2. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S1, the three-dimensional point cloud data sets of different examples are obtained through a public platform, and the three-dimensional point cloud data sets comprise one of object part-level point cloud labels and indoor-scene point cloud labels.
- 3. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S2, the voxel grid to be divided into local windows is obtained as follows: $V(u, v, w, c) = \frac{1}{N_{u,v,w}} \sum_{i} \mathbb{I}\left[\lfloor x_i / r \rfloor = u,\ \lfloor y_i / r \rfloor = v,\ \lfloor z_i / r \rfloor = w\right] f_c(p_i)$; wherein $V$ is the voxel grid; $N_{u,v,w}$ represents the number of points falling into the voxel grid coordinates $(u, v, w)$ and serves as the normalization factor; $\mathbb{I}[\text{condition}]$ is a binary indicator function that equals 1 when the condition is true and 0 otherwise; $r$ is the voxel resolution; $\lfloor \cdot \rfloor$ (floor) returns the largest integer not exceeding its argument; $f_c(p_i)$ represents the feature value of the $c$-th channel corresponding to $p_i$, so that $V(u, v, w, c)$ is the average feature value of the $c$-th channel; $p_i = (x_i, y_i, z_i)$ represents the three-dimensional coordinates of the $i$-th point; and $x_i$, $y_i$ and $z_i$ represent the actual-space x-axis, y-axis and z-axis coordinates of the $i$-th point in the three-dimensional point cloud data set.
- 4. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S2, the local windows of equal size and uniform arrangement are created as follows: $W_j = \{(u, v, w) \mid u_j \le u < u_j + M,\ v_j \le v < v_j + M,\ w_j \le w < w_j + M\}$; the above formula defines a local window that takes $(u_j, v_j, w_j)$ as its starting point and contains a voxel block of size $M \times M \times M$ in the three-dimensional grid coordinate system; $(u_j, v_j, w_j)$ represents the start coordinates of the $j$-th local window; and $(u, v, w)$ represents the voxel coordinate index currently under consideration.
- 5. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S3, the Fourier transform is performed on the local window as follows: $F(u, v, w) = \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \sum_{z=0}^{Z-1} f(x, y, z)\, e^{-j 2\pi \left(\frac{ux}{X} + \frac{vy}{Y} + \frac{wz}{Z}\right)}$; wherein $F(u, v, w)$ represents the frequency spectrum obtained by the three-dimensional Fourier transform of the voxelized three-dimensional point cloud data; $(u, v, w)$ is the spectrum index; $f(x, y, z)$ represents the sampled value of the three-dimensional spatial signal at location $(x, y, z)$; $X$, $Y$ and $Z$ represent the number of sampling points in each dimension and normalize the index to a frequency fraction; and $e^{-j 2\pi (\cdot)}$ is the exponential basis function that maps the spatial domain samples to the frequency domain.
- 6. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 5, wherein the components of the low-frequency part $F_{\text{low}}$, the intermediate-frequency part $F_{\text{band}}$ and the high-frequency part $F_{\text{high}}$ are extracted by band-pass filters: $F_k(u, v, w) = F(u, v, w) \cdot H_k(u, v, w)$; wherein $k \in \{\text{low}, \text{band}, \text{high}\}$; $F(u, v, w)$ represents the frequency spectrum obtained by the three-dimensional Fourier transform of the voxelized three-dimensional point cloud data; $H_k(u, v, w)$ represents the filter function of the $k$-th frequency band; and $F_k(u, v, w)$ represents the filtered spectrum, which retains only the frequency components of the corresponding frequency band.
- 7. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 6, wherein the filtered spectrum $F_k(u, v, w)$ is restored to the spatial domain by an inverse Fourier transform to obtain the frequency band enhancement signal and hence the frequency band enhancement features, as follows: $f_k(x, y, z) = \mathcal{F}^{-1}\{F_k(u, v, w)\}$; wherein $\mathcal{F}^{-1}$ represents the three-dimensional inverse Fourier transform; and $f_k(x, y, z)$ represents the recovered spatial signal, which contains the enhanced features of the particular frequency band.
- 8. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S4, the lightweight sparse three-dimensional convolutional neural network extracts the multi-scale frequency domain features as follows: $F^{\text{ms}}_k = \mathrm{SparseCNN}(f_k),\ k \in \{\text{low}, \text{band}, \text{high}\}$; wherein $f_k$ is the frequency band enhancement feature of band $k$; each branch contains a 3 × 3 sparse convolution + BatchNorm + ReLU activation function, and the deep frequency domain representation is extracted through downsampling and residual connections, so that three frequency domain features are finally obtained: the low-frequency component $F^{\text{ms}}_{\text{low}}$, the intermediate-frequency component $F^{\text{ms}}_{\text{band}}$ and the high-frequency component $F^{\text{ms}}_{\text{high}}$; and $\mathrm{SparseCNN}(\cdot)$ represents the sparse convolutional neural network.
- 9. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S6, the spatial domain features are obtained by calculating attention weights based on relative position encoding and aggregating features as follows: $F_{\text{spa}}(i) = \sum_{j} \alpha_{ij} v_j$; wherein $F_{\text{spa}}(i)$ represents the spatial domain feature of point $i$; $\alpha_{ij}$ represents the attention weight; and $v_j$ represents the value of point $j$.
- 10. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S7, the multi-layer perceptron realizes the nonlinear fusion as follows: $F_{\text{fuse}} = \mathrm{MLP}\left(\mathrm{Concat}\left(F_{\text{spa}}, F^{\text{ms}}_{\text{low}}, F^{\text{ms}}_{\text{band}}, F^{\text{ms}}_{\text{high}}\right)\right)$, and a cross-channel attention mechanism is optionally introduced to further enhance the complementary relation between the multi-scale frequency domain features and the spatial domain features, to obtain the fused point cloud features $F_{\text{fuse}}$; wherein $F_{\text{spa}}$ represents the spatial features enhanced with global context; $F^{\text{ms}}_{\text{low}}$ represents the low-frequency component, which carries overall shape and contour information; $F^{\text{ms}}_{\text{band}}$ represents the intermediate-frequency component, which carries local structure and details; $F^{\text{ms}}_{\text{high}}$ represents the high-frequency component, which carries edges and fine-grained features; the four types of features $F_{\text{spa}}$, $F^{\text{ms}}_{\text{low}}$, $F^{\text{ms}}_{\text{band}}$ and $F^{\text{ms}}_{\text{high}}$ are stacked along the channel dimension, which is equivalent to combining multi-modal features; and the multi-layer perceptron realizes a nonlinear mapping, so that the fused point features $F_{\text{fuse}}$ are obtained after learning.
- 11. The three-dimensional point cloud semantic segmentation method based on frequency domain analysis according to claim 1, wherein in step S8, the resolution is gradually restored through the up-sampling decoder, low-level detail features are passed through skip connections, and the class probability of each point is output; for the loss function, a weighted combination of cross entropy and Dice loss is used: $L = \lambda_1 L_{\text{CE}} + \lambda_2 L_{\text{Dice}}$, with $L_{\text{CE}} = -\frac{1}{N} \sum_{i} \sum_{c} y_{i,c} \log p_{i,c}$ and $L_{\text{Dice}} = 1 - \frac{2 \sum_{i} \sum_{c} y_{i,c}\, p_{i,c}}{\sum_{i} \sum_{c} y_{i,c} + \sum_{i} \sum_{c} p_{i,c}}$; the joint loss both ensures the overall classification accuracy and strengthens the segmentation quality of boundary regions; wherein $L_{\text{CE}}$ represents the cross entropy loss; $y_{i,c}$ represents the ground-truth label, with $y_{i,c} = 1$ if point $i$ belongs to category $c$ and $y_{i,c} = 0$ otherwise; $p_{i,c}$ represents the predicted class probability, and the cross entropy is summed over all points and classes and then averaged; in the formula of the Dice loss $L_{\text{Dice}}$, $\sum_{i} \sum_{c} y_{i,c}\, p_{i,c}$ represents the overlap between the ground-truth label and the predicted probability, and $\sum_{i} \sum_{c} y_{i,c} + \sum_{i} \sum_{c} p_{i,c}$ represents the total area of prediction and ground truth; the smaller the value of $L_{\text{Dice}}$, the closer the prediction is to the ground truth; and $\lambda_1$ and $\lambda_2$ represent weights that balance the effects of the two losses.
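The joint loss of claim 11 can be illustrated with a minimal pure-Python sketch. It assumes the standard textbook forms of cross entropy (averaged over points) and of the Dice loss (one minus twice the overlap divided by the total area); the function name `joint_loss`, the default weights `lam_ce` and `lam_dice`, and the stabilizer `eps` are illustrative choices and are not taken from the application.

```python
import math


def joint_loss(probs, labels, num_classes, lam_ce=1.0, lam_dice=1.0, eps=1e-7):
    """Weighted sum of cross entropy and Dice loss for per-point predictions.

    probs  : list of per-point class-probability lists (each sums to 1)
    labels : list of integer class ids, one per point
    lam_ce, lam_dice, eps are assumed values, not specified by the source.
    """
    n = len(probs)
    ce = 0.0       # accumulated cross entropy
    overlap = 0.0  # sum of y * p  (overlap of label and prediction)
    total = 0.0    # sum of y + p  (total area of label and prediction)
    for p, y in zip(probs, labels):
        for c in range(num_classes):
            y_ic = 1.0 if y == c else 0.0  # one-hot ground truth
            ce -= y_ic * math.log(max(p[c], eps))
            overlap += y_ic * p[c]
            total += y_ic + p[c]
    ce /= n  # average over points
    dice = 1.0 - 2.0 * overlap / (total + eps)
    return lam_ce * ce + lam_dice * dice, ce, dice


# Two points, two classes: a confident correct prediction versus a uniform one.
good, _, _ = joint_loss([[0.9, 0.1], [0.2, 0.8]], [0, 1], num_classes=2)
flat, _, _ = joint_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], num_classes=2)
print(good < flat)
```

Raising `lam_dice` relative to `lam_ce` puts more weight on overlap quality, which is the term the claim associates with boundary regions; the actual weights would have to be tuned for a given data set.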
Description
Three-dimensional point cloud semantic segmentation method based on frequency domain analysis
Technical Field
The application belongs to the field of computer vision, and particularly relates to a three-dimensional point cloud semantic segmentation method based on frequency domain analysis.
Background
As a direct three-dimensional sampling of object appearance and scene structure, the three-dimensional point cloud (3D point cloud) has become a core data form in many fields such as automatic driving, indoor scene understanding, and robot perception. Unlike traditional images, point cloud data is sparse, irregular and unordered: points do not form a fixed grid, and the sampling density can change markedly with the view angle of the sensor, the surface attributes of the object, and the distance. These characteristics pose challenges for deep-learning-based three-dimensional semantic segmentation, including robustness to noise and missing points, the computational cost of efficient long-range semantic modeling in large scenes, and the handling of noise, non-uniform sampling, and large-scale scenes while maintaining high accuracy. Over the past decade, researchers have proposed two main technical routes for point cloud semantic segmentation. One class performs point-domain operations derived from PointNet/PointNet++, extracting features through local aggregation and hierarchical sampling. The other class uses voxel-based or graph-convolution-based approaches that map sparse point clouds onto regular structures in order to take advantage of standard convolution methods.
In recent years, Transformers and self-attention mechanisms have developed rapidly in the three-dimensional field and are widely applied to cross-region semantic association modeling because they naturally capture long-range dependencies. Meanwhile, improvements based on geometric convolution (such as KPConv) and efficient local operators continue to push baseline performance upward. Despite this progress, current methods still have clear shortcomings in three respects: handling mixed scales (both identifying macrostructures and segmenting fine boundaries), stability in the face of noise and non-uniform sampling, and the trade-off between computation and memory overhead. Frequency domain analysis provides a new way to address these problems. In signal processing, frequency domain methods decompose a spatial signal into different frequency components by the Fourier transform: the low-frequency part corresponds to the smooth overall structure, and the high-frequency part corresponds to details or rapid changes. Applied to point cloud semantic segmentation, this idea enables (1) low-frequency enhancement, which highlights the overall shape of an object and improves the consistency of segmentation in large scenes; (2) high-frequency enhancement, which captures detailed features such as boundaries and corner points and improves the segmentation quality of small targets and complex surfaces; and (3) frequency filtering and denoising, which improves the robustness of the model to noise and sparse sampling by suppressing invalid high-frequency components. Against this background, combining frequency domain analysis with deep learning is a promising way to construct a high-performance point cloud segmentation model that offers both global consistency and local detail.
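To make the decomposition concrete, the following is a minimal NumPy sketch, not the application's implementation. It averages a scalar per-point feature into an M × M × M voxel window and splits that window into low, intermediate and high bands with hard radial masks in the 3D Fourier domain. The function names `voxelize` and `band_split`, the cutoffs `cut_low` and `cut_high`, and the use of fixed binary masks in place of a learnable filter are all assumptions made for illustration; points are assumed to be already normalized to the unit cube.

```python
import numpy as np


def voxelize(points, feats, m):
    """Average a scalar per-point feature into an m x m x m grid.

    Assumes points lie in the unit cube [0, 1)^3, so the voxel size is 1/m.
    """
    r = 1.0 / m
    idx = np.clip(np.floor(points / r).astype(int), 0, m - 1)
    total = np.zeros((m, m, m))
    count = np.zeros((m, m, m))
    np.add.at(total, tuple(idx.T), feats)  # sum of features per voxel
    np.add.at(count, tuple(idx.T), 1.0)    # number of points per voxel
    return total / np.maximum(count, 1.0)  # empty voxels stay at 0


def band_split(window, cut_low=0.15, cut_high=0.35):
    """Split a 3D window into low / band / high parts with radial masks.

    The cutoffs are in cycles per sample and are illustrative values only.
    """
    spectrum = np.fft.fftn(window)
    freqs = [np.fft.fftfreq(n) for n in window.shape]
    fu, fv, fw = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(fu**2 + fv**2 + fw**2)
    masks = {
        "low": radius <= cut_low,
        "band": (radius > cut_low) & (radius <= cut_high),
        "high": radius > cut_high,
    }
    # The masks are symmetric in frequency, so each inverse transform is
    # real up to rounding error and the imaginary part can be dropped.
    return {k: np.fft.ifftn(spectrum * m).real for k, m in masks.items()}


rng = np.random.default_rng(0)
window = voxelize(rng.random((500, 3)), rng.random(500), m=8)
bands = band_split(window)
print({k: v.shape for k, v in bands.items()})
```

Because the three masks partition the spectrum, the three band signals add back up to the original window; a learnable filter, as described in the application, would replace the binary masks with trainable weights.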
Disclosure of Invention
The application provides a three-dimensional point cloud semantic segmentation method based on frequency domain analysis, in which frequency transformation and filtering are applied to the point cloud, and multi-frequency features are extracted and fused with the convolutional and Transformer features of the spatial domain, so that multi-scale and multi-frequency joint modeling is realized and the segmentation accuracy and boundary quality are improved while computational efficiency is maintained. In order to achieve the above purpose, the present application provides a three-dimensional point cloud semantic segmentation method based on frequency domain analysis, comprising the following steps: S1, acquiring three-dimensional point cloud data sets of different examples, and centering and normalizing the three-dimensional point cloud data sets to obtain a three-dimensional point cloud data mapping set after the processing is completed; S2, mapping the three-dimensional point cloud data mapping set onto a voxel grid, and dividing the resulting voxel grid into a plurality of local windows which are equal in size and uniformly arranged in three-dimensional space; S3, carrying out a Fourier transform on each local window to obtain a low-frequency part $F_{\text{low}}$, an intermediate-frequen