CN-120726326-B - Indoor point cloud semantic segmentation method for enhancing geometric structural features of object


Abstract

The invention discloses an indoor point cloud semantic segmentation method that enhances the geometric structural features of objects, and belongs to the technical field of computer vision. Existing point cloud semantic segmentation methods give insufficient consideration to the local geometric structural features of the point cloud and are insufficiently sensitive to local features. To address this, a point vector convolution module (PVConv) based on point vector features is proposed, and three PVConv layers are combined into a point vector convolution network (PVCNN) that extracts the local features of the point cloud. Each convolution layer in the point vector convolution network determines a local spherical neighborhood from a given radius and extracts the center point features and neighborhood features within that neighborhood. A window slice attention module learns the long-range features of the point cloud through a window slice attention mechanism (WSA): self-attention is computed on the three planes of each cube window obtained by spatially partitioning the point cloud, which enlarges the attention field of the mechanism while improving the computational efficiency of the network, and balances the network's ability to extract local and global features.

Inventors

YANG XIAOWEN
JIAO SHICHAO
REN DEMIN
HAN XIE
HAN HUIYAN
ZHANG YUAN
XIONG FENGGUANG
PANG MIN
JIA CAIQIN
ZHAO RONG

Assignees

North University of China (中北大学)

Dates

Publication Date: 20260512
Application Date: 20250619

Claims (6)

1. An indoor point cloud semantic segmentation method for enhancing geometric structural features of an object, characterized by comprising the following steps:
Step 1: organizing and cleaning the point cloud data to reduce noise points and remove redundant points;
Step 2: constructing an indoor point cloud semantic segmentation network model that enhances the geometric structural features of the object;
Step 3: inputting the point cloud data obtained in Step 1 into the network model obtained in Step 2 to perform point cloud semantic segmentation with enhanced local geometric structural features.
The network model adopts an encoder-decoder structure and consists of a point embedding layer, an LGE-Block module, a downsampling module and an upsampling module. A single PVConv layer serves as the point embedding layer of the network model and extracts initial geometric features from the input point cloud data.
The LGE-Block module consists of a layer normalization, an LGE-Former module and a feedforward neural network. The LGE-Former module consists of a point vector convolution module and a window slice attention module, wherein the point vector convolution module consists of three PVConv layers; together they perform feature extraction and attention calculation on the point cloud data.
The point vector convolution module operates as follows: for a point cloud of […] points, each point […] of the point cloud is taken as a center and a spherical local neighborhood of radius […] is constructed; the neighborhood contains all neighboring points […] that satisfy […]. Center point features and neighborhood geometric structural features are extracted for the neighborhood, and the center point features and neighborhood features are then aggregated.
Center point feature extraction applies a linear transformation to the center point feature using a weight matrix of size […], specifically defined as follows: […] In the formula, the terms are the feature of the center point, the weight matrix that extracts the center point feature, and the level of the current PVConv within the point vector convolution module.
Neighborhood feature extraction constructs a three-dimensional coordinate system with the center point […] as the origin. Six unit orthogonal bases, in the positive and negative directions of the axes, divide the three-dimensional space into 8 spatial quadrants (octants), so that a direction vector […] of any position and direction can be represented by the three orthogonal bases of the quadrant in which it lies. The direction vector is projected onto the three corresponding orthogonal bases, and the angle between the direction vector and the orthogonal basis in each direction is calculated. Three direction weight matrices are applied to the three directions respectively, and the features of the three directions are aggregated to obtain the edge feature […] of the direction vector, specifically defined as follows: […] In the formula, the terms are: the set of three orthogonal bases used to represent the direction vector; the coefficients with which the features of each direction are aggregated; the weights in the three directions; the feature of the neighboring point at the given layer; the direction vector; the center point; and the neighborhood point.
The aggregation of center point features and neighborhood features adopts a max pooling operation and uses a distance function […] to weight the edge features […] during aggregation, specifically defined as follows: […] In the formula, the terms are the radius selected for the current PVConv and the distance between the center point and the neighborhood point. The overall features are then aggregated, specifically defined as follows: […] In the formula, the terms are: the neighborhood geometric feature; the distance function; the edge feature; and the set of points within the radius centered on the center point. The resulting center point features and neighborhood geometric features of each point are input to the next PVConv layer in the point vector convolution module, or to subsequent modules, for further processing.
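The PVConv steps of claim 1 can be illustrated with a minimal plain-Python sketch. It is not the patented implementation: the formulas are not reproduced in this text, so the sketch assumes (a) cos² of the angle to each octant basis as the aggregation coefficient, (b) `(radius - dist) ** 2` as the distance function, (c) six direction weight matrices of which three are selected per octant, and (d) a plain sum to merge the center and neighborhood features. All function names are hypothetical.

```python
import math

def ball_query(points, center_idx, radius):
    """Indices of all other points within `radius` of points[center_idx]."""
    center = points[center_idx]
    return [j for j, q in enumerate(points)
            if j != center_idx and math.dist(center, q) <= radius]

def octant_signs(direction):
    """Sign (+1.0 or -1.0) of the octant basis chosen on each axis."""
    return [1.0 if d >= 0 else -1.0 for d in direction]

def direction_coefficients(direction):
    """ASSUMED coefficients: cos^2 of the angle to each chosen basis (sums to 1)."""
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0.0:
        return [0.0, 0.0, 0.0]
    return [d * d / norm_sq for d in direction]

def matvec(weight, vec):
    """Multiply a C_out x C_in matrix (list of rows) by a length-C_in vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def edge_feature(direction, neighbor_feat, dir_weights):
    """Blend the three per-direction projections of a neighbor's feature."""
    signs = octant_signs(direction)
    coeffs = direction_coefficients(direction)
    out = None
    for axis in range(3):
        # ASSUMPTION: one weight matrix per signed axis, keyed (axis, sign).
        projected = matvec(dir_weights[(axis, signs[axis])], neighbor_feat)
        scaled = [coeffs[axis] * v for v in projected]
        out = scaled if out is None else [a + b for a, b in zip(out, scaled)]
    return out

def pvconv_point(points, feats, i, radius, w_center, dir_weights):
    """Center feature plus max-pooled, distance-weighted edge features of point i."""
    center_out = matvec(w_center, feats[i])
    pooled = [0.0] * len(center_out)  # stays zero when the ball is empty
    first = True
    for j in ball_query(points, i, radius):
        direction = [q - p for p, q in zip(points[i], points[j])]
        dist = math.dist(points[i], points[j])
        weight = (radius - dist) ** 2  # ASSUMED distance function
        edge = [weight * v for v in edge_feature(direction, feats[j], dir_weights)]
        pooled = edge if first else [max(a, b) for a, b in zip(pooled, edge)]
        first = False
    # ASSUMPTION: center and neighborhood features are merged by addition.
    return [c + n for c, n in zip(center_out, pooled)]
```

With the assumed cos² coefficients, the three weights of any non-zero direction vector sum to 1, so the edge feature is a convex blend of the three directional projections.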
2. The indoor point cloud semantic segmentation method for enhancing geometric structural features of an object according to claim 1, characterized in that the LGE-Block module operates as follows: the input features are first normalized by a normalization layer; the point vector convolution module in the LGE-Former module then extracts the local geometric structural features of the data, and the window slice attention module learns the long-range features of the point cloud through the window slice attention mechanism; the output features are then added to the original features through a residual connection to perform a preliminary feature fusion; finally, a feedforward network applies a nonlinear transformation to further extract high-order features.
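The order of operations in claim 2 can be sketched as a small composition function. This is an illustration under assumptions, not the patented code: `local_fn`, `attention_fn` and `ffn_fn` are hypothetical stand-ins for the point vector convolution module, the window slice attention module and the feedforward network, and the layer normalization omits learnable scale and shift.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def lge_block(feats, local_fn, attention_fn, ffn_fn):
    """Normalize -> local features -> long-range attention -> residual -> FFN.

    `feats` is a list of per-point feature vectors. `local_fn` and
    `attention_fn` map a list of vectors to a list of vectors of the same
    shape; `ffn_fn` maps a single vector to a single vector.
    """
    normed = [layer_norm(f) for f in feats]
    local = local_fn(normed)        # stand-in for the three PVConv layers
    mixed = attention_fn(local)     # stand-in for window slice attention
    # Residual connection: add the block output back onto the original input.
    fused = [[a + b for a, b in zip(f, m)] for f, m in zip(feats, mixed)]
    return [ffn_fn(f) for f in fused]
```

Passing identity functions for the three stand-ins reduces the block to "input plus normalized input", which is a quick way to check the residual wiring.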
3. The indoor point cloud semantic segmentation method for enhancing geometric structural features of an object according to claim 2, wherein the window slice attention module operates as follows: the point cloud output by PVConv is divided into cube windows of size […]; each cube window is decomposed into 3 orthogonal two-dimensional planes, denoted […]; the attention calculation results are concatenated along the feature channel dimension, and a linear transformation is applied to the concatenated result to obtain the final attention output.
The attention calculation proceeds as follows. Let the two-dimensional window size be […], let the set of points contained in a window be denoted […], indexed by the order of the current window, with the window containing […] points; let the number of attention heads be […], with the order of a head denoted […]. First, the query, key and value are calculated, specifically defined as follows: […] In the formula, the terms are the query, key and value of the window, each obtained by a linear transformation that maps the original channel number to […]. The query matrix is multiplied by the transpose of the key matrix to obtain a similarity matrix, denoted […]. At the same time, the relative position encoding features shared by the three two-dimensional planes are incorporated to integrate the spatial position information of the point cloud, and a Softmax function normalizes the similarity matrix […] together with the position encoding information to obtain the attention weight of each point relative to the query point, specifically defined as follows: […] In the formula, the terms are: the feature dimension of the point set in the window; the number of attention heads; a combination of two learnable position encodings; the query on the given window; and […].
The attention weight is multiplied by the corresponding value to obtain the attention output of each head, specifically defined as follows: […] In the formula, the terms are the number of attention heads, the order of the current attention head, and the corresponding value. The attention is executed on plane […], plane […] and plane […], and the three attention results are combined to obtain the output, specifically defined as follows: […] In the formula, the terms are the attention calculation results of the two-dimensional windows on the three planes and the concatenation operation over the feature channels.
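One possible reading of the three-plane attention in claim 3 is sketched below in plain Python. Several points are assumptions rather than statements of the claimed method: points are grouped by the 2D cell they occupy on each coordinate plane, attention is single-head with query, key and value all equal to the input feature (no learned projections), the relative position encoding is omitted, and the final linear transformation after concatenation is left out. The names `plane_windows`, `window_attention` and `slice_attention` are hypothetical.

```python
import math
from collections import defaultdict

# Coordinate axes spanning each of the three orthogonal planes.
PLANES = {"xy": (0, 1), "yz": (1, 2), "xz": (0, 2)}

def plane_windows(coords, plane, window):
    """ASSUMED grouping: point indices bucketed by their 2D cell on `plane`."""
    a, b = PLANES[plane]
    groups = defaultdict(list)
    for idx, p in enumerate(coords):
        key = (math.floor(p[a] / window), math.floor(p[b] / window))
        groups[key].append(idx)
    return list(groups.values())

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def window_attention(feats, idxs):
    """Scaled dot-product self-attention among the points of one window."""
    dim = len(feats[idxs[0]])
    out = {}
    for i in idxs:
        scores = [sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(dim)
                  for j in idxs]
        weights = softmax(scores)
        out[i] = [sum(w * feats[j][c] for w, j in zip(weights, idxs))
                  for c in range(dim)]
    return out

def slice_attention(coords, feats, window):
    """Run attention on the xy, yz and xz planes and concatenate per point."""
    result = [[] for _ in feats]
    for plane in ("xy", "yz", "xz"):
        for idxs in plane_windows(coords, plane, window):
            for i, vec in window_attention(feats, idxs).items():
                result[i].extend(vec)  # concatenation along the channel axis
    return result
```

Under this grouping, two points that share an xy cell attend to each other even when they are far apart along z, which is one way the attention field could grow without attending over a full 3D window.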
4. The indoor point cloud semantic segmentation method for enhancing geometric structural features of an object according to claim 3, wherein the downsampling module operates as follows: in a specified region […], sampling points are first selected from the point cloud coordinates […] using farthest point sampling (FPS); the neighboring point set of each sampling point is then queried in the original point set using the […]-nearest-neighbor algorithm (KNN) to obtain a grouping index; meanwhile, the point cloud features […] are projected; finally, the projected features are fused by max pooling according to the grouping index, and the output […] is input into the subsequent network layer. After downsampling, the number of points is reduced from […] to […].
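The downsampling steps of claim 4 (farthest point sampling, nearest-neighbor grouping, max pooling) can be sketched in plain Python as follows. The sketch makes simplifying assumptions: the feature projection step is omitted, the first sampled point is fixed by a `start` index, and the brute-force loops are chosen for readability rather than speed. Function names are hypothetical.

```python
import math

def farthest_point_sampling(coords, m, start=0):
    """Pick m indices, each new one farthest from the already selected set."""
    selected = [start]
    min_dist = [math.dist(p, coords[start]) for p in coords]
    while len(selected) < m:
        nxt = max(range(len(coords)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Keep, for every point, its distance to the nearest selected point.
        for i, p in enumerate(coords):
            min_dist[i] = min(min_dist[i], math.dist(p, coords[nxt]))
    return selected

def knn(coords, query, k):
    """Indices of the k points closest to `query` (brute force)."""
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], query))
    return order[:k]

def downsample(coords, feats, m, k):
    """FPS centers, KNN grouping in the original set, channel-wise max pooling."""
    centers = farthest_point_sampling(coords, m)
    dim = len(feats[0])
    new_feats = []
    for c in centers:
        group = knn(coords, coords[c], k)  # grouping index for this center
        new_feats.append([max(feats[j][ch] for j in group) for ch in range(dim)])
    return [coords[c] for c in centers], new_feats
```

Farthest point sampling spreads the retained points across the scene, so sparse regions stay represented after the point count is reduced.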
5. The indoor point cloud semantic segmentation method for enhancing geometric structural features of an object according to claim 4, wherein the upsampling module operates as follows: first, the decoder features […] are projected through a normalization and linear layer; then, interpolation is performed between the current point cloud coordinates […] and the point cloud coordinates of the previous stage […]; finally, the projected features, the result obtained by interpolation and the encoder features of the previous stage are added to obtain the decoder features of the next layer […].
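The upsampling flow of claim 5 can be sketched as below. The claim does not say which interpolation is used, so this sketch assumes inverse-distance-weighted interpolation over the `k` nearest coarse points, a common choice in point cloud decoders. The normalization and linear projection are omitted, and the function names are hypothetical.

```python
import math

def interpolate(coarse_xyz, coarse_feats, fine_xyz, k=3, eps=1e-8):
    """ASSUMED scheme: inverse-distance-weighted average of k nearest coarse points."""
    dim = len(coarse_feats[0])
    out = []
    for p in fine_xyz:
        nearest = sorted(range(len(coarse_xyz)),
                         key=lambda i: math.dist(p, coarse_xyz[i]))[:k]
        weights = [1.0 / (math.dist(p, coarse_xyz[i]) + eps) for i in nearest]
        total = sum(weights)
        out.append([sum(w * coarse_feats[i][c] for w, i in zip(weights, nearest)) / total
                    for c in range(dim)])
    return out

def upsample(coarse_xyz, coarse_feats, fine_xyz, skip_feats, k=3):
    """Interpolate decoder features to the finer points, then add encoder skips."""
    interp = interpolate(coarse_xyz, coarse_feats, fine_xyz, k)
    return [[a + b for a, b in zip(f, s)] for f, s in zip(interp, skip_feats)]
```

A fine point that coincides with a coarse point receives essentially that coarse point's feature, because its inverse-distance weight dominates the average.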
6. The indoor point cloud semantic segmentation method for enhancing geometric structural features of an object according to claim 5, wherein the point cloud semantic segmentation with enhanced local geometric structure in Step 3 models local structural features, extracts local point cloud features and finely segments the point cloud, and the segmentation process is as follows: point cloud data is taken as input, and LGE-Net is designed as the backbone network with an encoder-decoder structure; a point vector convolution module serves as the point embedding layer of the backbone network and performs preliminary geometric feature extraction on the input point cloud data; an LGE-Block is connected after the point embedding layer to ensure that local features are fully passed on to subsequent modules; features are extracted from the point cloud data through alternating multi-layer downsampling and LGE-Block processing; the decoder gradually restores the feature dimensions and fuses point cloud features from different levels; and a multi-layer perceptron (MLP) predicts the semantic category of the point cloud data and outputs the segmentation result.

Description

Indoor point cloud semantic segmentation method for enhancing geometric structural features of object

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to an indoor point cloud semantic segmentation method for enhancing geometric structural features of an object.

Background

Indoor point cloud semantic segmentation is a research hotspot in the field of computer vision and is important for indoor scene understanding. Because point cloud data is irregular and unordered, and indoor scenes are highly complex, contain many object categories and include a large number of small objects, manually segmenting point clouds is inefficient and inaccurate. An efficient indoor point cloud semantic segmentation algorithm can therefore greatly improve segmentation quality, save manpower and material resources, and provide a foundation for interior design, indoor control and indoor safety. Research on indoor point cloud segmentation algorithms is thus of significant importance.

In recent years, with the increasing maturity of depth sensing technology and three-dimensional acquisition equipment and the development of computer technology, researchers at home and abroad have proposed many point cloud semantic segmentation methods based on deep learning, mainly including multi-view-based methods, voxel-based methods and point-based methods. Multi-view-based methods project the point cloud data into two dimensions and process it with a two-dimensional convolutional network.
In 2019, Milioto et al. proposed RangeNet++, which converts the input point cloud into a range image representation, performs fully convolutional network segmentation on the range image, and also proposes a post-processing algorithm to address the discretization errors caused by the projective transformation and the blurry outputs of the convolutional neural network. View-based methods easily lose key information when processing large-scale point clouds, and suffer from low computational efficiency and high resource consumption.

Voxel-based methods use a voxel grid to regularize the unordered, unstructured point cloud. In 2018, Wang et al. proposed the adaptive octree convolutional neural network Adaptive O-CNN, which voxelizes point cloud data based on a patch-guided adaptive octree representation. When voxel-based methods segment point cloud data, the original spatial information of the point cloud cannot be fully preserved, and voxel methods incur high time and hardware costs.

Point-based methods take the points in the point cloud data directly as input. In 2021, Zhao et al. proposed PointTransformer, which introduces a self-attention mechanism in the neighborhood of each point and uses a multi-layer perceptron to encode the relative positions of the center point and its neighboring points to learn features. In 2022, Su et al. proposed DLA-Net, which learns the local feature representation of each point and its neighborhood using dual local attention blocks, effectively capturing the geometric information of local regions, and improves segmentation accuracy by combining multi-scale feature fusion.
Existing research has achieved a great deal, but existing point cloud segmentation methods pay insufficient attention to local geometric structure information, so small indoor objects are easily misclassified during segmentation. Therefore, there is still a need in the art for more ideas to achieve comprehensive and deep extraction of the local features of point cloud data.

Disclosure of Invention

To address the problems that point cloud semantic segmentation methods do not fully consider the local geometric structural features of the point cloud and are insufficiently sensitive to local features, the invention provides an indoor point cloud semantic segmentation method for enhancing the geometric structural features of an object.

To achieve the above purpose, the invention adopts the following technical scheme: an indoor point cloud semantic segmentation method for enhancing geometric structural features of an object, the method comprising the following steps:

Step 1: organizing and cleaning the point cloud data to reduce noise points and remove redundant points;

Step 2: constructing an indoor point cloud semantic segmentation network model that enhances the geometric structural features of the object;

The network model adopts an encoder-decoder structure and consists of a point embedding layer, an LGE-Block module, a downsampling module and an upsampling module; a single PVConv layer serves as the point embedding layer of the network model, extracting initial geometric features