CN-121999210-A - Urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention
Abstract
The invention provides an urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention. First, a local spatial attention block is constructed, which extracts local semantic information through a spatial self-attention mechanism and enhances the perception of local features. Second, a global channel attention block is designed, which models the correlations among feature channels to capture global context information and improves the model's ability to represent complex scenes. An information flow control module is further introduced, which regulates the information transmission path through a gating mechanism, retaining key details and suppressing interference from redundant information. Finally, a complete semantic segmentation network is built from these modules to achieve efficient semantic analysis of point cloud data. Compared with existing point cloud segmentation methods, the method achieves higher semantic recognition precision on urban-scene mobile laser scanning point clouds, extracts more complete and clearer urban street structures, and shows stronger adaptability and robustness in complex point cloud environments.
Inventors
- LUO ZIWEI
- JIANG JUN
- LIU XINYUE
- CAI ZIYANG
- WU WANRU
- QI HANYU
- MA YING
Assignees
- Wuhan Textile University (武汉纺织大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-26
Claims (10)
- 1. An urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention, comprising the following steps: S1, preprocessing the large-scale point cloud data generated by urban mobile laser scanning to obtain a training data set and a test data set; S2, introducing a local spatial attention block that improves the self-attention mechanism and enhances local features, and designing a global channel attention block in the feature channel dimension to model inter-channel dependencies and capture global features; S3, performing multi-scale feature fusion on the local features and global features obtained in step S2 to obtain double-domain attention features based on the spatial dimension and the channel dimension, and then passing the double-domain attention features to a classifier module for further optimization to complete semantic classification of the point cloud; S4, training the semantic segmentation network formed by steps S2-S3 with the training data set obtained in step S1 to obtain the final network model; and S5, testing the test data set obtained in step S1 with the network model trained in step S4 to obtain a point cloud prediction result.
- 2. The method of claim 1, wherein the processing of the local spatial attention block comprises two parts, local patch embedding and spatial attention; in local patch embedding, the original point cloud is first processed by random sub-sampling to reduce point density differences and the computational cost of network training; for each sampled point, its nearest-neighbor point set is then obtained with K-nearest neighbors (KNN), and the relative positions between the center point and its neighborhood points are computed through edge convolution: [formula not reproduced]; the relative positions are concatenated with the absolute coordinates to obtain local patches, which a multi-layer perceptron (MLP) then aggregates into a higher-dimensional feature space to generate the local patch embedding; subsequently, the learned local patch embedding is concatenated with the feature vectors previously captured by the MLP and processed by an MLP to obtain adaptive weights, which are normalized by the SoftMax(·) function and used to aggregate the input feature embeddings in the neighborhood, forming enhanced local attention features; on this basis, an improved spatial self-attention mechanism generates query, key and value vectors from the input features: [formula not reproduced]; wherein the query vector, the key vector and the value vector are all generated by MLP linear mappings of the same input feature, and the linear mapping modules are identical in structure and independent in parameters; in addition, a nonlinear maximum symmetric function that incorporates the relative position is used to enhance the robustness of unordered point cloud feature aggregation and thereby strengthen local semantic learning, the function being defined as follows: [formula not reproduced]; wherein max denotes the symmetric maximum aggregation function, and the position embedding is obtained by passing the relative positions to an MLP and is then enhanced by a trainable position embedding, so that positional relationships are effectively integrated into the model to obtain the attention map: [formula not reproduced]; wherein the operator in the formula denotes the dot product operation; the local semantic data is then merged with the attention map through the self-attention mechanism to generate local attention features, and fusion features are finally generated through an MLP layer, the final local features of the local spatial attention block being expressed as follows: [formula not reproduced]; [formula not reproduced]; wherein the SoftMax(·) function is used for normalization, and the MLP symbol denotes a shared multi-layer perceptron with learnable parameters.
- 3. The method of claim 2, wherein the local patch embedding is generated as shown in the following formulas: [formula not reproduced]; [formula not reproduced]; wherein the concatenation symbol denotes the concatenation operation, and the MLP symbol denotes a shared MLP with learnable parameters.
- 4. The method of claim 2, wherein the adaptive weights are calculated as follows: [formula not reproduced]; and the local attention features are calculated as follows: [formula not reproduced].
- 5. The method of claim 2, wherein constructing the global channel attention block based on the structure of the U-Next pyramid comprises: features from different scales and different propagation paths of the U-Next pyramid, including lateral propagation features, top-down propagation features and bottom-up propagation features, are first concatenated in the channel dimension and encoded through a channel mapping operation to obtain multi-scale fusion features; on this basis, the multi-scale fusion features are mapped by three channel mappings of similar structure and independent parameters to obtain a query matrix, a key matrix and a value matrix; these matrices are used to construct channel correlation weights and complete cross-channel information aggregation, generating channel-level attention features, defined as follows: [formula not reproduced]; [formula not reproduced]; wherein the formulas involve the number of channels of the multi-scale fusion features and the final local features, the MLP symbol denotes a shared multi-layer perceptron with learnable parameters, and the SoftMax(·) function is used for normalization.
- 6. The method of claim 1, wherein step S2 further comprises processing the output of the global channel attention block with an information flow control module as follows: [formula not reproduced]; [formula not reproduced]; wherein Gating denotes a gating mechanism unit, the activation symbol denotes the GELU nonlinear activation layer, the product symbol denotes element-wise multiplication, and the input is the global channel feature.
- 7. The method of claim 1, wherein the standard cross-entropy loss function L is used in training: $L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{K} y_{ic}\log(p_{ic})$; wherein L denotes the cross entropy, N denotes the total number of samples, and K denotes the number of classes; $y_{ic}$ is the one-hot encoding of the sample target value, taking 1 if the true class of sample i is c and 0 otherwise; and $p_{ic}$ is the probability predicted by the model that sample i belongs to class c, with a value within (0, 1).
- 8. The method of claim 1, wherein after step S5 the result obtained in step S5 is evaluated, with overall accuracy (OA) and mean intersection over union (mIoU) as evaluation indexes, and is compared with other model algorithms in terms of both evaluation index values and visual results, OA and mIoU being defined as follows: $OA = \frac{1}{N}\sum_{i=1}^{K} TP_i$; $mIoU = \frac{1}{K}\sum_{i=1}^{K}\frac{TP_i}{TP_i + FP_i + FN_i}$; wherein K is the number of classes, $TP_i$ is the number of correctly predicted points in the i-th class, and N is the total number of points; mIoU measures the degree of overlap between the prediction and the ground-truth labels, the IoU being calculated for each class and then averaged; $FP_i$ is the number of points of other classes mispredicted as the i-th class, and $FN_i$ is the number of points that truly belong to the i-th class but are predicted as other classes.
- 9. An urban mobile laser scanning point cloud semantic segmentation system based on double-domain attention, characterized by comprising a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute the urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention according to any one of claims 1-8.
- 10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed, implements the urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention according to any one of claims 1-8.
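As an illustration of the training loss in claim 7 and the evaluation indexes in claim 8, the following is a minimal Python sketch, not the applicant's implementation. The function names, the list-based inputs, and the toy labels are hypothetical. It also assumes that a class absent from both the prediction and the ground truth contributes an IoU of 0, a case the claims do not address.

```python
import math


def cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i][c] is the predicted probability that
    sample i belongs to class c, labels[i] is the true class index."""
    n = len(labels)
    # With one-hot targets only the true-class term of the inner sum survives.
    return -sum(math.log(probs[i][labels[i]]) for i in range(n)) / n


def per_class_counts(pred, true, num_classes):
    """Return per-class true positives, false positives and false negatives."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, true):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # point wrongly assigned to class p
            fn[t] += 1  # point of class t that was missed
    return tp, fp, fn


def overall_accuracy(pred, true, num_classes):
    """OA: correctly predicted points divided by all points."""
    tp, _, _ = per_class_counts(pred, true, num_classes)
    return sum(tp) / len(true)


def mean_iou(pred, true, num_classes):
    """mIoU: per-class TP / (TP + FP + FN), averaged over all classes."""
    tp, fp, fn = per_class_counts(pred, true, num_classes)
    total = 0.0
    for i in range(num_classes):
        denom = tp[i] + fp[i] + fn[i]
        # Assumption: an empty class counts as IoU 0 so the divisor stays K.
        total += tp[i] / denom if denom else 0.0
    return total / num_classes


# Toy example with three hypothetical classes.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
oa = overall_accuracy(pred_labels, true_labels, 3)
miou = mean_iou(pred_labels, true_labels, 3)
loss = cross_entropy([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]], [0, 1])
```

In the toy example four of six points are correct, so OA is 4/6, and the per-class IoU values are 1/3, 2/3 and 1/2, giving an mIoU of 0.5. Many evaluation toolkits skip empty classes instead of scoring them as 0, which changes the average when a class is missing from a test scene.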
Description
Urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention
Technical Field
The invention relates to the field of three-dimensional point cloud information processing, in particular to a deep-learning-based method for semantic segmentation of urban mobile laser scanning point clouds.
Background
Mobile laser scanning systems have become a key technology for acquiring three-dimensional point cloud data of cities by virtue of their flexibility and high precision, and are widely applied in urban mapping, automatic driving, infrastructure management, building information modeling and other fields. Fine semantic segmentation of the point cloud data, that is, correctly classifying each point into a ground object category such as building, vehicle, pedestrian or vegetation, is the foundation and prerequisite for these advanced applications. However, urban mobile laser scanning point clouds are huge in scale, uneven in density, complex in structure and heavily occluded, so the semantic segmentation task faces serious challenges. In recent years, deep learning technology, particularly the attention-based (Transformer) architecture, has shown great potential in point cloud processing owing to its strong ability to model long-range dependencies. Current point cloud segmentation methods based on the attention (Transformer) architecture have evolved mainly along two directions: one constructs hierarchical attention, capturing multi-scale features by combining downsampling with self-attention; the other improves local geometric perception, for example by combining graph convolution with offset attention to enhance feature robustness. These approaches have made significant progress in point cloud classification and segmentation tasks.
However, applying the conventional attention-based (Transformer) architecture directly to urban-scale mobile laser scanning point cloud semantic segmentation has significant drawbacks. First, the computational complexity is high: the cost of the global self-attention mechanism grows quadratically with the size of the point cloud, so massive urban point cloud data is difficult to process efficiently. Second, local detail is lost: when attention is focused on the global context, local fine structures that are critical to the segmentation result, such as the geometric and semantic information of small targets like utility poles and signboards, are easily ignored, so the model preserves detail poorly and recognizes small targets in complex urban scenes with low precision. Therefore, how to accurately maintain and enhance local detail features while effectively capturing global scene semantics is a core difficulty that the prior art has yet to overcome.
Disclosure of Invention
The technical problem to be solved by the invention is that existing deep-learning-based point cloud segmentation methods, when applied to urban mobile laser scanning data, tend to lose local details, segment small-scale targets poorly, and struggle to coordinate global and local features effectively; to address this, the invention provides an urban mobile laser scanning point cloud semantic segmentation method based on double-domain attention.
To address the problems that existing deep neural networks face in semantic segmentation of urban-scene mobile laser scanning point clouds, the invention introduces a double-domain attention-based semantic segmentation network for urban mobile laser scanning point clouds, which is well suited to this task and preserves scene details, such as small urban objects, in both the spatial and channel dimensions through a dual strategy. First, in the local spatial attention block, spatial attention is used to observe local patterns, yielding an enhanced self-attention that contains rich local semantic cues. The block also propagates local spatial information in parallel from different representation subspaces of each input embedding, enhancing its ability to learn local spatial semantics. Then, building on a feature pyramid framework that fuses and refines the outputs of the local spatial attention module, a global channel attention block is introduced that effectively captures global context by focusing on the interrelationships among feature channels. On the one hand, the method uses Transformers in the spatial and channel domains to examine and fuse features from multiple layers, efficiently summarizing semantic context and enriching spatial detail with multi-scale information. On the other hand, semantic features are enhanced by integrating them with the aggregated context, and a gating mechanism is used for selective information delivery. The technical scheme adopted for solving the technical problems is that t