CN-121982317-A - Pavement crack semantic segmentation network model and method based on self-attention mechanism


Abstract

The invention provides a pavement crack semantic segmentation network model and method based on a self-attention mechanism. It relates to the technical field of computer vision and deep learning and addresses the limitations of existing pavement crack semantic segmentation methods in terms of light weight, accuracy, and adaptability. In the network model, an encoder performs multi-scale feature extraction on the input pavement image. The encoder comprises an initial feature extraction module, a plurality of cascaded downsampling modules, and a plurality of improved bottleneck residual unit modules connected in series after each downsampling module, and a self-attention module is further provided at the tail end of the bottleneck residual unit modules. A decoder progressively recovers the spatial resolution of the feature maps, and a feature enhancement module between the encoder and the decoder fuses deep semantic features with shallow detail features. The invention achieves accurate and continuous segmentation of slender crack targets and improves the adaptability of the model to different pavement types and complex environments.

Inventors

TAO JIAN
ZHOU TONG
LI JIALONG
YUAN LAI
WANG QINGFENG
LIU QIULIN
HU HONGLI
GAO YIHANG

Assignees

中国五冶集团有限公司 (China MCC5 Group Corp. Ltd.)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-26

Claims (10)

1. A pavement crack semantic segmentation network model based on a self-attention mechanism, characterized in that the network model adopts an encoder-decoder structure to perform pixel-level segmentation of an input pavement image and output a pavement crack segmentation map, and the network model comprises: an encoder section for performing multi-scale feature extraction on the input pavement image, comprising an initial feature extraction module, a plurality of cascaded downsampling modules, and a plurality of improved bottleneck residual unit modules connected in series after each downsampling module, wherein a self-attention module is further provided at the tail end of the encoder section; a decoder section for progressively recovering the spatial resolution of the feature maps, comprising a plurality of cascaded upsampling modules; and a feature enhancement module arranged between the encoder section and the decoder section for fusing the deep semantic features output by the encoder section with shallow detail features from an earlier stage of the encoder and providing the fused features to the decoder section.
2. The pavement crack semantic segmentation network model according to claim 1, wherein the encoder section comprises at least 3 cascaded downsampling modules; 3 improved bottleneck residual unit modules are connected in series after the 1st downsampling module, 3 after the 2nd downsampling module, and 12 after the 3rd downsampling module; and the self-attention module is arranged after the improved bottleneck residual unit modules connected in series with the 3rd downsampling module.
3. The pavement crack semantic segmentation network model according to claim 1, wherein the improved bottleneck residual unit module comprises a first branch for extracting local features, a second branch for enlarging the receptive field, and a shortcut branch for preserving the input information; the output features of the first branch and the second branch are concatenated along the channel dimension, subjected to a channel shuffle operation, and then summed with the output features of the shortcut branch.
4. The pavement crack semantic segmentation network model according to claim 3, wherein the first branch comprises a 1×1 convolution layer, a first depthwise separable convolution group consisting of a 3×1 depthwise convolution layer and a 1×3 depthwise convolution layer, and a first channel attention module; the second branch comprises the 1×1 convolution layer, a second depthwise separable convolution group consisting of a 3×1 depthwise convolution layer and a 1×3 depthwise convolution layer, and a second channel attention module shared with the first branch; and the convolution kernel of each depthwise convolution layer in the second depthwise separable convolution group is configured as a dilated (atrous) convolution kernel factorized along the spatial dimensions.
5. The pavement crack semantic segmentation network model according to claim 3, wherein the channel shuffle operation regroups and rearranges the channels of the feature map obtained by concatenating the outputs of the first branch and the second branch, and the first channel attention module and the second channel attention module each generate a channel weight vector and multiply it channel by channel with the input features of the respective branch to re-weight the features.
6. The pavement crack semantic segmentation network model according to claim 1, wherein the self-attention module is a Transformer module based on a multi-head attention mechanism; the self-attention module introduces positional encoding information when mapping the input features into query, key, and value features; and the output of the self-attention module is added to its input features through a residual connection and processed by layer normalization.
7. The pavement crack semantic segmentation network model according to claim 1, wherein the feature enhancement module comprises a spatial alignment sub-module and a channel re-weighting sub-module; the spatial alignment sub-module applies deformable convolution to spatially transform the shallow detail features from an earlier stage of the encoder so that they align with the deep semantic features; the channel re-weighting sub-module applies global average pooling and global max pooling to the aligned shallow detail features and to the deep semantic features, respectively, to generate channel attention weights, weights the shallow detail features and the deep semantic features accordingly, and finally concatenates and fuses the re-weighted shallow detail features and deep semantic features.
8. The pavement crack semantic segmentation network model according to claim 1, wherein the decoder section comprises as many upsampling modules as there are downsampling modules in the encoder section; the input of each upsampling module is a fusion of the features output by the upsampling module of the preceding stage and the features of the corresponding scale provided by the feature enhancement module; and the network model further comprises a projection output module arranged at the tail end of the decoder section for mapping, through a convolution operation, the features output by the last upsampling module into a binary segmentation map of the same size as the input pavement image.
9. A pavement crack semantic segmentation method based on a self-attention mechanism, characterized by using the pavement crack semantic segmentation network model according to any one of claims 1-8 and comprising the following steps: S1, acquiring a pavement image to be processed and preprocessing the pavement image; S2, inputting the preprocessed pavement image into the encoder section of the pavement crack semantic segmentation network model, extracting multi-scale features sequentially through the initial feature extraction module, the plurality of downsampling modules, and the serially connected improved bottleneck residual unit modules, and performing global context modeling through the self-attention module at the tail end of the encoder to obtain deep semantic features; S3, fusing the deep semantic features with shallow detail features from an earlier stage of the encoder through the feature enhancement module arranged between the encoder section and the decoder section to obtain multi-stage fused features; and S4, inputting the multi-stage fused features into the decoder section, progressively recovering the spatial resolution of the feature maps through the plurality of upsampling modules, and outputting a pavement crack segmentation map after processing by the projection output module.
10. The pavement crack semantic segmentation method according to claim 9, further comprising, before the pavement image is input into the pavement crack semantic segmentation network model, a step of training the network model: S21, obtaining a pavement image training set with pixel-level crack annotations and applying data augmentation to the training images in the training set, the data augmentation comprising at least one of random flipping, rotation, scaling, random cropping, brightness and contrast adjustment, Gaussian noise addition, and local occlusion; S22, feeding the augmented training images forward through the pavement crack semantic segmentation network model to obtain predicted segmentation results; S23, computing a loss value between the predicted segmentation results and the annotations, the loss function being a weighted combination of a cross-entropy loss and a similarity loss based on region overlap; and S24, iteratively optimizing the parameters of the network model by backpropagation according to the loss value until the model converges, thereby completing the training process.
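Claims 3 to 5 describe the improved bottleneck residual unit only in words. The NumPy sketch below shows one way data could flow through such a unit (forward pass only, random weights). Several details are assumptions not stated in the claims: each branch outputs half of the input channels so that the concatenation matches the shortcut; each branch has its own squeeze-and-excitation style channel attention (global average pooling, two small fully connected layers, sigmoid); ReLU activations sit where shown; the second branch uses dilation 2; and channel shuffle uses two groups. All function names are hypothetical.

```python
import numpy as np


def pointwise_conv(x, weight):
    """1x1 convolution: mixes channels only. x: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def depthwise_conv2d(x, kernels, dilation=1):
    """Per-channel 2D cross-correlation with zero 'same' padding.
    x: (C, H, W); kernels: (C, kh, kw) with odd kh and kw."""
    _, h, w = x.shape
    _, kh, kw = kernels.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros(x.shape)
    for i in range(kh):
        for j in range(kw):
            window = padded[:, i * dilation:i * dilation + h, j * dilation:j * dilation + w]
            out += kernels[:, i, j][:, None, None] * window
    return out


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style re-weighting (an assumed design)."""
    squeeze = x.mean(axis=(1, 2))                   # (C,) global average pool
    hidden = np.maximum(w1 @ squeeze, 0.0)          # (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,) values in (0, 1)
    return x * weights[:, None, None]


def channel_shuffle(x, groups=2):
    """Interleave the channels of the concatenated groups."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def make_branch_params(rng, c_in, c_out, reduction=2):
    s = 0.1  # small random weights, for illustration only
    return {
        "pw": s * rng.standard_normal((c_out, c_in)),
        "k31": s * rng.standard_normal((c_out, 3, 1)),
        "k13": s * rng.standard_normal((c_out, 1, 3)),
        "se1": s * rng.standard_normal((c_out // reduction, c_out)),
        "se2": s * rng.standard_normal((c_out, c_out // reduction)),
    }


def branch(x, p, dilation):
    """1x1 conv -> 3x1 depthwise -> 1x3 depthwise -> channel attention."""
    y = np.maximum(pointwise_conv(x, p["pw"]), 0.0)
    y = depthwise_conv2d(y, p["k31"], dilation)
    y = np.maximum(depthwise_conv2d(y, p["k13"], dilation), 0.0)
    return channel_attention(y, p["se1"], p["se2"])


def improved_bottleneck(x, params, dilation=2):
    local = branch(x, params["local"], dilation=1)             # first branch: local detail
    context = branch(x, params["context"], dilation=dilation)  # second branch: wider context
    fused = channel_shuffle(np.concatenate([local, context], axis=0), groups=2)
    return np.maximum(fused + x, 0.0)                          # add the identity shortcut


rng = np.random.default_rng(0)
c = 8
params = {"local": make_branch_params(rng, c, c // 2),
          "context": make_branch_params(rng, c, c // 2)}
x = rng.standard_normal((c, 16, 16))
y = improved_bottleneck(x, params)
print(y.shape)  # (8, 16, 16)
```

Factorizing a 3×3 depthwise kernel into 3×1 and 1×3 kernels cuts its weights from 9 to 6 per channel, and dilating the second pair widens the receptive field without adding weights, which is consistent with the lightweight goal stated in the description.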
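Claim 6 describes the self-attention module as a multi-head Transformer module that injects positional information before the query/key/value projections and applies a residual connection with layer normalization. The NumPy sketch below is one possible reading under stated assumptions: pixels are flattened into tokens, a standard 1D sine/cosine positional encoding is added before all three projections, layer normalization has no learnable scale or shift, and the feed-forward sub-layer of a full Transformer block is omitted because the claim does not mention it. Function names are hypothetical.

```python
import numpy as np


def sinusoidal_positions(n, d):
    """Sine/cosine positional encoding for n positions and even width d."""
    pos = np.arange(n)[:, None]
    idx = np.arange(d // 2)[None, :]
    angles = pos / (10000.0 ** (2 * idx / d))
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    """Normalize each token over its channels (no learnable scale/shift, for brevity)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def self_attention_block(feat, wq, wk, wv, wo, num_heads):
    """feat: (C, H, W) feature map; wq, wk, wv, wo: (C, C) projection matrices."""
    c, h, w = feat.shape
    n = h * w
    tokens = feat.reshape(c, n).T                 # (N, C), one token per pixel
    with_pos = tokens + sinusoidal_positions(n, c)  # inject position information
    q, k, v = with_pos @ wq, with_pos @ wk, with_pos @ wv
    d = c // num_heads

    def split_heads(t):  # (N, C) -> (heads, N, d)
        return t.reshape(n, num_heads, d).transpose(1, 0, 2)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
    attn = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    merged = (attn @ vh).transpose(1, 0, 2).reshape(n, c) @ wo
    out = layer_norm(tokens + merged)  # residual connection, then layer normalization
    return out.T.reshape(c, h, w)


rng = np.random.default_rng(0)
c, heads = 8, 2
feat = rng.standard_normal((c, 4, 4))
wq, wk, wv, wo = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))
out = self_attention_block(feat, wq, wk, wv, wo, heads)
print(out.shape)  # (8, 4, 4)
```

The attention matrix has (H·W)² entries per head, so placing the module at the encoder tail, after the third downsampling stage as in claim 2, applies global context modeling where the feature map is smallest and the cost is lowest.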
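The channel re-weighting sub-module of claim 7 can be sketched as follows. This is a partial illustration under explicit assumptions: the deformable-convolution spatial alignment is not implemented (the shallow features are simply assumed to be aligned already); the deep features are brought to the shallow resolution by nearest-neighbour upsampling, although the claims do not say which side is resized; and each input's average-pooled and max-pooled descriptors pass through a shared two-layer MLP whose outputs are summed before a sigmoid, following the common CBAM convention. All names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_weights(x, w1, w2):
    """CBAM-style channel attention: a shared two-layer MLP over the
    global-average-pooled and global-max-pooled channel descriptors."""
    avg = x.mean(axis=(1, 2))  # (C,)
    mx = x.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))  # (C,), each value in (0, 1)


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def reweight_and_fuse(shallow, deep, shallow_mlp, deep_mlp):
    """Re-weight each input with its own channel weights, then concatenate along channels."""
    if shallow.shape[1:] != deep.shape[1:]:
        raise ValueError("shallow and deep features must share spatial size")
    ws = channel_weights(shallow, *shallow_mlp)
    wd = channel_weights(deep, *deep_mlp)
    return np.concatenate([shallow * ws[:, None, None], deep * wd[:, None, None]], axis=0)


rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 8, 8))  # assumed already aligned (deformable conv omitted)
deep = upsample_nearest(rng.standard_normal((8, 4, 4)), 2)  # brought to 8x8
shallow_mlp = (0.1 * rng.standard_normal((2, 4)), 0.1 * rng.standard_normal((4, 2)))
deep_mlp = (0.1 * rng.standard_normal((4, 8)), 0.1 * rng.standard_normal((8, 4)))
fused = reweight_and_fuse(shallow, deep, shallow_mlp, deep_mlp)
print(fused.shape)  # (12, 8, 8)
```

Average pooling summarizes how strongly a channel responds over the whole map, while max pooling preserves a strong but spatially sparse response, which matters for thin cracks that occupy few pixels; combining both is the usual reason for using the two descriptors together.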
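For step S21 of claim 10, the key practical constraint is that geometric augmentations must be applied identically to the image and its pixel-level crack mask, while photometric ones (brightness, contrast, noise) must touch the image only. The sketch below covers a subset of the listed operations under assumptions: grayscale images in [0, 1], rotation restricted to multiples of 90 degrees to avoid interpolating the mask, and illustrative parameter ranges; scaling, random cropping, and local occlusion are omitted. `augment_pair` is a hypothetical name.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply the same geometric transform to image and mask; photometric
    changes (contrast/brightness, Gaussian noise) touch the image only.
    image: (H, W) float in [0, 1]; mask: (H, W) binary crack labels."""
    if rng.random() < 0.5:                       # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                  # rotation by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    contrast = rng.uniform(0.8, 1.2)             # illustrative ranges (assumed)
    brightness = rng.uniform(-0.1, 0.1)
    noise = rng.normal(0.0, 0.02, size=image.shape)
    image = np.clip(contrast * image + brightness + noise, 0.0, 1.0)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
image = rng.random((16, 16))
mask = np.zeros((16, 16), dtype=np.uint8)
mask[5, 2:10] = 1  # a short horizontal "crack"
aug_image, aug_mask = augment_pair(image, mask, rng)
print(aug_mask.sum())  # 8: flips and 90-degree rotations keep every crack pixel
```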
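Step S23 of claim 10 specifies a weighted combination of a cross-entropy loss and a similarity loss based on region overlap. A minimal sketch follows, assuming a binary (per-pixel sigmoid) formulation, the Dice loss as the overlap-based term, and an equal weighting `alpha = 0.5`; none of these choices is fixed by the claim.

```python
import numpy as np


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def dice_loss(prob, target, smooth=1.0):
    """1 - Dice coefficient; `smooth` avoids division by zero on empty masks."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(prob) + np.sum(target) + smooth))


def combined_loss(prob, target, alpha=0.5):
    """Weighted sum of the pixel-wise and the region-overlap terms."""
    return alpha * bce_loss(prob, target) + (1 - alpha) * dice_loss(prob, target)


target = np.zeros((10, 10))
target[4, :2] = 1.0                 # cracks cover 2% of the pixels
all_background = np.zeros((10, 10))  # a prediction that misses every crack pixel
print(round(bce_loss(all_background, target), 3), round(dice_loss(all_background, target), 3))  # 0.322 0.667
```

Because the cross-entropy term is averaged over every pixel, its penalty for missing a crack shrinks as the crack area fraction falls, whereas the Dice term depends only on the overlap with the crack region; combining the two is a common way to counter the strong foreground/background imbalance of thin cracks.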

Description

Pavement crack semantic segmentation network model and method based on self-attention mechanism

Technical Field

The invention relates to the technical field of computer vision and deep learning, and in particular to a pavement crack semantic segmentation network model and method based on a self-attention mechanism.

Background

Pavement cracks are one of the most common forms of pavement distress during road service. They are small in scale, irregular in shape, and strongly continuous in spatial distribution. Under the long-term action of traffic loads, moisture infiltration, temperature variation, and similar factors, cracks tend to propagate and evolve into structural distresses such as potholes and subsidence, seriously affecting road safety and service life. Timely and accurate detection of pavement cracks is therefore a key link in road maintenance management. Traditional pavement crack detection relies mainly on manual inspection and on automatic detection based on traditional image processing algorithms. Manual inspection is inefficient and highly subjective and can hardly meet the inspection demands of a large road network, while methods based on traditional algorithms such as edge detection and morphological operations are extremely sensitive to complex interference such as illumination changes, road shadows, and oil stains; their detection stability is insufficient, they are prone to false and missed detections, and they have obvious limitations in practical application. With the development of deep learning, semantic segmentation methods based on convolutional neural networks have gradually been introduced into pavement crack detection; they enable pixel-level identification of cracks and show stronger feature learning capability than traditional methods.
However, existing deep-learning-based methods still face a series of problems in actual deployment and application. Crack targets are usually slender, curved, and continuous, and the downsampling operations frequently used in conventional network structures compress the spatial resolution of the feature maps; this easily destroys the fine connectivity of the target and causes spurious breaks in the final segmentation result, compromising detection integrity. In addition, in pursuit of higher segmentation accuracy, existing models are often complex in design, with a huge number of parameters and high computational complexity, which makes them difficult to deploy on hardware platforms with limited computing and storage resources, such as vehicle-mounted inspection terminals, unmanned aerial vehicles, or mobile edge devices. To make models lighter, some studies have tried to simplify the network structure to reduce the number of parameters, but such simplification often sacrifices the model's ability to represent multi-scale features, significantly degrading the segmentation of fine cracks and of cracks against complex backgrounds. Pavement materials in real road scenes vary, for example asphalt and cement concrete, and their surface textures and gray-level distributions differ markedly, so a single existing model can hardly maintain stable and good performance across different pavement types, and its generalization capability is limited. Consequently, automatic pavement crack detection still faces several bottlenecks in practice that urgently need to be overcome, especially in balancing accuracy, efficiency, and generality.
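The fragmentation effect described above can be reproduced with a toy example. The sketch below is a worst-case illustration under a stated assumption: it models downsampling as plain stride-2 decimation (keeping every second pixel), which is not necessarily how the downsampling modules of the invention work; 2×2 max pooling, for instance, would keep this particular crack connected. The helper names are hypothetical.

```python
from collections import deque

import numpy as np


def count_components(mask):
    """Count 8-connected foreground components in a binary mask (BFS flood fill)."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                count += 1
                queue = deque([(i, j)])
                seen[i, j] = True
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                                seen[ny, nx] = True
                                queue.append((ny, nx))
    return count


def make_zigzag_crack(height=8, width=16):
    """A 1-pixel-wide crack that steps between rows 2 and 3 every 4 columns."""
    mask = np.zeros((height, width), dtype=bool)
    for col in range(width):
        mask[2 if (col // 4) % 2 == 0 else 3, col] = True
    return mask


crack = make_zigzag_crack()
coarse = crack[::2, ::2]  # stride-2 decimation: the row-3 segments vanish
print(count_components(crack), count_components(coarse))  # 1 2
```

A single continuous crack becomes two disconnected fragments at half resolution, and no later upsampling can recover the missing pixels from the coarse map alone, which is why skip paths that re-inject shallow, high-resolution detail are commonly used for thin structures.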
Designing a lightweight semantic segmentation model that effectively preserves the continuity of the crack spatial structure, achieves high accuracy at low computational cost, and adapts well to different road scenes has therefore become an important research direction in this field and a key to advancing intelligent road maintenance management technology.

Disclosure of Invention

The invention aims to solve the following problems of existing pavement crack semantic segmentation methods: cracks are fragmented by downsampling, models are computationally complex and difficult to deploy in lightweight form, and high accuracy and good adaptability to different pavement scenes are difficult to maintain simultaneously. To this end, a pavement crack semantic segmentation network model and a pavement crack semantic segmentation method based on a self-attention mechanism are provided. According to the invention, through optimized design of the network structure and its core modules, on the premise of ensuring a small number of model parameters and high computational efficiency, the high-precision and continuous se