CN-121982511-A - Expressway pavement crack segmentation method and system based on multi-mode fusion and hypergraph convolution

Abstract

The invention discloses a highway pavement crack segmentation method and system based on multi-mode fusion and hypergraph convolution, belonging to the technical field of image processing and computer vision. The method comprises: collecting RGB images of the road surface together with text information describing the positions and orientations of the cracks; extracting basic image features; obtaining edge and texture features through an edge texture multi-scale feature extraction module; extracting text features through a Mamba text encoder; constructing a hypergraph structure with the multi-modal features as hypergraph nodes; performing cross-modal feature propagation and aggregation through hypergraph convolution; refining the result through a multi-modal feature fusion mechanism to obtain highly discriminative crack features; and finally generating a segmentation map through convolution and up-sampling. The invention effectively addresses complex background interference, missed detection of fine cracks and insufficient model generalization. Its key metrics on self-built and public data sets exceed those of mainstream models, providing reliable technical support for the automatic detection and maintenance of expressway cracks.

Inventors

XU TIANTIAN
ZHOU GUOXIONG

Assignees

Central South University of Forestry and Technology (中南林业科技大学)

Dates

Publication Date: 2026-05-05
Application Date: 2025-11-05

Claims (10)

1. A highway pavement crack segmentation method based on multi-mode fusion and hypergraph convolution, characterized by comprising the following steps:
S1, acquiring a road surface RGB image to be segmented with a shooting device, extracting text information describing the positions and directions of the cracks corresponding to the road surface RGB image, and performing standardized preprocessing on the road surface RGB image;
S2, extracting image features from the preprocessed image, inputting the preprocessed image to an edge texture multi-scale feature extraction module, and outputting edge features and texture features;
S3, inputting the text information of step S1 to a text encoder based on the Mamba architecture, performing sequence modeling through a selective scanning mechanism, and outputting text features;
S4, taking the edge features, texture features and image features output by step S2 and the text features output by step S3 as hypergraph nodes, constructing a hypergraph structure, performing cross-modal feature propagation and aggregation through a hypergraph convolution operation, optimizing and updating the node features, and outputting optimized features fused with multi-modal information;
S5, inputting the optimized features fused with multi-modal information output by step S4 into a multi-modal feature fusion mechanism module, and sequentially applying cross-modal attention calculation, gated balancing of the modal weights, spatial attention mapping and residual connection to output a high-resolution crack feature representation;
S6, performing 1×1 convolution and up-sampling on the feature representation output by step S5 to generate the final pavement crack segmentation map.
2. The method according to claim 1, wherein the outputting of the edge features by the edge texture multi-scale feature extraction module in step S2 comprises: converting the input image into a grayscale image; computing the horizontal and vertical gradient responses with 3×3 and 5×5 Sobel operators respectively [formula missing in source], using the 3×3 Sobel horizontal and vertical convolution kernels and the 5×5 Sobel horizontal and vertical convolution kernels; generating a multi-scale edge map according to the gradient magnitude formula [formula missing in source], whose terms are the gradient values in the horizontal and vertical directions and a very small constant; introducing a Laplacian operator kernel [kernel missing in source] to extract high-frequency details of the image; and concatenating the edge features extracted by the 3×3 Sobel operator, the 5×5 Sobel operator and the Laplacian operator, inputting them to an edge refinement network consisting of a convolution layer, a batch normalization layer and a LeakyReLU activation function, and outputting edge features with pixel-level localization accuracy.
3. The method according to claim 1, wherein the outputting of the texture features by the edge texture multi-scale feature extraction module in step S2 comprises: in view of the frequency difference between expressway pavement cracks and aggregate textures, adopting a dual-branch convolution module with parallel 3×3 and 5×5 branches, each branch comprising two blocks that each consist of a convolution layer, a batch normalization layer and a LeakyReLU activation function, the branches capturing texture information at different scales; concatenating the texture features output by the two branches along the channel dimension; compressing them with a 1×1 convolution to achieve multi-scale texture fusion; and introducing a residual connection to avoid losing texture features during convolution.
4. The method according to claim 1, wherein in step S3 the processing of the Mamba-architecture text encoder comprises: decomposing the input text features into two paths through an input projection layer, namely a main path x and a control path gate; processing the main path x through a depthwise separable one-dimensional convolution with a kernel size of 4 to capture local semantic associations; after Softplus activation, combining the dynamically generated decay coefficient delta to take part in the selective scan, the calculation involving the state space parameter matrices A, B and C and the state variable states [formula missing in source], where t denotes the time step, k denotes the state index, and [symbol missing in source] denotes the input feature at time step t; multiplying the output y by gate through a gating mechanism and fusing it with the original features of the projection layer through a skip connection; and mean-pooling the resulting text features, which act together with the image-modality features on the optimization of the segmentation decision.
5. The method according to claim 1, wherein in step S4 the hypergraph convolution operation specifically comprises: taking the image features V_1 (after adaptation to a unified dimension), the texture features V_2, the edge features V_3 and the text features V_4 as the node set; constructing a set of hyperedges connecting nodes of different modalities, and defining the hyperedge incidence matrix, whose entry at row i and column j indicates whether node i belongs to hyperedge j; based on the hyperedge degree, computing the hyperedge features as a weighted sum of the node features [formula missing in source]; optimizing and updating the hyperedge features by a convolution operation with a hyperedge weight matrix and a bias, so as to suppress noise interference; and, based on the node degree, aggregating the updated hyperedge features back to the nodes, the optimized features fused with multi-modal information being obtained with the node feature update formula [formula missing in source].
6. The method according to claim 1, wherein in step S5 the specific procedure of the multi-modal feature fusion mechanism module comprises: stacking the optimized and updated node features output by step S4 along a dimension; computing the attention weights between modalities through cross-modal attention [formula missing in source], in which the query matrix, the key matrix and the feature dimension appear; introducing a gating weight vector g and performing feature fusion with the formula [formula missing in source] to balance the modal weights dynamically, one term of which is obtained by passing sequentially through the operations [missing in source]; generating a single-channel spatial attention map from [symbol missing in source] by 1×1 convolution [formula missing in source], wherein [symbol missing in source] denotes the image features; up-sampling and multiplying the up-sampled result element by element with the original segmentation output to obtain [symbol missing in source]; and finally introducing a residual connection, in which the two outputs are fused by weighting with preset weights to obtain the final refined features [formula missing in source], where α is used to preserve the original output and β is used to enhance the output.
7. A highway pavement crack segmentation system based on multi-modal fusion and hypergraph convolution, characterized in that it is adapted to implement the method of any one of claims 1 to 6, the system comprising:
an image acquisition module for capturing road surface RGB images;
a data preprocessing module for receiving the road surface RGB image and the corresponding text information and standardizing the image;
an image feature extraction module, connected to the data preprocessing module, for applying Sigmoid processing to the preprocessed image and extracting image features;
an edge texture multi-scale feature extraction module, connected to the data preprocessing module, for extracting multi-scale edge features and texture features of the image;
a Mamba text encoder module, connected to the data preprocessing module, for encoding the text information into text features;
a hypergraph convolution fusion layer construction module, connected to the image feature extraction module, the edge texture multi-scale feature extraction module and the Mamba text encoder module respectively, for constructing the hypergraph and performing hypergraph convolution fusion;
a multi-modal feature fusion mechanism module, connected to the hypergraph convolution fusion layer construction module, for refining the fused features;
a segmentation output module, connected to the multi-modal feature fusion mechanism module, for generating the final pavement crack segmentation map;
a pavement crack feature detection module, connected to the segmentation output module, for extracting geometric feature parameters of the cracks from the pavement crack segmentation map, the geometric feature parameters comprising the length, width, area and orientation of the cracks.
8. The system of claim 7, wherein the edge texture multi-scale feature extraction module specifically comprises: an edge extraction branch configured to perform grayscale conversion, multi-scale Sobel gradient computation, Laplacian high-frequency detail extraction and edge refinement network processing; and a texture extraction branch configured to perform multi-scale texture extraction and fusion based on 3×3 and 5×5 residual convolution branches; the hypergraph convolution fusion layer construction module is configured to: construct a hypergraph with the image features, texture features, edge features and text features as nodes; define the hyperedges connecting nodes of different modalities and the incidence matrix; compute and update the hyperedge features based on the hyperedge degree; and aggregate the features back to the nodes based on the node degree; and the system is integrated with a downstream pavement maintenance management system through an application programming interface or by monitoring a file directory, the crack segmentation map and quantitative parameters generated by the segmentation output module being written to a database and used to generate a pavement condition index report and to plan maintenance.
9. The system of claim 7, wherein the system exposes a RESTful API or gRPC interface for external interaction, and the downstream PMS submits images and obtains the segmentation results by calling the corresponding interface.
10. A highway pavement crack maintenance method based on multi-mode fusion and hypergraph convolution, characterized by comprising the following steps: performing crack detection on the expressway pavement with the system of claim 7 to obtain a pavement crack segmentation map and crack geometric feature parameters; generating, from the crack segmentation map and the crack geometric feature parameters, a pavement condition evaluation report containing crack positions, lengths, widths, areas, orientation types and severity levels; drawing up a targeted maintenance plan based on the pavement condition evaluation report, and determining the crack areas to be maintained, the maintenance priority and the recommended maintenance process; and guiding maintenance staff to carry out on-site maintenance of the expressway pavement cracks according to the maintenance plan, which comprises selecting the filling material type according to the crack width, determining the sealing direction according to the crack orientation, and determining the material quantity according to the crack area.
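The edge branch described in claim 2 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the kernels and formulas are not reproduced in this text, so the 5×5 Sobel variant (binomial smoothing times a derivative filter), the 4-neighbour Laplacian, the grayscale weights and the placement of the small constant inside the square root are all assumptions. The names `filter2d`, `gradient_magnitude` and `multi_scale_edges` are hypothetical, and the edge refinement network is left out.

```python
import numpy as np

# 3x3 Sobel kernel for the horizontal gradient; its transpose gives the vertical one.
SOBEL3_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
# ASSUMPTION: 5x5 Sobel built as (binomial smoothing) x (derivative), one common variant.
SOBEL5_X = np.outer([1, 4, 6, 4, 1], [-1, -2, 0, 2, 1]).astype(np.float64)
# ASSUMPTION: 4-neighbour Laplacian kernel for high-frequency detail.
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)


def filter2d(img, kernel):
    """Cross-correlate img with an odd-sized kernel; output has the input size."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def to_gray(rgb):
    """ASSUMPTION: BT.601 luma weights for the grayscale conversion."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def gradient_magnitude(gray, kernel_x, eps=1e-6):
    """sqrt(gx^2 + gy^2 + eps); eps keeps the square root well-behaved at zero."""
    gx = filter2d(gray, kernel_x)
    gy = filter2d(gray, kernel_x.T)
    return np.sqrt(gx ** 2 + gy ** 2 + eps)


def multi_scale_edges(rgb):
    """Stack the 3x3 Sobel, 5x5 Sobel and Laplacian responses as three channels."""
    gray = to_gray(np.asarray(rgb, dtype=np.float64))
    return np.stack([
        gradient_magnitude(gray, SOBEL3_X),
        gradient_magnitude(gray, SOBEL5_X),
        filter2d(gray, LAPLACIAN),
    ], axis=0)


if __name__ == "__main__":
    image = np.zeros((8, 8, 3))
    image[:, 4:, :] = 1.0            # vertical step edge between columns 3 and 4
    edges = multi_scale_edges(image)
    print(edges.shape)               # three edge channels of the input size
```

In the claim, the three stacked channels would then be passed to the convolution, batch normalization and LeakyReLU refinement network.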
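The text encoder of claim 4 can be sketched in the same hedged way. The recurrence below is the simplified selective scan commonly associated with Mamba-style models (state = exp(delta·A)·state + delta·B·x, output = C·state); the patent's own formula is missing from this text, so this recurrence, the point at which Softplus is applied, and every parameter name in the hypothetical `init_params` and `encode_text` helpers are assumptions.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)


def causal_depthwise_conv(x, w):
    """Per-channel 1-D convolution. x: (T, D), w: (k, D).
    Each output step sees only the current and the previous k-1 steps."""
    k, T = w.shape[0], x.shape[0]
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return sum(w[i] * padded[i:i + T] for i in range(k))


def selective_scan(x, delta, A, B, C):
    """ASSUMED recurrence. x, delta: (T, D); A: (D, K); B, C: (T, K)."""
    T, D = x.shape
    state = np.zeros((D, A.shape[1]))
    y = np.empty((T, D))
    for t in range(T):
        decay = np.exp(delta[t][:, None] * A)                    # (D, K)
        state = decay * state + (delta[t] * x[t])[:, None] * B[t][None, :]
        y[t] = state @ C[t]                                      # (D,)
    return y


def encode_text(tokens, p):
    """tokens: (T, d_in) embedded text; returns one pooled feature vector (D,)."""
    main, gate = np.split(tokens @ p["W_in"], 2, axis=1)  # main path x, control path gate
    x = causal_depthwise_conv(main, p["conv_w"])          # kernel size 4, as in the claim
    delta = softplus(x @ p["W_delta"])                    # ASSUMPTION: Softplus yields delta
    y = selective_scan(x, delta, p["A"], x @ p["W_B"], x @ p["W_C"])
    fused = y * gate + main                               # gating, then skip connection
    return fused.mean(axis=0)                             # mean pooling over time steps


def init_params(rng, d_in, d, k):
    """Hypothetical random initialisation, for illustration only."""
    return {
        "W_in": rng.normal(0.0, 0.1, (d_in, 2 * d)),
        "conv_w": rng.normal(0.0, 0.1, (4, d)),
        "W_delta": rng.normal(0.0, 0.1, (d, d)),
        "W_B": rng.normal(0.0, 0.1, (d, k)),
        "W_C": rng.normal(0.0, 0.1, (d, k)),
        "A": -np.exp(rng.normal(0.0, 0.1, (d, k))),       # negative, so the state decays
    }


if __name__ == "__main__":
    params = init_params(np.random.default_rng(0), d_in=8, d=6, k=4)
    tokens = np.random.default_rng(1).normal(size=(10, 8))
    print(encode_text(tokens, params).shape)
```

The explicit Python loop keeps the recurrence readable; a production encoder would normally use a parallel or fused scan instead.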
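The two-stage aggregation of claim 5 (nodes to hyperedges, then hyperedges back to nodes) can be sketched as follows. Because the update formulas are missing from this text, the sketch assumes plain degree-normalised averaging, a linear map `W` with bias `b` on the hyperedge features, and no activation or residual term; the function name `hypergraph_conv` and the example incidence matrix are hypothetical.

```python
import numpy as np


def hypergraph_conv(X, H, W, b):
    """One hypergraph convolution step (assumed form).

    X: (N, d) node features, one row per modality node.
    H: (N, E) incidence matrix, H[i, j] = 1 if node i belongs to hyperedge j.
    W: (d, d) hyperedge weight matrix, b: (d,) bias.
    """
    edge_degree = H.sum(axis=0)                      # nodes per hyperedge, shape (E,)
    node_degree = H.sum(axis=1)                      # hyperedges per node, shape (N,)
    edge_feat = (H.T @ X) / edge_degree[:, None]     # average the nodes into each hyperedge
    edge_feat = edge_feat @ W + b                    # update the hyperedge features
    return (H @ edge_feat) / node_degree[:, None]    # aggregate back to the nodes


if __name__ == "__main__":
    # Rows: image V_1, texture V_2, edge V_3, text V_4 (toy 2-dimensional features).
    X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 0.0]])
    # Hyperedge 0 links all four modalities; hyperedge 1 links image and text only.
    H = np.array([[1, 1], [1, 0], [1, 0], [1, 1]], dtype=np.float64)
    out = hypergraph_conv(X, H, np.eye(2), np.zeros(2))
    print(out)
```

With the identity weight matrix and zero bias used here, the texture and edge nodes receive the mean of all four modalities, while the image and text nodes additionally mix in their shared hyperedge.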
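The geometric parameters named in claim 7 (length, width, area and orientation) could be estimated from a binary segmentation mask along the following lines. The claims do not say how these parameters are computed, so everything here is an assumption: orientation is taken from the principal axis of the crack pixels, length is the extent along that axis, and mean width is area divided by length. The function `crack_geometry` and the `mm_per_pixel` scale are hypothetical, and the sketch handles a single, roughly straight crack only.

```python
import numpy as np


def crack_geometry(mask, mm_per_pixel=1.0):
    """Estimate area, orientation, length and mean width of one crack region.

    mask: 2-D array, non-zero where the segmentation map marks crack pixels.
    """
    rows, cols = np.nonzero(mask)
    if rows.size < 2:
        raise ValueError("mask needs at least two crack pixels")

    area = rows.size * mm_per_pixel ** 2
    points = np.stack([cols, rows], axis=1).astype(np.float64)   # (x, y) per pixel
    centred = points - points.mean(axis=0)

    # Principal axis = eigenvector of the coordinate covariance with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred.T))
    axis = eigvecs[:, np.argmax(eigvals)]
    orientation = np.degrees(np.arctan2(axis[1], axis[0])) % 180.0

    # Extent of the pixel centres along the axis, plus one pixel for the end caps.
    projection = centred @ axis
    length = (projection.max() - projection.min() + 1.0) * mm_per_pixel

    return {
        "area": area,
        "orientation_deg": orientation,   # angle to the image x-axis, in [0, 180)
        "length": length,
        "mean_width": area / length,
    }


if __name__ == "__main__":
    mask = np.zeros((10, 30), dtype=np.uint8)
    mask[4:7, 5:25] = 1                   # horizontal crack, 3 px wide and 20 px long
    print(crack_geometry(mask))
```

Curved or branching cracks would need a skeleton-based length measure instead of a single principal axis; that refinement is outside this sketch.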

Description

Expressway pavement crack segmentation method and system based on multi-mode fusion and hypergraph convolution

Technical Field

The invention relates to the technical field of image processing and computer vision, in particular to a highway pavement crack segmentation method and system based on multi-mode fusion and hypergraph convolution.

Background

The expressway is a core hub of the national comprehensive transportation system, and the structural integrity of its pavement directly determines traffic safety and operating efficiency. According to road maintenance statistics from the transportation department, pavement cracks that are not repaired in time expand rapidly under the combined effect of vehicle load and environmental factors, so that maintenance cost increases exponentially. Taking a two-way four-lane expressway as an example, repairing pavement distress caused by cracks costs 3-5 times more than early intervention; meanwhile, the traffic efficiency of the road section falls by 15%-20% and the night-driving accident rate rises by 22%. Traditional pavement crack inspection relies on manual walking inspection or vehicle-mounted visual inspection. The average daily inspection mileage is only about 1 km, the results are affected by subjective factors such as the inspector's experience and fatigue, and crack identification consistency is below 60%, which makes it difficult to meet the accuracy requirements of networked expressway maintenance.
In recent years, automatic detection technology has improved detection efficiency, but vision-based crack segmentation methods still face core challenges at the algorithm level. First-generation traditional image processing methods (such as Otsu thresholding and Canny edge detection) lose robustness markedly under complex backgrounds such as road-marking reflections and color differences of repair materials, with a false detection rate above 40%. Second-generation statistical learning methods (such as SVM and random forest) are limited by the expressive power of hand-crafted features, with a cross-scene generalization error above 25%. Third-generation deep learning methods (such as the U-Net series, Transformer architectures and DeepLabV3+) perform well on specific datasets, but still face three core challenges when deployed in expressway engineering scenes: (1) Strong interference from complex backgrounds: highly reflective marking lines, patch color differences, aggregate textures, old cracks, scratches and the like closely resemble cracks in gray level and texture, so the false detection rate is high; (2) Fine cracks are easily lost: longitudinal hairline cracks less than 2 mm wide occupy an extremely small proportion of pixels, and deep encoding tends to submerge shallow details, causing missed detections; (3) Insufficient cross-scene generalization: terrain, illumination and weather vary markedly (strong light at noon, backlighting at tunnel portals, reflections in rain and fog, and so on), so a model trained on a single dataset is difficult to transfer stably.
When existing models (such as CrackFormer, HC-Unet++ and BC-DUnet) deal with the coexistence of multiple types of interference in expressway scenes, such as vehicle shadows, pavement markings and repair marks, their ability to suppress interference features is insufficient, and their adaptability to extreme imaging interference, such as crack blurring caused by long-distance shooting and feature distortion caused by strong light reflection, remains limited.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a highway pavement crack segmentation method and system based on multi-mode fusion and hypergraph convolution. The invention aims to systematically solve the three core challenges of the prior art by integrating four core modules. First, for the problem of strong interference from complex backgrounds, an edge texture multi-scale feature extraction module (Multi-Scale Feature Extraction of Edge Texture, MFET for short) is designed, which effectively distinguishes cracks from aggregate textures, marking lines and other interference through multi-scale gradient and residual texture extraction. Second, for the problems that fine cracks are easily lost and cross-scene generalization is insufficient, a text encoder based on the Mamba architecture (Mamba-Based Text Encoder, MBTE for short) and a hypergraph convolution fusion layer (Hypergraph Convolution-Based Fusion Layer Construction, HFLC) are introduced, utilizing text semantic guidance and the high-order correlation of the hypergraph structure to strengthen the expression of the characteristics o