
CN-122023802-A - Lightweight self-insight fusion RWKV crack segmentation method and system for missing modalities

CN 122023802 A

Abstract

The invention relates to the technical field of computer vision and deep learning, and discloses a lightweight self-insight fusion RWKV crack segmentation method and system for missing modalities, comprising the following steps: identifying the available modalities of the input data, performing feature extraction and channel calibration with a reconstruction feature generator, and generating a global reconstruction feature map; the feature extraction subsystem derives a sequence-scanning strategy from the global features, directs the introspection sensing unit to encode the image sequence via linear recursive computation, integrates multi-level features through the cross-modal interaction fusion module, and finally reconstructs and outputs a crack prediction map step by step through a decoder. The method resolves the feature degradation caused by missing modalities through a dynamic aggregation mechanism, and exploits the adaptive scanning strategy and the block codebook technique to capture the long-range dependencies of cracks while greatly reducing the model's parameter count and computational load, thereby improving the robustness of crack detection in complex environments.

Inventors

  • SHI FAN
  • LIU HUI
  • JIA CHEN
  • CHENG XU
  • ZHENG JIANGPENG

Assignees

  • 天津理工大学 (Tianjin University of Technology)

Dates

Publication Date
2026-05-12
Application Date
2026-01-30

Claims (10)

  1. A lightweight self-insight fusion RWKV crack segmentation method for missing modalities, characterized by comprising the following steps: identifying the available modality indices of the input data, feeding the available modality data into a reconstruction feature generator, and performing feature extraction and channel calibration on the available modality data to generate a global reconstruction feature map; the feature extraction subsystem generating a strategy distribution for sequence scanning from the global reconstruction feature map, directing its internal introspection sensing unit to encode the image feature sequence, and outputting a multi-level feature map; the cross-modal interaction fusion module receiving the multi-level feature map and performing cross-modal interaction and integration of the features of the different modalities to generate fusion features; and the decoder module receiving the fusion features, up-sampling and reconstructing the features step by step, and outputting a pixel-level crack segmentation prediction map (an end-to-end skeleton of these four stages is sketched at the end of the description).
  2. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 1, wherein the process of generating the global reconstruction feature map by the reconstruction feature generator comprises: extracting intermediate features by depthwise separable convolution for each available modality; performing channel calibration on the intermediate features to generate enhanced context features; computing gating coefficients from the original input features, and combining the original input features and the enhanced context features by weighting, using an adaptive gated fusion mechanism, to obtain enhanced features; and performing a non-parameterized element-wise aggregation of all the enhanced features and generating the global reconstruction feature map by a projective transformation (claims 2 and 3 are sketched together after the claims).
  3. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 2, wherein the process of channel calibration comprises: performing a global average pooling operation on the intermediate features to generate a channel descriptor; processing the channel descriptor with a one-dimensional convolution; and generating a channel weight vector through a Sigmoid activation function and multiplying it element-wise with the intermediate features.
  4. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 1, wherein the process of generating the strategy distribution by the feature extraction subsystem comprises: processing the global reconstruction feature map with a convolutional classifier to extract un-normalized prediction scores; sampling the prediction scores with the Gumbel-Softmax reparameterization trick to generate the strategy distribution; and performing an arg-max indexing operation or a probability-based sampling operation on the strategy distribution to obtain a definite scan-arrangement index, wherein the scan-arrangement index is used to rearrange the two-dimensional image features into a one-dimensional sequence (sketched after the claims).
  5. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 4, wherein the introspection sensing unit comprises a spatial mixing module that performs the following steps: applying gating modulation to the input features of the spatial mixing module using the global reconstruction feature map; mapping the modulated features into a receptance vector, a key vector and a value vector with a linear projection layer; rearranging the key vector and the value vector according to the scan-arrangement index; performing the SI-WKV recursive computation on the rearranged sequence to generate an aggregated output vector; and performing an inverse permutation to restore the spatial correspondence of the features, multiplying the aggregated output vector element-wise with the receptance vector, and outputting the spatial mixing features after a linear projection layer transformation (sketched after the claims).
  6. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 1, wherein the introspection sensing unit comprises a channel mixing module that performs the following steps: mapping the input features of the channel mixing module into a hidden-layer space and generating intermediate features by applying a squared-ReLU activation function; processing the intermediate features with a structure comprising two one-dimensional convolution layers, generating channel attention weights and multiplying them element-wise with the intermediate features; and performing layer normalization on the multiplied features and generating the channel-mixing output through a gating mechanism (sketched after the claims).
  7. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 1, wherein the feature extraction subsystem adopts a block codebook linear layer structure when performing linear transformation operations; the block codebook linear layer structure comprises a learnable codebook vector and a static block map; the codebook vector stores the only weight values allowed to exist, and the static block map stores integer indices pointing into the codebook vector; during computation, the static block map is used as an address index to retrieve values from the codebook vector and dynamically synthesize a weight matrix, or the retrieved values are used directly in computation with the input features (sketched after the claims).
  8. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 1, wherein the process of generating the fusion features by the cross-modal interaction fusion module comprises: mapping the features of each modality into a unified feature space through an embedding function, and averaging them to form preliminary mixed features; processing the preliminary mixed features with a cross-modal attention module, the cross-modal attention module integrating a spatial convolution and a channel attention mechanism; and re-weighting the feature channels through the channel attention mechanism and outputting the fusion features (sketched after the claims).
  9. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to claim 1, wherein the decoder module adopts a hierarchical cascade structure; for the current decoding scale, it receives the decoding features from the previous stage and performs an up-sampling operation; the up-sampled features and the same-scale fusion features are concatenated along the channel dimension; the concatenated features are fused by an update module to generate the decoding features of the current scale; and finally the decoded output is mapped into a binarized crack segmentation result by a convolutional prediction head (sketched after the claims).
  10. A lightweight self-insight fusion RWKV crack segmentation system for missing modalities, characterized in that it applies the lightweight self-insight fusion RWKV crack segmentation method for missing modalities according to any one of claims 1-9 and comprises: a reconstruction feature generator for identifying the available modality indices of the input data, receiving the available modality data, performing feature extraction and channel calibration, and generating a global reconstruction feature map; a feature extraction subsystem that receives the global reconstruction feature map, generates a strategy distribution for sequence scanning from it, directs its internal introspection sensing unit to encode the image feature sequence, and outputs a multi-level feature map; a cross-modal interaction fusion module that receives the multi-level feature map and performs cross-modal interaction and integration of the features of the different modalities to generate fusion features; and a decoder module for receiving the fusion features, up-sampling and reconstructing the features step by step, and outputting a pixel-level crack segmentation prediction map.
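
The following minimal PyTorch sketches illustrate the mechanisms recited in claims 2 through 9. They are illustrative readings, not the patent's reference implementation; every kernel size, channel width, and module name below is an assumption. First, the reconstruction feature generator of claims 2 and 3, assuming an ECA-style channel calibration (global average pooling, 1-D convolution, Sigmoid), a sigmoid gate computed from the raw input, the mean as the non-parameterized element-wise aggregation, and a common channel count across modalities:

```python
import torch
import torch.nn as nn


class ChannelCalibration(nn.Module):
    """Channel calibration as in claim 3: GAP -> 1-D conv -> Sigmoid -> rescale."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                              # x: (B, C, H, W)
        desc = x.mean(dim=(2, 3)).unsqueeze(1)         # channel descriptor (B, 1, C)
        w = torch.sigmoid(self.conv(desc)).squeeze(1)  # channel weight vector (B, C)
        return x * w[:, :, None, None]                 # element-wise rescale


class ReconstructionFeatureGenerator(nn.Module):
    """Claim 2: per-modality extraction, calibration, gated fusion, aggregation."""

    def __init__(self, c_in=3, c_feat=64):
        super().__init__()
        # Depthwise separable convolution: depthwise 3x3 followed by pointwise 1x1.
        self.extract = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),
            nn.Conv2d(c_in, c_feat, 1),
        )
        self.calib = ChannelCalibration()
        self.inp_proj = nn.Conv2d(c_in, c_feat, 1)     # lift raw input for blending
        self.gate = nn.Sequential(nn.Conv2d(c_in, c_feat, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(c_feat, c_feat, 1)       # projective transformation

    def forward(self, modalities):                     # list of (B, c_in, H, W)
        enhanced = []
        for m in modalities:                           # only available modalities
            ctx = self.calib(self.extract(m))          # enhanced context features
            g = self.gate(m)                           # gating from the raw input
            enhanced.append(g * self.inp_proj(m) + (1 - g) * ctx)
        # Non-parameterized element-wise aggregation (mean) over the modalities.
        agg = torch.stack(enhanced).mean(dim=0)
        return self.proj(agg)                          # global reconstruction map
```

Because the aggregation is a parameter-free mean over whichever modalities are present, the same weights serve any subset of inputs, which is what lets the generator tolerate a missing modality.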
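
Claim 4's scan-policy head can be sketched as a tiny classifier over a fixed set of candidate scan orders, here assumed to be four rasterizations of the 2-D grid; torch.nn.functional.gumbel_softmax provides the reparameterized sampling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScanPolicy(nn.Module):
    """Claim 4: convolutional classifier -> Gumbel-Softmax -> scan-order index."""

    def __init__(self, c_feat=64, num_orders=4):
        super().__init__()
        self.classifier = nn.Conv2d(c_feat, num_orders, 1)

    def forward(self, g, tau=1.0, hard=True):          # g: (B, C, H, W)
        logits = self.classifier(g).mean(dim=(2, 3))   # un-normalized scores (B, K)
        # Gumbel-Softmax reparameterization keeps the sampling differentiable.
        dist = F.gumbel_softmax(logits, tau=tau, hard=hard)
        return dist, dist.argmax(dim=-1)               # distribution, chosen order


def scan_permutation(order: int, h: int, w: int) -> torch.Tensor:
    """Flat index permutation rearranging a 2-D grid into a 1-D sequence."""
    idx = torch.arange(h * w).reshape(h, w)
    if order == 1:
        idx = idx.flip(1)                              # rows scanned right-to-left
    elif order == 2:
        idx = idx.t()                                  # column-major scan
    elif order == 3:
        idx = idx.t().flip(1)                          # reversed column-major scan
    return idx.reshape(-1)
```

With hard=True the forward pass uses the one-hot arg-max choice while gradients flow through the soft distribution, covering both the arg-max indexing and the probability-based sampling the claim recites.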
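
For claim 5's spatial mixing module, the sketch below follows the published RWKV "WKV" operator: a per-channel exponential-decay weighted average over the permuted sequence, with a bonus u for the current token. The patent's exact SI-WKV formulation is not reproduced here, and a production kernel would use a numerically stabilized parallel scan rather than this Python loop:

```python
import torch
import torch.nn as nn


class SpatialMix(nn.Module):
    """Claim 5: gate by the global map, project to r/k/v, permuted WKV scan."""

    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Linear(dim, dim)     # modulation by the global map
        self.r_proj = nn.Linear(dim, dim)   # receptance
        self.k_proj = nn.Linear(dim, dim)   # key
        self.v_proj = nn.Linear(dim, dim)   # value
        self.out = nn.Linear(dim, dim)
        self.w = nn.Parameter(torch.zeros(dim))  # per-channel decay parameter
        self.u = nn.Parameter(torch.zeros(dim))  # bonus for the current token

    def forward(self, x, g, perm):
        # x, g: (B, T, C) flattened tokens; perm: (T,) scan-arrangement index.
        x = x * torch.sigmoid(self.gate(g))        # gating modulation
        r = torch.sigmoid(self.r_proj(x))          # receptance vector
        k = self.k_proj(x)[:, perm]                # rearrange keys by scan order
        v = self.v_proj(x)[:, perm]                # rearrange values likewise
        decay = torch.exp(-torch.exp(self.w))      # decay in (0, 1) per channel
        num = torch.zeros_like(k[:, 0])
        den = torch.zeros_like(k[:, 0])
        out = torch.zeros_like(k)
        for t in range(k.shape[1]):                # linear-time recurrence
            cur = torch.exp(self.u + k[:, t])
            out[:, t] = (num + cur * v[:, t]) / (den + cur + 1e-6)
            num = decay * num + torch.exp(k[:, t]) * v[:, t]
            den = decay * den + torch.exp(k[:, t])
        out = out[:, torch.argsort(perm)]          # inverse permutation
        return self.out(r * out)                   # receptance gating + projection
```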
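
Claim 6's channel mixing module, assuming a hidden expansion ratio of 4 and kernel size 5 for the two one-dimensional convolutions that produce the channel attention weights:

```python
import torch
import torch.nn as nn


class ChannelMix(nn.Module):
    """Claim 6: squared ReLU, 1-D conv attention, LayerNorm, gated output."""

    def __init__(self, dim=64, ratio=4):
        super().__init__()
        hidden = dim * ratio
        self.fc_in = nn.Linear(dim, hidden)        # map to hidden-layer space
        # Two stacked 1-D convolutions over the channel axis yield the weights.
        self.attn = nn.Sequential(
            nn.Conv1d(1, 1, 5, padding=2), nn.ReLU(),
            nn.Conv1d(1, 1, 5, padding=2), nn.Sigmoid(),
        )
        self.norm = nn.LayerNorm(hidden)
        self.gate = nn.Linear(dim, hidden)
        self.fc_out = nn.Linear(hidden, dim)

    def forward(self, x):                          # x: (B, T, C)
        h = torch.square(torch.relu(self.fc_in(x)))  # squared-ReLU activation
        b, t, c = h.shape
        w = self.attn(h.reshape(b * t, 1, c)).reshape(b, t, c)
        h = self.norm(h * w)                       # re-weight, then LayerNorm
        h = h * torch.sigmoid(self.gate(x))        # gating mechanism
        return self.fc_out(h)
```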
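
The block codebook linear layer of claim 7 stores only a short vector of learnable weight values plus a fixed integer map addressing into it; the weight matrix is synthesized on the fly. The codebook size and the random, untrained index map are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockCodebookLinear(nn.Module):
    """Claim 7: learnable codebook + static integer index map -> weight matrix."""

    def __init__(self, in_features, out_features, codebook_size=16):
        super().__init__()
        # Learnable codebook: the only distinct weight values allowed to exist.
        self.codebook = nn.Parameter(torch.randn(codebook_size) * 0.02)
        # Static block map: fixed integer addresses into the codebook.
        self.register_buffer(
            "index_map",
            torch.randint(codebook_size, (out_features, in_features)),
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Retrieve values by address to synthesize the weight matrix dynamically;
        # gradients flow back into the shared codebook entries.
        weight = self.codebook[self.index_map]     # (out_features, in_features)
        return F.linear(x, weight, self.bias)
```

A 64x64 layer then trains 16 shared weight values plus a bias instead of 4096 independent floats, and in a deployed model the index map can be packed down to a few bits per entry.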
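
Claim 8's cross-modal interaction fusion, assuming 1x1 embeddings into the unified space, a 3x3 spatial convolution, and squeeze-style channel attention:

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Claim 8: embed each modality, average, spatial conv + channel attention."""

    def __init__(self, dim=64, num_modalities=2):
        super().__init__()
        # One embedding function per modality maps features to a unified space.
        self.embed = nn.ModuleList(
            [nn.Conv2d(dim, dim, 1) for _ in range(num_modalities)]
        )
        self.spatial = nn.Conv2d(dim, dim, 3, padding=1)   # spatial convolution
        self.channel_attn = nn.Sequential(                  # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim, 1), nn.Sigmoid(),
        )

    def forward(self, feats):              # list of (B, C, H, W), one per modality
        mixed = torch.stack(
            [e(f) for e, f in zip(self.embed, feats)]
        ).mean(dim=0)                      # average aggregation -> preliminary mix
        h = self.spatial(mixed)
        return h * self.channel_attn(h)    # re-weight channels, output fusion
```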
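
One scale of claim 9's cascaded decoder, assuming bilinear up-sampling and a Conv-BN-ReLU update module, followed by a 1x1 prediction head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderStage(nn.Module):
    """Claim 9: upsample previous decoding features, concat skip, fuse, update."""

    def __init__(self, c_dec, c_skip, c_out):
        super().__init__()
        self.update = nn.Sequential(               # "update module"
            nn.Conv2d(c_dec + c_skip, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        )

    def forward(self, dec, skip):
        # Step-wise up-sampling to the spatial size of the same-scale fusion map.
        dec = F.interpolate(dec, size=skip.shape[-2:], mode="bilinear",
                            align_corners=False)
        return self.update(torch.cat([dec, skip], dim=1))  # channel-dim concat


# Final prediction head: 1x1 convolution + Sigmoid yields the binarizable map.
prediction_head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())
```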

Description

Lightweight self-insight fusion RWKV crack segmentation method and system for missing modalities

Technical Field

The invention relates to the technical field of computer vision and deep learning, and in particular to a lightweight self-insight fusion RWKV crack segmentation method and system for missing modalities.

Background

In the field of infrastructure maintenance, surface crack detection on roads, bridges and dams is critical to public safety. With the development of sensor technology, multi-modal crack segmentation using multi-source data such as visible light, depth information or infrared thermal imaging has become the mainstream trend. Multi-modal data provide complementary information that helps an algorithm identify crack features more accurately under uneven illumination or against complex backgrounds. In practical engineering scenarios, however, sensor data often suffer partial loss due to hardware failure, transmission errors or environmental interference. Existing multi-modal segmentation networks are mostly designed under the assumption that all modality data are complete, and generally extract features in parallel with a fixed multi-branch structure. When a modality is missing from the input data, the conventional treatment is to fill the missing channel with zeros or replace it with a mean value; this destroys the statistical properties of the data, prevents the network from establishing global context effectively, and severely degrades the robustness of the segmentation result.

Furthermore, cracks typically exhibit an elongated and continuous geometry, which requires the algorithm to capture long-range dependencies. Traditional convolutional neural networks are limited by local receptive fields, making it difficult to model the continuity of cracks at a global scale, so their segmentation results are prone to fragmentation. Methods based on the Transformer architecture obtain a global receptive field through the self-attention mechanism, but their computational complexity grows quadratically with image resolution, so they consume enormous computational resources and struggle to meet real-time processing requirements. Capturing global long-range dependencies effectively while maintaining linear computational complexity is a challenge for the current technology. Meanwhile, crack detection equipment is usually mounted on mobile platforms such as unmanned aerial vehicles and wall-climbing robots, whose computing power and storage space are very limited. Existing high-performance segmentation models often carry huge parameter counts and computational loads and are difficult to deploy directly on edge computing devices. Compression techniques such as model pruning and knowledge distillation exist, but they often require complex post-processing or sacrifice part of the detection accuracy. Therefore, designing a model that is lightweight at the architectural level, with a low parameter count and low computational load, is of great significance for the engineering deployment of crack detection.
Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a lightweight self-insight fusion RWKV crack segmentation method and system for missing modalities, which solve the problems of incomplete feature extraction, high computational resource consumption and difficult long-range dependency modeling that existing multi-modal crack segmentation technology faces when modalities are missing. To achieve this purpose, the invention is realized by the following technical scheme. The lightweight self-insight fusion RWKV crack segmentation method for missing modalities comprises the following steps: identifying the available modality indices of the input data, feeding the available modality data into a reconstruction feature generator, performing feature extraction and channel calibration on the available modality data, and generating a global reconstruction feature map; the feature extraction subsystem generates a strategy distribution for sequence scanning from the global reconstruction feature map, directs its internal introspection sensing unit to encode the image feature sequence, and outputs a multi-level feature map; the cross-modal interaction fusion module receives the multi-level feature map and performs cross-modal interaction and integration of the features of the different modalities to generate fusion features; and the decoder module receives the fusion features, up-samples and reconstructs the features step by step, and outputs a pixel-level crack segmentation prediction map. Preferably, the process of generating the global reconstruction feature map by the reconstruction feature generator comprises: extracting intermediate features by depthwise separable convolution for each available modality; performing channel calibration on the intermediate features to generate enhanced context features; computing gating coefficients from the original input features and combining the original input features and the enhanced context features by weighting, using an adaptive gated fusion mechanism, to obtain enhanced features; and performing a non-parameterized element-wise aggregation of all the enhanced features and generating the global reconstruction feature map by a projective transformation.
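
Under the assumption that components along the lines of the sketches given after the claims are used, the four stages recited in claim 1 reduce to the following skeleton; the function and argument names here are illustrative, not identifiers from the patent:

```python
# Illustrative end-to-end skeleton of the four-stage pipeline; `generator`,
# `encoder`, `fusion`, and `decoder` stand in for the corresponding modules.
def segment_cracks(modalities, generator, encoder, fusion, decoder):
    """modalities: dict of optional inputs, e.g. {"rgb": tensor, "depth": None}."""
    available = [m for m in modalities.values() if m is not None]  # modality check
    g = generator(available)   # feature extraction + channel calibration (claim 2)
    feats = encoder(g)         # policy-guided SI-WKV encoding, multi-level maps
    fused = fusion(feats)      # cross-modal interaction and integration (claim 8)
    return decoder(fused)      # step-wise up-sampling to the pixel-level map
```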