CN-121999459-A - Mamba architecture-based semantic alignment enhanced lane line detection method


Abstract

The invention discloses a semantic alignment enhanced lane line detection method based on the Mamba architecture. The method comprises: obtaining a lane line data set and inputting it into a backbone network to obtain multi-level features; using a dynamic multi-scale adaptive screening module to flatten the feature maps into one-dimensional sequences, stack the features of different levels, and highlight lane line information through weighting; capturing short-distance dependence through global pooling and convolution in the Mamba X branch; performing semantic supplementation between the output of the dynamic multi-scale adaptive screening module and the output of the Mamba X branch through a semantic alignment fusion module, and outputting alignment-enhanced features; screening effective features with a gating module in the Mamba Z branch, and establishing long-distance dependence with a scanning module that updates its state position by position; and inputting the feature sequence output by the Mamba Z branch into a detection head, which outputs a lane line prediction map by continuously updating the position of a prior box. By combining dynamic multi-scale adaptive screening, the Mamba architecture, and semantic alignment fusion, the method establishes long-distance dependence along lane lines and improves lane line detection accuracy.
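
The screening step summarized above (flatten, unify sequence length, stack, weight each position) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `flatten`, `upsample_nearest`, and `screen`, the nearest-neighbour up-sampling, the sigmoid activation, and the single linear scoring vector `score_weights` (standing in for the attention and convolution layers) are all hypothetical choices that the text does not specify.

```python
import math


def flatten(feature_map):
    """Flatten a C x H x W feature map (nested lists) into H*W vectors of length C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def upsample_nearest(sequence, target_length):
    """Nearest-neighbour up-sampling of a sequence of vectors (assumed mode)."""
    source_length = len(sequence)
    return [
        list(sequence[(n * source_length) // target_length])
        for n in range(target_length)
    ]


def screen(fine_map, coarse_map, score_weights):
    """Stack two scales along the channel axis, then reweight every position."""
    fine_seq = flatten(fine_map)
    coarse_seq = upsample_nearest(flatten(coarse_map), len(fine_seq))
    # Channel-wise concatenation of the two equally long sequences.
    stacked = [fine + coarse for fine, coarse in zip(fine_seq, coarse_seq)]
    output = []
    for vector in stacked:
        # Hypothetical scoring: one linear projection followed by a sigmoid.
        score = sum(w * v for w, v in zip(score_weights, vector))
        weight = 1.0 / (1.0 + math.exp(-score))
        output.append([weight * v for v in vector])
    return output


# Toy input: a 1-channel 2x2 "fine" map and a 1-channel 1x1 "coarse" map.
fine = [[[1.0, 2.0], [3.0, 4.0]]]
coarse = [[[10.0]]]
sequence = screen(fine, coarse, score_weights=[0.0, 0.0])
print(sequence[0])  # [0.5, 5.0]
```

With all-zero scoring weights every position receives the weight 0.5, so the toy output is simply the stacked sequence scaled by one half; a trained scoring layer would instead emphasize positions that carry lane line information.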

Inventors

HU YUBO
ZHANG YUNZUO
CHEN SHUAI
LI JIACHENG

Assignees

Shijiazhuang Tiedao University (石家庄铁道大学)

Dates

Publication Date: 20260508
Application Date: 20260202
Priority Date: 20251124

Claims (5)

1. A semantic alignment enhanced lane line detection method based on the Mamba architecture, characterized by comprising the following steps:
S1, acquiring a lane line data set, inputting an image into a backbone network for feature extraction, and obtaining five levels of feature maps, expressed as [expression missing in source], where i denotes the feature level;
S2, optimizing the local features of the i=3 and i=4 feature maps through a dynamic multi-scale adaptive screening module, which focuses on local feature optimization: converting each two-dimensional feature map into a one-dimensional sequence through a flattening operation while retaining feature information at different scales, the fine scale containing edge trends and the coarse scale containing the global trend; up-sampling the one-dimensional sequence converted from the i=4 feature map so that the sequence lengths of the two levels are consistent; stacking it with the one-dimensional sequence converted from the i=3 feature map to form features containing full-scale information; and finally outputting an enhanced one-dimensional feature sequence;
S3, anchoring the global semantics of the lane lines and establishing long-distance dependence through a newly constructed Mamba network divided into an X branch and a Z branch, wherein the X branch operates on the i=5 feature map, performs channel alignment through 1×1 convolution, extracts global semantics through global average pooling, and broadcasts the global semantics to all sequence positions; and the Z branch screens effective features through a gating module, captures short-distance dependence through convolution, and then establishes long-distance dependence through a scanning module that updates its state position by position;
S4, semantically aligning the output of the dynamic multi-scale adaptive screening module with the output of the Mamba X branch through a semantic alignment fusion module: converting the features into a sequence-first format, calculating the dot-product similarity position by position, generating alignment weights through an activation function, correcting the local features with the weights, fusing them with the global semantic features by addition, and finally outputting the alignment-enhanced features after MLP optimization and LayerNorm normalization;
S5, inputting the feature sequence output by the Mamba Z branch into a detection head H_a, where a denotes the feature level [range missing in source];
S6, continuously updating the position P_b of the prior box by iteratively fusing the multi-scale enhanced features, where b denotes the number of optimization iterations [range missing in source], and finally outputting a lane line prediction map.
2. The semantic alignment enhanced lane line detection method based on the Mamba architecture according to claim 1, wherein the dynamic multi-scale adaptive screening module first flattens the feature maps into sequences and then unifies them to the same sequence length, the length of the i=3 feature sequence being kept unchanged and the i=4 feature sequence being converted to the unified length by up-sampling; multi-scale feature stacking is then performed by concatenating the equal-length features along the channel dimension to fuse multi-scale information; an attention mechanism then dynamically assigns a weight to each sequence position to highlight effective features, a position-by-position weight map being generated through convolution and an activation function; the stacked features are then weighted position by position with the attention weights; and finally a one-dimensional feature sequence fusing the effective multi-scale information is output.
3. The semantic alignment enhanced lane line detection method based on the Mamba architecture, characterized in that the Mamba dual-branch network takes the i=5 features as the semantic input of both branches, converts the 2D feature map into a one-dimensional sequence, and splits it equally by channel into two paths that are fed to the X branch and the Z branch; the X branch focuses on global semantic enhancement: it first performs a dimension-raising operation to accommodate higher semantic capacity, then performs global averaging over the sequence dimension to extract more abstract global semantic information, then propagates the global semantic information to all sequence positions through a broadcasting mechanism, and finally reduces the channel dimension through 1×1 convolution and enhances nonlinear expression through an activation function; the Z branch likewise expands the feature capacity through a dimension-raising operation, then feeds the sequence into a gating vector that screens features relevant to high-level semantics, suppressing irrelevant semantic information and retaining effective information, then uses 1D convolution to first establish short-distance semantic associations, and then performs selective scanning, in which the state is first initialized, convolution is performed to update the state, convolution is performed to extract the vector at each position, the relevant feature values are stored, and finally all relevant feature values are collected.
4. The semantic alignment enhanced lane line detection method based on the Mamba architecture according to claim 1, wherein the semantic alignment fusion module addresses the semantic misalignment between the output of the dynamic multi-scale adaptive screening module and the output of the Mamba X branch: it first converts the channel-first format into a sequence-first format, then calculates the similarity between the local semantics and the global semantics at each sequence position, then compresses the similarity to between 0 and 1 through an activation function to obtain a position-by-position alignment weight [symbol missing in source], and finally uses the alignment weight to correct the local features from the dynamic multi-scale adaptive screening module, outputting aligned features with strong semantic consistency and long-distance continuity.
5. The semantic alignment enhanced lane line detection method based on the Mamba architecture according to claim 1, wherein the training steps of the trained lane line detection network comprise: acquiring a lane line data set; inputting it into a backbone network for feature extraction; optimizing local features through the dynamic multi-scale adaptive screening module; extracting global semantic features through the X branch of the improved Mamba network; enhancing the features through the semantic alignment fusion module; establishing long-distance dependence through the Z branch of the improved Mamba network; and outputting the lane line prediction map through the detection head.
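
Two of the operations recited in the claims, the position-by-position alignment weighting (step S4 and claim 4) and the gated, position-by-position state update of the Z branch (step S3 and claim 3), can be illustrated with a short Python sketch. It is offered only as a reading aid under explicit assumptions: the names `align_and_fuse` and `gated_scan` are hypothetical, a sigmoid is assumed as the activation, the MLP, LayerNorm, and 1D convolution stages are omitted, and the fixed per-channel `decay` is a simplification standing in for the input-dependent parameters of an actual Mamba selective scan.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def align_and_fuse(local_seq, global_seq):
    """Weight local features by their similarity to global semantics, then add.

    Both arguments are sequence-first: a list of N vectors of length C.
    """
    fused = []
    for local_vec, global_vec in zip(local_seq, global_seq):
        # Dot-product similarity at this sequence position.
        similarity = sum(l * g for l, g in zip(local_vec, global_vec))
        # Alignment weight compressed to (0, 1); sigmoid is an assumption.
        weight = sigmoid(similarity)
        corrected = [weight * l for l in local_vec]
        # Fuse corrected local features with global semantics by addition.
        fused.append([c + g for c, g in zip(corrected, global_vec)])
    return fused


def gated_scan(sequence, gate, decay):
    """Gate each position, then update a running state position by position.

    sequence, gate: lists of N vectors of length C.
    decay: per-channel retention factors in [0, 1); fixed here for simplicity.
    """
    state = [0.0] * len(decay)  # state initialization
    outputs = []
    for vector, gate_vec in zip(sequence, gate):
        # Gating suppresses or keeps each channel of the current position.
        kept = [sigmoid(g) * v for g, v in zip(gate_vec, vector)]
        # Linear recurrence: old state decays, gated input is blended in.
        state = [a * h + (1.0 - a) * k for a, h, k in zip(decay, state, kept)]
        outputs.append(list(state))  # collect the output of every position
    return outputs


fused = align_and_fuse([[1.0, 0.0], [0.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]])
scanned = gated_scan([[1.0], [1.0], [1.0]], [[0.0], [0.0], [0.0]], decay=[0.5])
print(fused)    # [[0.5, 0.0], [0.0, 1.0]]
print(scanned)  # [[0.25], [0.375], [0.4375]]
```

In the scan example the state keeps accumulating across positions, which is how information from early positions reaches later ones at a cost that grows linearly with the sequence length.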

Description

Mamba architecture-based semantic alignment enhanced lane line detection method

Technical Field

The invention relates to a semantic alignment enhanced lane line detection method based on the Mamba architecture, belonging to the technical field of computer vision.

Background

With accelerating urbanization and rapidly growing traffic demand, road network density keeps increasing, motor vehicle ownership is growing exponentially, and the need for fine-grained, intelligent traffic management is increasingly urgent. Lane lines are the core guide markings of road traffic order. Accurate lane line detection is not only a key basis for automatic driving decisions and lane departure warning, but also provides basic data support for traffic control tasks such as traffic flow regulation and violation identification. Achieving high-accuracy lane line detection in complex and changeable real road environments has therefore become a core topic for deploying intelligent traffic systems and guaranteeing road traffic safety and efficiency.
Lane line detection is a basic task in the field of automatic driving, but several technical bottlenecks exist in real scenes. First, lane lines exhibit perspective distortion in images, appearing large when near and small when far, and traditional single-scale features struggle to capture effective information at different distances. Second, local features may become disconnected from global semantics, which increases the false detection rate. Finally, because lane lines are continuous strip-shaped structures and traditional CNNs have limited receptive fields, the correlation between distant lane line segments is difficult to model.
Mamba, an emerging sequence modeling tool, models long-distance dependence with linear complexity through selective scanning. Its single-branch design struggles to balance global semantics against local details, but its computational complexity for long-distance dependence modeling is significantly lower than that of a Transformer, which makes it suitable for lane line detection with high real-time requirements.

Disclosure of Invention

The invention aims to solve the problems in the prior art and discloses a semantic alignment enhanced lane line detection method based on the Mamba architecture. To achieve this purpose, the technical scheme of the invention is as follows:
A semantic alignment enhanced lane line detection method based on the Mamba architecture, characterized by comprising the following steps:
S1, acquiring a lane line data set, inputting an image into a backbone network for feature extraction, and obtaining five levels of feature maps, expressed as [expression missing in source], where i denotes the feature level;
S2, optimizing the local features of the i=3 and i=4 feature maps through a dynamic multi-scale adaptive screening module, which focuses on local feature optimization: converting each two-dimensional feature map into a one-dimensional sequence through a flattening operation while retaining feature information at different scales, the fine scale containing edge trends and the coarse scale containing the global trend; up-sampling the one-dimensional sequence converted from the i=4 feature map so that the sequence lengths of the two levels are consistent; stacking it with the one-dimensional sequence converted from the i=3 feature map to form features containing full-scale information; and finally outputting an enhanced one-dimensional feature sequence;
S3, anchoring the global semantics of the lane lines and establishing long-distance dependence through a newly constructed Mamba network divided into an X branch and a Z branch, wherein the X branch operates on the i=5 feature map, performs channel alignment through 1×1 convolution, extracts global semantics through global average pooling, and broadcasts the global semantics to all sequence positions; and the Z branch screens effective features through a gating module, captures short-distance dependence through convolution, and then establishes long-distance dependence through a scanning module that updates its state position by position;
S4, semantically aligning the output of the dynamic multi-scale adaptive screening module with the output of the Mamba X branch through a semantic alignment fusion module: converting the features into a sequence-first format, calculating the dot-product similarity position by position, generating alignment weights through an activation function, correcting the local features by us