CN-121811418-B - Lightweight multi-branch semantic segmentation method and system for complex road scene


Abstract

The invention discloses a lightweight multi-branch semantic segmentation method and system for complex road scenes, belonging to the field of computer vision. The method takes an input image as the starting point and extracts multi-scale feature representations through a lightweight backbone network. An anchoring semantic core (ASK) branch constructs global semantic requirements on the low-resolution features; an execution semantic core (FSK) branch performs pixel-level semantic prediction on the medium-resolution features; and a boundary and outlier detection (FOD) branch detects boundaries and abnormal regions on the high-resolution features. A Qi-GFM gating fusion module performs gated fusion of the ASK and FSK features, a Qi-Negotiation semantic negotiation module then performs semantic negotiation in combination with the FOD output, and the optimized semantic segmentation result is output.
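The gated fusion of ASK and FSK features can be illustrated with a minimal sketch. Claims 2 and 3 state that the two feature maps are concatenated along the channel dimension, that a convolution mapping followed by a Sigmoid yields a pixel-level gating map, and that the features are fused by element-wise multiplication with the gating weights. Everything else below is an assumption made for illustration and is not taken from the patent: the function name `gated_fusion`, the use of a 1×1 convolution producing a single-channel gate, and the complementary weighting `g * ASK + (1 - g) * FSK`.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(f_ask, f_fsk, w, b=0.0):
    """Fuse two feature maps of shape [C][H][W] with a pixel-level gate.

    w: 2*C weights of an ASSUMED 1x1 convolution applied to the
       channel-concatenated features; it outputs one gate value per pixel.
    b: bias of that assumed convolution.
    Returns (fused, gate), where fused has shape [C][H][W] and gate [H][W].
    """
    c = len(f_ask)
    h = len(f_ask[0])
    wd = len(f_ask[0][0])
    assert len(f_fsk) == c and len(w) == 2 * c

    # Channel-wise concatenation: 2C channels, ASK first, then FSK.
    f_cat = f_ask + f_fsk

    gate = [[0.0] * wd for _ in range(h)]
    fused = [[[0.0] * wd for _ in range(h)] for _ in range(c)]
    for y in range(h):
        for x in range(wd):
            # 1x1 convolution = weighted sum over channels at this pixel.
            z = b + sum(w[k] * f_cat[k][y][x] for k in range(2 * c))
            g = sigmoid(z)
            gate[y][x] = g
            for k in range(c):
                # ASSUMED complementary weighting of the two branches.
                fused[k][y][x] = g * f_ask[k][y][x] + (1.0 - g) * f_fsk[k][y][x]
    return fused, gate
```

With all-zero weights and zero bias the gate is 0.5 at every pixel, so the output is the plain average of the two inputs; a large positive bias drives the gate toward 1 and the output toward the ASK features.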

Inventors

JIANG LINGYUN
Su Xingzheng
XU JIA

Assignees

Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date: 20260512
Application Date: 20260306

Claims (5)

1. A lightweight multi-branch semantic segmentation method for complex road scenes, characterized by comprising the following steps:
step one, taking an input image as the starting point and extracting multi-scale feature representations through a lightweight backbone network;
step two, constructing global semantic requirements on the low-resolution features by the anchoring semantic core (ASK) branch;
step three, performing pixel-level semantic prediction on the medium-resolution features by the execution semantic core (FSK) branch;
step four, detecting boundaries and abnormal regions on the high-resolution features by the boundary and outlier detection (FOD) branch;
step five, performing gated fusion of the ASK and FSK features through the Qi-GFM gating fusion module;
step six, performing semantic negotiation in combination with the FOD branch output through the Qi-Negotiation semantic negotiation module;
step seven, outputting the optimized semantic segmentation result;
wherein, from the input features of the FOD branch, a boundary attention map is generated by convolution mapping and a Sigmoid activation function, as shown in formula (4), whose terms denote the Sigmoid function, the ReLU activation function, batch normalization, and a 3 × 3 convolution, the boundary attention map characterizing the uncertainty and abnormal response intensity of the semantic prediction at different spatial locations;
channel mapping is first performed by convolution on the ASK branch features and the FSK branch features to unify the feature space, as shown in formula (5); then, under the guidance of the boundary attention map, negotiation fusion is performed on the ASK and FSK features using element-wise multiplication, as shown in formula (6);
the FOD branch features are first mapped to the unified channel space by a convolutional channel-mapping operation, as shown in formula (7); the mapped FOD branch features are then combined with the features obtained by negotiation through residual fusion, as shown in formula (8), the residual fusion introducing category-abnormal features at boundaries as a supplementary term; finally, the fused features are further integrated through convolution, batch normalization, and ReLU activation to obtain the final output of the Qi-Negotiation module, as shown in formula (9).
2. The method of claim 1, wherein step five comprises: the Qi-GFM gating fusion module takes the anchoring semantic core (ASK) features and the execution semantic core (FSK) features as input; the features from the ASK branch and the features from the FSK branch are concatenated along the channel dimension to obtain a joint feature, as shown in formula (1); spatial gating weights are then generated by convolution mapping followed by the Sigmoid activation function, as shown in formula (2), the result being a pixel-level gating map.
3. The method of claim 2, wherein step five further comprises: the Qi-GFM gating fusion module performs weighted fusion of the features from the anchoring semantic core (ASK) branch and the execution semantic core (FSK) branch using the gating weights, the specific fusion being as shown in formula (3), which uses element-wise multiplication.
4. The method according to claim 1, characterized in that the multi-branch semantic segmentation network QiNet outputs intermediate predictions at the second, third, and fourth stages respectively, and supervisory signals are applied to these outputs during training, the overall loss being as shown in formula (10), which comprises: a main semantic output loss, corresponding to the final prediction of the fourth stage, this output integrating the negotiation result of the anchoring semantic core (ASK) branch, the execution semantic core (FSK) branch, and the boundary and outlier detection (FOD) branch in the Qi-Negotiation module and representing the model's final semantic segmentation prediction for the input image; a boundary constraint loss, which helps the FOD branch learn boundary information effectively; an auxiliary semantic loss, which assists effective learning of the FSK branch; and a further auxiliary semantic loss, which assists effective learning of the FOD branch.
5. A system for the lightweight multi-branch semantic segmentation method for complex road scenes according to any one of claims 1 to 4, characterized by comprising an anchoring semantic core module, an execution semantic core module, a boundary and outlier detection branch module, a Qi-GFM gating fusion module, and a Qi-Negotiation semantic negotiation module, wherein: the anchoring semantic core module models global and cross-scale semantic requirement information; the execution semantic core module serves as the semantic execution body and performs fine-grained pixel-level semantic discrimination on multi-scale features; the boundary and outlier detection branch module detects semantic boundaries and abnormal response regions; the Qi-GFM gating fusion module collaboratively models the features from the anchoring semantic core (ASK) branch and the execution semantic core (FSK) branch by introducing a spatially adaptive gating mechanism; and the Qi-Negotiation semantic negotiation module unifies and fuses the semantic requirement information, the semantic execution information, and the anomaly detection information.
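
The formulas referenced in claims 1 and 4 are not reproduced in this text, so the block below is only one plausible reading of them, written out to make the data flow concrete. All symbols are hypothetical notation introduced here: F_a, F_s, and F_o stand for the ASK, FSK, and FOD branch features, A for the boundary attention map, σ for Sigmoid, δ for ReLU, and λ_1 to λ_3 for loss weights. The 1×1 kernel sizes, the extra 1×1 projection inside the attention map, the complementary weighting in the negotiation step, and the weighted-sum form of the loss are assumptions, not statements of the claimed formulas.

```latex
% Hypothetical reconstruction; symbols and exact forms are assumptions.
\begin{align}
A &= \sigma\bigl(\mathrm{Conv}_{1\times 1}(\delta(\mathrm{BN}(\mathrm{Conv}_{3\times 3}(F_o))))\bigr)
  && \text{boundary attention map} \\
\hat{F}_a &= \mathrm{Conv}_{1\times 1}(F_a), \quad \hat{F}_s = \mathrm{Conv}_{1\times 1}(F_s)
  && \text{channel mapping to a shared space} \\
F_n &= A \odot \hat{F}_s + (1 - A) \odot \hat{F}_a
  && \text{negotiation guided by } A \\
\hat{F}_o &= \phi(F_o)
  && \text{channel mapping of FOD features} \\
F_r &= F_n + \hat{F}_o
  && \text{residual fusion} \\
F_{\mathrm{out}} &= \delta(\mathrm{BN}(\mathrm{Conv}(F_r)))
  && \text{Qi-Negotiation output} \\
\mathcal{L} &= \mathcal{L}_{\mathrm{main}} + \lambda_1 \mathcal{L}_{\mathrm{bd}}
  + \lambda_2 \mathcal{L}_{\mathrm{aux}}^{\mathrm{FSK}} + \lambda_3 \mathcal{L}_{\mathrm{aux}}^{\mathrm{FOD}}
  && \text{training loss}
\end{align}
```

Under this reading, the lines correspond in order to formulas (4) through (10). The choice to weight the FSK features by A (rather than the ASK features) in the negotiation step is arbitrary; the claim text only fixes that the fusion is guided by the boundary attention map and uses element-wise multiplication.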

Description

Lightweight multi-branch semantic segmentation method and system for complex road scene

Technical Field

The invention belongs to the field of computer vision, and particularly relates to a lightweight multi-branch semantic segmentation method and system for complex road scenes.

Background

Semantic segmentation, a basic task in computer vision, aims to assign an accurate semantic label to each pixel in an image, and plays a key role in applications such as intelligent transportation systems (represented by autonomous driving) and robot environment perception. The Fully Convolutional Network (FCN) replaces fully connected layers with convolutional layers to achieve end-to-end pixel-level prediction; it established a unified semantic segmentation network paradigm for the first time and effectively addressed the reliance of traditional methods on hand-crafted features and complex post-processing. However, the successive downsampling operations of FCN significantly reduce feature resolution, causing loss of spatial detail and blurred boundaries. To strengthen the network's ability to model global context, the Pyramid Scene Parsing Network (PSPNet) aggregates context features under different receptive fields through multi-scale pooling, effectively alleviating semantic ambiguity in complex scenes. DeepLabv3+ combines atrous separable convolution with an encoder-decoder structure, enlarging the receptive field while keeping a higher spatial resolution, and thereby mitigates at the structural level the information loss caused by feature downsampling.
However, such methods rely on complex context interaction structures with large computational overhead, and they remain difficult to apply in real-time or resource-constrained scenarios.
In terms of context semantic modeling, OCRNet enhances semantic consistency by introducing an object-level context representation module that explicitly models the relationship between pixels and object semantics. Researchers have further proposed context-prior-based scene segmentation methods that guide semantic prediction by learning co-occurrence relationships between classes, as well as a Semantic Flow mechanism that promotes semantic consistency by propagating semantic information across spatial locations. However, such approaches focus on context-level semantic enhancement and do not explicitly distinguish semantic execution from abnormal prediction behavior.
Beyond context modeling, the easy loss of semantic information for small targets is also a significant problem in complex road scenes. ENet, ERFNet, ICNet, and the BiSeNet series achieve a certain balance between speed and accuracy through lightweight design and multi-resolution modeling. However, most of these approaches couple context modeling and semantic execution in a single inference path and lack explicit decoupling of different semantic functions.
In terms of object boundary and structure modeling, researchers have improved the boundary quality of segmentation results by introducing boundary modeling, gated shape branches, and learning of affinity relationships between pixels. However, such methods typically introduce boundary information only as auxiliary supervision, and do not treat boundaries and abnormal prediction regions as independent functional stages of the semantic reasoning process at the system level.
In recent years, neural network research has begun to re-examine prediction reliability in complex scenes from the perspective of anomaly detection and uncertainty awareness. Related studies show that explicitly modeling abnormal regions and decision uncertainty helps improve the overall stability of a learning system, but these methods have not been systematically incorporated into a multi-branch inference framework for semantic segmentation.
At the network structure level, some research attempts to improve semantic segmentation performance through functional decoupling and multi-branch modeling. Some researchers have proposed dynamic routing mechanisms for adaptive selection of feature paths, and others have systematically analyzed the synergistic advantages of multi-branch neural networks from a structural perspective. In addition, ResNet and SENet provide a general structural basis for deep feature modeling and channel attention mechanisms. Although these methods have made some progress in structural design, most still lack a unified collaborative decision mechanism, and they struggle to balance global consistency, detail fidelity, and anomaly suppression in complex road scenes.

Disclosure of Invention