
CN-122023801-A - Image semantic segmentation method based on dynamic fusion and context awareness

CN122023801A

Abstract

The invention belongs to the technical field of computer vision and specifically relates to an image semantic segmentation method based on dynamic fusion and context awareness. The method comprises the following steps: inputting an image into a lightweight multi-scale feature extraction backbone network and outputting shallow, middle, and deep features; inputting the shallow, middle, and deep features into an MRFM module and outputting first, second, and third enhanced semantic features; inputting the third enhanced semantic features into an LACP module and outputting enhanced context features; inputting the first and second enhanced semantic features together with the enhanced context features into a DWFM module and outputting fusion features; inputting the fusion features into a dual-branch prediction structure and outputting the final segmentation prediction map; and constructing a joint loss function for end-to-end training and optimization of the network formed by steps S1 to S5. The method thereby addresses the rigid fusion strategies, weak context awareness, and redundant model parameters of existing image semantic segmentation methods.
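The adaptive weighted fusion performed by the DWFM can be illustrated with a minimal sketch. The NumPy toy below (all names hypothetical; the patent's DWFM generates learned joint spatial and channel attention weights, not the per-map scalar softmax used here) fuses three equally shaped feature maps with weights derived from a crude global-context descriptor:

```python
import numpy as np

def dynamic_weighted_fusion(feats):
    """Toy stand-in for DWFM-style adaptive fusion.

    Each map's global average acts as a global-context descriptor;
    a softmax over those scores yields per-map fusion weights, and
    the maps are combined as a weighted sum.
    """
    # Global context descriptor: mean activation of each feature map.
    scores = np.array([f.mean() for f in feats])
    # Softmax over the scores -> positive weights summing to 1.
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # Weighted sum of the input maps (all shapes must match).
    fused = sum(wi * fi for wi, fi in zip(w, feats))
    return fused, w

rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
fused, weights = dynamic_weighted_fusion(feats)
```

A learned version would replace the scalar scores with a small convolutional subnetwork producing spatial- and channel-wise weights, as claim 4 describes.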

Inventors

  • Mu Xiaofang
  • Ma Dewang
  • Bai Yuanming

Assignees

  • Shanxi Institute of Energy (山西能源学院)

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (8)

  1. An image semantic segmentation method based on dynamic fusion and context awareness, characterized by comprising the following steps: S1, constructing a lightweight multi-scale feature extraction backbone network, inputting an image to be segmented into the backbone network, and extracting and outputting shallow features, middle features, and deep features; S2, constructing a multi-resolution feature weaving module MRFM, inputting the shallow, middle, and deep features into the MRFM, and generating and outputting first, second, and third enhanced semantic features through stage-by-stage interactive fusion between parallel high-resolution and low-resolution paths in the MRFM; S3, constructing a lightweight context aggregation module LACP, inputting the third enhanced semantic features into the LACP, performing context feature enhancement through depthwise separable dilated convolutions with different dilation rates and a global context pooling operation in the LACP, and outputting enhanced context features; S4, constructing a dynamic weighted feature fusion module DWFM, inputting the first and second enhanced semantic features and the enhanced context features into the DWFM, and performing adaptive weighted fusion of the three features through a global context sensing mechanism in the DWFM to generate and output fusion features; S5, constructing a dual-branch prediction structure comprising a main branch and an auxiliary branch, inputting the fusion features into the dual-branch prediction structure, and outputting the final segmentation prediction map after the main branch applies upsampling and convolution operations to the fusion features; and S6, constructing a joint loss function for end-to-end training and optimization of the network formed by steps S1 to S5.
  2. The image semantic segmentation method based on dynamic fusion and context awareness according to claim 1, wherein in step S1 the lightweight multi-scale feature extraction backbone network comprises an initial downsampling module (Stem) and serially connected feature extraction stages that sequentially produce the shallow, middle, and deep features; the stage producing the shallow features preserves the original resolution of the input image, the stages producing the middle and deep features reduce the resolution to 1/2 and 1/4 respectively, and the features output by each stage have the same number of channels.
  3. The image semantic segmentation method based on dynamic fusion and context awareness according to claim 1, wherein in step S2 the multi-resolution feature weaving module MRFM comprises three serially connected sub-modules, each comprising: a high-resolution path that maintains the spatial resolution of the input features; a low-resolution path with a fixed number of output channels that performs a downsampling operation; and feature interaction and fusion between the high-resolution and low-resolution paths, with a channel attention mechanism introduced after fusion.
  4. The image semantic segmentation method based on dynamic fusion and context awareness according to claim 1, wherein in step S4 the dynamic weighted feature fusion module DWFM sequentially comprises: a dynamic weight calculation unit that generates joint spatial and channel attention weights from global context information and weights the three input features; and a feature fusion unit that integrates the weighted features and reduces their dimensionality by convolution to generate the fusion features.
  5. The image semantic segmentation method based on dynamic fusion and context awareness according to claim 1, wherein in step S3 the lightweight context aggregation module LACP comprises: three parallel depthwise separable dilated convolution branches, each with a different dilation rate; a global context branch that extracts global context information through adaptive average pooling and convolution; and a convolution operation that integrates the outputs of the three dilated convolution branches and the global context branch, concatenated along the channel dimension, into the enhanced context features.
  6. The image semantic segmentation method based on dynamic fusion and context awareness according to claim 1, wherein in step S5 the auxiliary branch of the dual-branch prediction structure receives the middle features from the backbone network as input and outputs an auxiliary prediction that provides an additional supervision signal during training to promote gradient propagation and accelerate convergence.
  7. The image semantic segmentation method based on dynamic fusion and context awareness according to claim 1, wherein in step S6 the joint loss function is a weighted sum of a cross-entropy loss and a Dice loss.
  8. The image semantic segmentation method based on dynamic fusion and context awareness according to any one of claims 1-7, wherein the method is deployed on mobile or embedded devices and used to detect and segment surface cracks in roads, buildings, or bridges.
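The joint loss of claim 7, a weighted sum of cross-entropy and Dice loss, can be sketched for the binary (crack vs. background) case. This is a minimal NumPy illustration; the weight `alpha` is a hypothetical parameter, since the patent does not disclose the weighting:

```python
import numpy as np

def joint_loss(probs, target, alpha=0.5, eps=1e-6):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    probs  : (H, W) predicted foreground probabilities in (0, 1)
    target : (H, W) binary ground-truth mask
    alpha  : hypothetical trade-off weight between the two terms
    """
    p = np.clip(probs, eps, 1.0 - eps)
    # Pixel-wise binary cross-entropy, averaged over the image.
    ce = -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    # Soft Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), smoothed by eps.
    inter = np.sum(p * target)
    dice = 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(target) + eps)
    return alpha * ce + (1.0 - alpha) * dice

# A near-perfect prediction yields a small loss; an inverted one, a large loss.
t = np.zeros((4, 4)); t[1:3, 1:3] = 1.0
loss_good = joint_loss(np.where(t == 1, 0.99, 0.01), t)
loss_bad = joint_loss(np.where(t == 1, 0.01, 0.99), t)
```

Combining a distribution-sensitive term (cross-entropy) with an overlap term (Dice) is a common remedy for the foreground/background imbalance typical of thin crack targets.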

Description

Image semantic segmentation method based on dynamic fusion and context awareness

Technical Field

The invention belongs to the technical field of computer vision and specifically relates to an image semantic segmentation method based on dynamic fusion and context awareness.

Background

Image semantic segmentation, as a pixel-level classification task, has wide application value in engineering scenarios such as infrastructure crack detection and road defect identification. Existing deep learning methods improve segmentation accuracy mainly by building complex networks, but this significantly increases computational cost and makes them difficult to deploy on mobile or embedded devices. Current semantic segmentation methods face a structural contradiction between resolution preservation and context awareness: on the one hand, maintaining high resolution to preserve detail tends to yield an insufficient receptive field and makes global semantic structure hard to model; on the other hand, improving semantic expressiveness through large-scale downsampling or deep structures tends to lose crack details. In addition, static feature fusion strategies judge the importance of information at different scales poorly, further limiting fusion efficiency and segmentation performance. It is therefore necessary to devise an image semantic segmentation method with a compact structure, strong context awareness, and a dynamic, efficient fusion mechanism, so as to unify segmentation accuracy and inference efficiency and meet practical deployment requirements.
Disclosure of Invention

The invention provides an image semantic segmentation method based on dynamic fusion and context awareness that aims to achieve a good balance between segmentation accuracy and inference efficiency, is particularly suitable for fine-grained detection of crack targets, and addresses the rigid fusion strategies, weak context awareness, and redundant model parameters of existing image semantic segmentation methods. The invention is realized by the following technical scheme.

An image semantic segmentation method based on dynamic fusion and context awareness comprises the following steps: S1, constructing a lightweight multi-scale feature extraction backbone network, inputting an image to be segmented into the backbone network, and extracting and outputting shallow features, middle features, and deep features; S2, constructing a multi-resolution feature weaving module MRFM, inputting the shallow, middle, and deep features into the MRFM, and generating and outputting first, second, and third enhanced semantic features through stage-by-stage interactive fusion between parallel high-resolution and low-resolution paths in the MRFM; S3, constructing a lightweight context aggregation module LACP, inputting the third enhanced semantic features into the LACP, performing context feature enhancement through depthwise separable dilated convolutions with different dilation rates and a global context pooling operation in the LACP, and outputting enhanced context features; S4, constructing a dynamic weighted feature fusion module DWFM, inputting the first and second enhanced semantic features and the enhanced context features into the DWFM, and performing adaptive weighted fusion of the three features through a global context sensing mechanism in the DWFM to generate and output fusion features; S5, constructing a dual-branch prediction structure comprising a main branch and an auxiliary branch, inputting the fusion features into the dual-branch prediction structure, and outputting the final segmentation prediction map after the main branch applies upsampling and convolution operations to the fusion features; and S6, constructing a joint loss function for end-to-end training and optimization of the network formed by steps S1 to S5.

Further, in step S1, the lightweight multi-scale feature extraction backbone network comprises an initial downsampling module (Stem) and serially connected feature extraction stages that sequentially produce the shallow, middle, and deep features; the stage producing the shallow features preserves the original resolution of the input image, the stages producing the middle and deep features reduce the resolution to 1/2 and 1/4 respectively, and the features output by each stage have a uniform number of channels.

Further, in step S2, the multi-resolution feature weaving module MRFM comprises three serially connected sub-modules, each comprising: a high-resolution path that maintains the spatial resolution of the input features; a low-resolution path havi