
CN-121999222-A - Lightweight network image segmentation method based on dynamic convolution

CN121999222A

Abstract

The invention relates to the technical field of image segmentation and in particular discloses a lightweight network image segmentation method based on dynamic convolution. The method comprises: acquiring an image and preprocessing it to obtain an image dataset; constructing an encoder-decoder lightweight network based on dynamic convolution as the image segmentation model; inputting the image dataset into the image segmentation model to obtain a segmentation probability map; computing the loss between the segmentation probability map and the ground-truth label through a global edge optimization loss function and training the model parameters according to the loss until they converge; and segmenting the input image with the converged image segmentation model and outputting the segmentation probability map. The method improves the precision and efficiency of image segmentation, accurately identifies and segments low-contrast areas in the image while keeping computing-resource usage low, and remarkably alleviates the edge-fracture phenomenon.

Inventors

  • Zhang Shufang
  • Zhang Mei
  • Guo Jichang
  • Luo Xizhe
  • Zhan Jianghua
  • Yang Qianhui
  • Chen Yuqiang
  • Wang Shuai

Assignees

  • Tianjin University (天津大学)
  • Tianjin Children's Hospital (天津市儿童医院)

Dates

Publication Date
2026-05-08
Application Date
2026-01-28

Claims (5)

  1. A lightweight network image segmentation method based on dynamic convolution, characterized by comprising the following steps: S1, acquiring an image and preprocessing it to obtain an image dataset; S2, constructing an encoder-decoder lightweight network based on dynamic convolution as the image segmentation model; S3, inputting the image dataset into the image segmentation model to obtain a segmentation probability map, computing the loss between the segmentation probability map and the ground-truth label through a global edge optimization loss function, and training the model parameters according to the loss until they converge; S4, segmenting the input image with the image segmentation model whose parameters have converged, and outputting the segmentation probability map. The encoder-decoder lightweight network in S2 consists of an encoder and a decoder, each with a six-layer structure. The first encoder layer consists of a 3×3 convolution; the second, third and fifth layers consist of dynamic spatial information modules; the fourth and sixth layers consist of a dynamic spatial information module and a dynamic channel pruning module. The first decoder layer consists of the output layer; the second and sixth decoder layers consist of a lightweight dynamic serpentine convolution module and an edge-guided upsampling module; the third to fifth layers consist of a skip connection, the lightweight dynamic serpentine convolution module and the edge-guided upsampling module. The dynamic spatial information module consists of an adaptive spatial convolution module, a dynamic serpentine convolution module and a lightweight channel attention module. The global edge optimization loss function in S3 is formed by combining a global region overlap optimization loss, an edge-aware constraint loss and a deformation smoothness constraint loss, with the expression:

$L_{total} = L_{region} + L_{edge} + L_{smooth}$

where $L_{total}$ denotes the global edge optimization loss function, $L_{region}$ the global region overlap optimization loss, $L_{edge}$ the edge-aware constraint loss, and $L_{smooth}$ the deformation smoothness constraint loss.
  2. The method for dynamic-convolution-based lightweight network image segmentation according to claim 1, wherein the adaptive spatial convolution module uses a dynamic convolution kernel to focus automatically on the target area and reduce spatial information loss, with the expression:

$y(p) = \sum_{k=1}^{K} w_k \cdot B\big(x,\; p + p_k + \Delta p_k\big)$

where $x$ is the feature map input to the adaptive spatial convolution module, $p$ a pixel location in the input feature map, $y(p)$ the feature value of the module's output at position $p$, $K$ the size of the convolution kernel (the number of sampling points), $k$ the sampling-point index, $w_k$ the kernel weight at the $k$-th sampling point, $p_k$ the preset relative coordinate of the $k$-th point, $\Delta p_k$ the learned positional offset of the $k$-th sampling point, $p + p_k + \Delta p_k$ the actual sampling position after offset adjustment, $B(\cdot)$ bilinear interpolation that resolves the fractional sampling position after offset adjustment, and $\sum$ the summation operation. The dynamic serpentine convolution module adjusts the sampling positions of the convolution kernel through learnable offsets so that the kernel deforms adaptively along the target edge, improving kernel flexibility, with the expressions:

$f_x(p) = \sum_{c \in \Omega} w_x(c) \cdot B\big(x,\; p + c + \Delta_x(c)\big)$

$f_y(p) = \sum_{c \in \Omega} w_y(c) \cdot B\big(x,\; p + c + \Delta_y(c)\big)$

$F(p) = f_x(p) + f_y(p)$

where $p$ is a pixel location in the feature map output by the adaptive spatial convolution module, $\Omega$ the extended sampling range of the convolution kernel, $c$ a position along the $x$-axis or $y$-axis within the axial expansion, $\Delta_x(c)$ and $\Delta_y(c)$ the axially constrained offsets along the $x$- and $y$-axis, $B(\cdot)$ bilinear interpolation converting the fractional sampling positions after offset adjustment to feature values, $w_x(c)$ and $w_y(c)$ the kernel weights on the $x$-axis and $y$-axis, $f_x$ and $f_y$ the output features of the $x$-axis and $y$-axis convolution branches, and $F(p)$ the output feature map of the dynamic serpentine convolution module at $p$. The lightweight channel attention module dynamically suppresses background noise on the basis of grouped convolution to achieve the lightweight effect, with the expressions:

$w_g = \sigma\big(\mathrm{AvgPool}(F_g)\big), \quad g = 1, \dots, G$

$W = \mathrm{Concat}(w_1, w_2, \dots, w_G)$

$F' = F \odot W$

where $G$ is the total number of groups into which the input channels are divided, $g$ the group index ranging from 1 to $G$, $w_1, \dots, w_G$ the per-group channel weights, $\sigma$ the Sigmoid function, $F_g$ the $g$-th group of the feature map output by the dynamic serpentine convolution module, $\mathrm{AvgPool}$ the average pooling operation, $\mathrm{Concat}$ the splicing operation, $W$ the channel weight vector, $F$ the feature map output by the dynamic serpentine convolution module, $\odot$ element-by-element multiplication, and $F'$ the feature map output by the lightweight channel attention module.
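The offset-adjusted sampling in the adaptive spatial convolution can be sketched in NumPy. This is a minimal single-channel illustration, not the patent's implementation: the function names, the scalar output, and the explicit Python loops are assumptions made for clarity.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample a 2-D feature map x at the fractional position (py, px)."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = py - np.floor(py), px - np.floor(px)
    return ((1 - wy) * (1 - wx) * x[y0, x0] + (1 - wy) * wx * x[y0, x1]
            + wy * (1 - wx) * x[y1, x0] + wy * wx * x[y1, x1])

def adaptive_spatial_conv(x, weights, base_offsets, learned_offsets, p):
    """y(p) = sum_k w_k * B(x, p + p_k + dp_k): each kernel tap samples at its
    preset relative coordinate plus a learned offset, resolved bilinearly."""
    out = 0.0
    for k in range(len(weights)):
        py = p[0] + base_offsets[k][0] + learned_offsets[k][0]
        px = p[1] + base_offsets[k][1] + learned_offsets[k][1]
        out += weights[k] * bilinear(x, py, px)
    return out
```

With all learned offsets at zero the operation reduces to an ordinary fixed-grid convolution, which is a convenient sanity check.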
  3. The method for dynamic-convolution-based lightweight network image segmentation according to claim 1, wherein the dynamic channel pruning module is positioned after the dynamic spatial information module in the fourth and sixth encoder layers, dynamically closes redundant channels according to the average activation intensity of each channel in the feature map, and prunes the model for light weight, with the expressions:

$a_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left| F_c(i, j) \right|$

$m_c = \begin{cases} 1, & a_c > \tau \\ 0, & a_c \le \tau \end{cases}$

$F'_c = m_c \cdot F_c$

where $a_c$ is the average activation intensity of the $c$-th channel of the input feature map, $H$ and $W$ the height and width of the feature map, $i$ and $j$ the indices over the height and width, $c$ the channel index, $|F_c(i, j)|$ the absolute feature value of the $c$-th channel at position $(i, j)$, $\sum$ the summation operation, $m_c$ the pruning mask, $\tau$ the threshold, 1 denoting channel retention and 0 channel closure, $F_c$ the feature values of the $c$-th channel, $F'_c$ the output of the dynamic channel pruning module on the $c$-th channel, and $\cdot$ element-by-element multiplication.
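The pruning rule maps directly onto array operations. A minimal NumPy sketch, assuming a single image in channel-first (C, H, W) layout; the function name and the returned mask are illustrative choices:

```python
import numpy as np

def dynamic_channel_pruning(feat, tau):
    """Close channels whose mean absolute activation is at or below tau.

    feat: array of shape (C, H, W); tau: scalar threshold.
    Returns the masked feature map and the 0/1 pruning mask per channel.
    """
    C = feat.shape[0]
    a = np.abs(feat).reshape(C, -1).mean(axis=1)   # average activation intensity a_c
    mask = (a > tau).astype(feat.dtype)            # pruning mask m_c: 1 keep, 0 close
    return feat * mask[:, None, None], mask
```

A channel that is entirely zero falls below any positive threshold and is masked out, while strongly activated channels pass through unchanged.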
  4. The method for dynamic-convolution-based lightweight network image segmentation according to claim 1, wherein the third to fifth layers of the decoder are skip-connected: each layer splices the feature map output by the same-level encoder layer and the upsampling result of the upper decoder layer along the channel dimension to generate fused features, integrating high-level semantic information with low-level spatial detail to provide a rich feature basis for subsequent refinement, with the expression:

$F^{(l)}_{fuse} = \mathrm{Concat}\big(E^{(l)}, D^{(l+1)}\big)$

where $l$ is the layer number with value range 3, 4, 5, $F^{(l)}_{fuse}$ the fused feature generated at layer $l$, $\mathrm{Concat}$ the splicing operation, $E^{(l)}$ the feature map output by the layer-$l$ encoder, and $D^{(l+1)}$ the feature map output by the layer-$(l+1)$ decoder. The lightweight dynamic serpentine convolution module recovers image detail while reducing the parameter count through grouped convolution to achieve the lightweight target, with the expression:

$F'^{(l)}(p) = \sum_{k=1}^{N} w_k \cdot F^{(l)}\big(B(p + p_k + \Delta p_k)\big)$

where $l$ is the layer number with value range 1 to 6, $p$ the feature-point position (non-integer positions are converted to feature values by the bilinear interpolation $B$), $F'^{(l)}$ the feature map output by the layer-$l$ lightweight dynamic serpentine convolution module, $N$ the total number of sampling points, $w_k$ the weight of the $k$-th sampling point, $F^{(l)}$ the layer-$l$ feature map, $p_k$ the preset sampling-point coordinate, and $\Delta p_k$ the learnable offset. The edge-guided upsampling module uses a pre-extracted edge mask $M$ generated by the Canny operator to constrain the reconstruction direction and further enhance segmentation robustness in low-contrast regions, with the expressions:

$M^{(l)} = \mathrm{Down}_2(M)$

$F^{(l)}_{up} = \mathrm{Deconv}_{\times 2}\big(F'^{(l)}\big) \odot W_e$

where $M$ is the edge mask, $l$ the layer number with value range 1 to 6, $M^{(l)}$ the edge mask downsampled from the $H \times W$ image resolution to the layer-$l$ resolution, $H$ and $W$ the image height and width, $\mathrm{Down}_2$ the downsampling operation with step size 2, $W_e$ the edge-sensitive weight matrix obtained from $M^{(l)}$, $F^{(l)}_{up}$ the feature map output by the edge-guided upsampling module, $\mathrm{Deconv}_{\times 2}$ a transposed convolution with step size 2, $F'^{(l)}$ the feature map output by the layer-$l$ lightweight dynamic serpentine convolution module, and $\odot$ element-by-element multiplication. The output layer consists of a $1 \times 1$ convolution and a Sigmoid activation function used to generate the segmentation probability map, expressed as:

$P = \sigma\big(\mathrm{Conv}_{1 \times 1}(D^{(2)})\big)$

where $P$ is the segmentation probability map, $\sigma$ the Sigmoid function, $\mathrm{Conv}_{1 \times 1}$ a convolution with a $1 \times 1$ kernel, $*$ the convolution operation, and $D^{(2)}$ the feature map output by the second decoder layer.
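The channel-wise splicing of the skip connection and the 1×1-convolution output head reduce to simple array operations. A minimal NumPy sketch under assumed conventions (single image, channel-first layout, a 1×1 convolution written as a weighted sum over channels); the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_skip(enc_feat, dec_feat):
    """Splice encoder and upsampled decoder features along the channel axis.

    enc_feat: (C_e, H, W); dec_feat: (C_d, H, W) -> (C_e + C_d, H, W).
    """
    return np.concatenate([enc_feat, dec_feat], axis=0)

def output_layer(feat, w1x1, b=0.0):
    """1x1 convolution (a per-pixel weighted sum over channels) plus Sigmoid.

    feat: (C, H, W); w1x1: (C,) -> probability map of shape (H, W).
    """
    logits = np.tensordot(w1x1, feat, axes=([0], [0])) + b
    return sigmoid(logits)
```

A 1×1 convolution mixes channels without touching spatial positions, which is why it collapses to a per-pixel dot product here.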
  5. The method for dynamic-convolution-based lightweight network image segmentation according to claim 1, wherein the global region overlap optimization loss maximizes the overlap between the prediction mask and the ground-truth label to improve global segmentation precision, with the expression:

$L_{region} = 1 - \frac{2 \sum P \cdot Y}{\sum P + \sum Y}$

where $L_{region}$ is the global region overlap optimization loss, $P$ the segmentation probability map, $Y$ the ground-truth label, and $\sum$ the summation operation; the edge-aware constraint loss enhances edge continuity and strengthens the learning of low-contrast boundaries, with the expression:

$L_{edge} = -\frac{1}{N} \sum M \cdot \log P$

where $L_{edge}$ is the edge-aware constraint loss, $M$ the edge mask, $P$ the segmentation probability map, $N$ the total number of samples, $\log$ the logarithmic operation, and $\sum$ the summation operation; the deformation smoothness constraint loss suppresses severe fluctuation of the dynamic serpentine convolution offsets and avoids feature distortion caused by excessive deformation, with the expression:

$L_{smooth} = \frac{1}{H W K} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{K} \big\| \big(\Delta x_{i,j,k},\, \Delta y_{i,j,k}\big) \big\|_2^{2}$

where $L_{smooth}$ is the deformation smoothness constraint loss, $H$ and $W$ the height and width of the feature map, $K$ the number of convolution kernels, $i$ and $j$ the indices of the feature map in the horizontal and vertical directions, $k$ the kernel index, $\Delta x_{i,j,k}$ and $\Delta y_{i,j,k}$ the learned horizontal and vertical offsets of kernel $k$ at position $(i, j)$, $\|\cdot\|$ the Euclidean norm, and $\sum$ the summation operation.
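The three loss terms can be sketched in NumPy. The exact formulas are not fully recoverable from this record, so the forms below are common assumptions consistent with the stated symbols: a Dice-style overlap for the region term, an edge-masked log loss for the edge term, and a mean squared offset norm for the smoothness term.

```python
import numpy as np

def region_overlap_loss(p, y, eps=1e-6):
    """Dice-style overlap loss (assumed form): 0 when P and Y coincide."""
    inter = (p * y).sum()
    return 1.0 - (2.0 * inter + eps) / (p.sum() + y.sum() + eps)

def edge_aware_loss(p, m, eps=1e-6):
    """Edge-masked log loss (assumed form): penalizes low confidence on edge pixels."""
    n = p.size
    return -(m * np.log(p + eps)).sum() / n

def smoothness_loss(dx, dy):
    """Mean squared Euclidean norm of the learned offsets.

    dx, dy: offset fields of shape (H, W, K) for K kernel sampling points.
    """
    return (dx ** 2 + dy ** 2).mean()
```

The region term vanishes for a perfect prediction and the smoothness term vanishes for zero offsets, so both behave as pure penalties on deviation.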

Description

Lightweight network image segmentation method based on dynamic convolution

Technical Field

The invention relates to the field of image segmentation, in particular to a lightweight network image segmentation method based on dynamic convolution.

Background

In recent years, deep learning techniques typified by CNNs have made breakthrough progress in image segmentation tasks and have greatly advanced the field of computer vision. However, as model performance increases, network architectures become increasingly complex, and parameter scale and computational resource requirements grow sharply, which creates a significant challenge for practical deployment. Particularly in resource-constrained environments such as mobile devices, embedded systems, and edge computing, how to significantly reduce computing cost and memory occupation while guaranteeing network performance has become an important subject of current research. Lightweight networks featuring high performance and low resource requirements are therefore receiving significant attention from both academia and industry. A lightweight network aims to perform visual tasks efficiently under limited computing resources through innovative network structure design, optimized convolution operators and efficient feature expression strategies. Classical lightweight networks include the MobileNet, ShuffleNet and EfficientNet series.
The MobileNet series greatly reduces model parameters and operations through depthwise separable convolution, balancing computational efficiency and accuracy, but is less robust for tasks that must capture richer features. The ShuffleNet series introduces a channel shuffling mechanism that effectively enhances information exchange among channels, but it depends on a fixed grouping strategy that may not suit all types of data and tasks. EfficientNet proposes a compound scaling strategy that jointly optimizes network depth, width and input resolution to obtain a good balance of performance and computational cost, but it increases the complexity of network design and hyperparameter tuning, requiring more experiments and computing resources to determine the optimal configuration. Thus, despite the success of the currently prevailing lightweight network structures, they still show significant shortcomings in handling low-contrast areas in the feature map and spatial information across channels. In scenarios involving complex details, such as refined image segmentation tasks, lightweight networks often exhibit limited feature expression capability, making it difficult to efficiently capture and fuse local spatial detail. How to significantly reduce computing-resource demand while enhancing the network's sensitivity to, and fine capture of, spatial information therefore remains an important open problem for lightweight image segmentation models.
Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a design method for a dynamic-convolution spatial-information lightweight network, which improves the precision and efficiency of image segmentation, effectively identifies and segments low-contrast areas in an image, remarkably alleviates the edge-fracture problem, and realizes a lightweight network design. To achieve this purpose, the invention adopts the following technical scheme. The design method of the dynamic-convolution spatial-information lightweight network comprises the following steps: S1, acquiring an image and preprocessing it to obtain an image dataset; S2, constructing an encoder-decoder lightweight network based on dynamic convolution as the image segmentation model; S3, inputting the image dataset into the image segmentation model to obtain a segmentation probability map, computing the loss between the segmentation probability map and the ground-truth label through a global edge optimization loss function, and training the model parameters according to the loss until they converge; S4, segmenting the input image with the image segmentation model whose parameters have converged, and outputting the segmentation probability map. Optionally, the encoder-decoder lightweight network in S2 consists of an encoder and a decoder, both with six-layer structures. The first encoder layer consists of a 3×3 convolution; the second, third and fifth layers consist of dynamic spatial information modules; the fourth and sixth layers consist of a dynamic spatial information module and a dynamic channel pruning module. The first decoder layer consists of the output layer; the second and sixth decoder layers consist of a lightweight dynamic serpentine convolution module and an edge-guided upsampling module; and the third layer to the fifth layer are