
CN-121686119-B - Sparse label enhancement method for local consistency guidance

CN121686119B

Abstract

The invention provides a local consistency guided sparse label enhancement method, suitable for detecting road drivable areas and belonging to the technical field of image processing. To address the dependence of existing deep learning models on large amounts of pixel-level annotation data, the method first sparsely annotates the input image, builds context enhancement features from local and global image representations, and establishes similarity relations among superpixel nodes; it then builds a label propagation model based on a graph convolutional network and propagates the sparse labels to unlabeled areas to generate pseudo labels. The method adopts a local consistency guided weakly supervised training strategy, designing a joint loss function that jointly supervises labeled and unlabeled areas, thereby improving the reliability of the pseudo labels and the overall segmentation precision. Experimental results show that the method is applicable to various road drivable area detection tasks, and the resulting high-quality pixel-level pseudo labels can be used for subsequent fully supervised model training.

Inventors

  • Qin Hongshuai
  • Wu Jianfeng
  • Liu Fudan

Assignees

  • Hangzhou Dianzi University (杭州电子科技大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-11

Claims (7)

  1. A local consistency guided sparse label enhancement method, the method comprising the steps of:
     S1, labeling the data set in a sparse labeling mode, wherein only a small number of pixels or areas are selected from the image to serve as labeled samples, and the remaining areas are kept unlabeled;
     S2, constructing context enhancement features: first, smoothing preprocessing is applied to the input image annotated in S1; second, an adjacency relation among superpixels is constructed based on the feature similarity among pixels to form a superpixel graph; finally, the self-feature of each superpixel node is extracted and the context difference feature between the node and its neighborhood superpixels is computed, forming the node context enhancement feature for subsequent graph convolution learning;
     S3, constructing a label enhancement network: the features constructed in S2 are received, a graph convolutional network is built with superpixels as graph nodes and their context enhancement features as graph edges, and sparse label information is propagated with the graph convolutional network so that labels spread from labeled superpixels to unlabeled superpixels, generating preliminary pixel-level pseudo labels;
     S4, establishing a local consistency guided weakly supervised training strategy: a local consistency guided joint loss function is built to train the label enhancement network constructed in S3, comprising a supervision term for the labeled area and, for the unlabeled area, a deterministic supervision term and a pseudo-label local consistency supervision term; through iterative training, the reliable sparse labels and the reliable information of the pseudo labels are exploited to a greater extent to improve pseudo-label quality, so that high-quality pixel-level pseudo-label generation is realized.
     In S4, the local consistency guided weakly supervised training strategy includes establishing a local consistency guided joint loss function comprising a supervision term for the labeled area and supervision terms for the unlabeled area. The joint loss $\mathcal{L}$ is a weighted sum of the labeled-area term $\mathcal{L}_{lab}$ and the unlabeled-area term $\mathcal{L}_{unlab}$:
     $$\mathcal{L} = \mathcal{L}_{lab} + \lambda \mathcal{L}_{unlab}$$
     where $\lambda$ is a hyperparameter balancing the contributions of the different losses. The supervision of the labeled area uses partial cross-entropy loss:
     $$\mathcal{L}_{lab} = -\frac{1}{|V_l|} \sum_{i \in V_l} \sum_{c=1}^{C} y_{ic} \log p_{ic}$$
     where $C$ denotes the number of label classes, $V_l$ is the set of labeled nodes, $p_{ic}$ is the predicted label probability that node $i$ belongs to class $c$, and $y_{ic}$ is the true label of node $i$ for class $c$. The supervision of the unlabeled area is:
     $$\mathcal{L}_{unlab} = \alpha \mathcal{L}_{det} + \beta \mathcal{L}_{lc}$$
     where $\alpha$ and $\beta$ are hyperparameters balancing the contributions of the different losses. The deterministic loss $\mathcal{L}_{det}$ is:
     $$\mathcal{L}_{det} = -\frac{1}{|V_u|} \sum_{i \in V_u} \sum_{c=1}^{C} p_{ic} \log p_{ic}$$
     where $V_u$ refers to the set of unlabeled nodes and $p_{ic}$ represents the prediction probability of node $i$ in class $c$. The local consistency loss $\mathcal{L}_{lc}$ supervises the local consistency between a node and its neighbors based on the feature-space joint weight $w_{ij}$ and the neighbor similarity score $s_{ij}$. The feature-space joint weight is defined as:
     $$w_{ij} = \exp\!\left(-\frac{\lVert f_i - f_j \rVert_2^2}{2\sigma_f^2}\right) \cdot \exp\!\left(-\frac{\lVert x_i - x_j \rVert_2^2}{2\sigma_s^2}\right)$$
     where $f_i$ represents the context enhancement feature of node $i$, $x_i$ represents its spatial coordinates, $\sigma_f$ and $\sigma_s$ are Gaussian kernel scales controlling the similarity attenuation of the feature and the distance respectively, and $\lVert \cdot \rVert_2$ is the L2 distance. The neighbor similarity score is defined as:
     $$s_{ij} = \exp(-d_{ij})$$
     where $d_{ij}$ is the difference strength between node $i$ and its neighbor $j$; when $d_{ij}$ is close to 0, $s_{ij}$ is approximately 1, and when $d_{ij}$ is relatively large, $s_{ij}$ is approximately 0. The local consistency loss $\mathcal{L}_{lc}$ is:
     $$\mathcal{L}_{lc} = \frac{1}{|V|} \sum_{i \in V} \frac{\sum_{j \in N(i)} w_{ij}\, s_{ij}\, \lVert \hat{y}_i - \hat{y}_j \rVert_1}{\sum_{j \in N(i)} w_{ij}\, s_{ij} + \epsilon}$$
     where $N(i)$ is the set of neighbors centered on node $i$, $\epsilon$ is a hyperparameter preventing division by zero, $\hat{y}_i$ is the predicted label of node $i$, and $\lVert \cdot \rVert_1$ is the L1 distance.
  2. The local consistency guided sparse label enhancement method of claim 1, wherein the sparse labeling of the data set in S1 comprises: for each segmentation category of each image, the annotation takes one or more sparse forms carrying position information, namely points, lines, or boxes.
  3. The local consistency guided sparse label enhancement method of claim 1, wherein the context enhancement feature construction of S2 comprises the steps of: S21, smoothing preprocessing, namely applying smoothing preprocessing to the input image annotated in S1; S22, constructing the superpixel graph, namely generating a uniform and compact superpixel segmentation image from the preprocessed sparsely annotated image of S21 based on the feature similarity among pixels, using a multi-scale simple linear iterative clustering algorithm; S23, constructing the context enhancement feature of each superpixel node, namely forming the self-feature from the position, color, and texture information of the superpixels generated in S22, and, taking the neighborhood context into consideration, forming the neighborhood feature from the local difference degree of the superpixels in the neighborhood space.
  4. The local consistency guided sparse label enhancement method of claim 3, wherein the smoothing preprocessing of S21 comprises: smoothing the original image with a multi-scale nearest-neighbor operator based on a grid-division technique.
  5. The local consistency guided sparse label enhancement method of claim 3, wherein the construction of the superpixel graph in S22 comprises: constructing the superpixel graph with a modified simple linear iterative clustering algorithm, grouping spatially adjacent pixels with similar color and texture attributes into superpixels, and establishing a multi-scale similarity measure to ensure the compactness and consistency of the superpixel segmentation.
  6. The sparse label enhancement method of claim 3, wherein in S23 the context enhancement feature of a superpixel node comprises a self-feature and a neighborhood feature; the node context enhancement feature $h_i$ is represented as the concatenation of the node self-feature $f_i$ and the neighborhood feature $g_i$:
     $$h_i = [\, f_i \,\Vert\, g_i \,]$$
     The node self-feature $f_i$ is obtained by weighted summation of the position, color, and texture information values:
     $$f_i = w_p P_i + w_c C_i + w_t T_i$$
     where $P_i$, $C_i$, and $T_i$ respectively represent the position, color, and texture information of the node, and $w_p$, $w_c$, and $w_t$ are the weights of position, color, and texture. The position information stores the spatial coordinates of the node. The color information stores the color moments of the color space to represent the color distribution within the node, including mean, variance, and skewness. The texture information extracts texture features based on second-order statistics using a gray-level co-occurrence matrix, calculating the correlation between two gray levels to reflect spatial structure information about direction, interval, and amplitude changes; texture features are extracted in the four directions of 0°, 45°, 90°, and 135°, and include homogeneity, contrast, dissimilarity, entropy, energy, inverse variance, and correlation.
     The node neighborhood feature $g_i$ measures the local difference degree of each node $i$ from its neighboring nodes $j$ in the feature space, and is expressed as follows:
     $$g_i = \frac{1}{|N(i)|} \sum_{j \in N(i)} \lvert f_i - f_j \rvert$$
     where $f_i$ and $f_j$ respectively represent the self-features of nodes $i$ and $j$, $N(i)$ represents the set of neighbor nodes of node $i$, and $g_i$ represents the neighborhood difference feature of node $i$, reflecting the contextual inconsistency of the node.
  7. The local consistency guided sparse label enhancement method of claim 1, wherein the label enhancement network construction of S3 comprises: taking superpixels as graph nodes and their context enhancement features as graph edges, and inputting the nodes and edges into a graph convolutional network composed of a plurality of graph convolution layers, which learns the local and global feature relations among the superpixels and performs label propagation and enhancement; based on the similarity relations of the nodes, the sparse label information is propagated to the whole image through the graph convolutional network to generate pixel-level pseudo labels.
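As an illustration of the sparse annotations of S1 and claim 2, point and box annotations can be rasterised into a label mask in which uncovered pixels carry an ignore value. This is a minimal NumPy sketch; the `sparse_mask` name, the `(y, x, cls)` / `(y0, x0, y1, x1, cls)` encodings, and the `-1` ignore value are illustrative assumptions not taken from the patent, and line annotations are omitted for brevity:

```python
import numpy as np

def sparse_mask(shape, points=(), boxes=(), ignore=-1):
    """Rasterise sparse point/box annotations into a label mask.

    Pixels not covered by any annotation keep the `ignore` value and
    are treated as unlabeled during training."""
    mask = np.full(shape, ignore, dtype=int)
    for y0, x0, y1, x1, cls in boxes:   # boxes label rectangular regions
        mask[y0:y1, x0:x1] = cls
    for y, x, cls in points:            # points label single pixels
        mask[y, x] = cls
    return mask

mask = sparse_mask((4, 4), points=[(0, 0, 1)], boxes=[(2, 2, 4, 4, 0)])
print((mask != -1).sum())  # 5 labeled pixels out of 16
```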
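The superpixel construction of claim 5 builds on simple linear iterative clustering (SLIC). The following is a rough, self-contained sketch of the base algorithm only, a joint colour-position k-means seeded on a regular grid, not the patent's modified multi-scale variant; the `simple_slic` name and the compactness weighting are assumptions:

```python
import numpy as np

def simple_slic(image, n_segments=16, compactness=10.0, n_iter=5):
    """Minimal SLIC-style clustering: k-means over joint (position, colour)
    vectors, with centres seeded on a regular grid. `compactness` trades
    colour similarity against spatial proximity."""
    h, w, c = image.shape
    grid = int(np.sqrt(n_segments))
    step = max(h, w) / grid
    ys = np.linspace(step / 2, h - step / 2, grid).astype(int)
    xs = np.linspace(step / 2, w - step / 2, grid).astype(int)
    centers = np.array([[y, x, *image[y, x]] for y in ys for x in xs], float)
    yy, xx = np.mgrid[0:h, 0:w]
    pix = np.concatenate([yy[..., None], xx[..., None], image],
                         axis=2).reshape(-1, 2 + c)
    for _ in range(n_iter):
        # distance = colour distance + scaled spatial distance
        d_col = ((pix[:, None, 2:] - centers[None, :, 2:]) ** 2).sum(-1)
        d_sp = ((pix[:, None, :2] - centers[None, :, :2]) ** 2).sum(-1)
        labels = np.argmin(d_col + (compactness / step) ** 2 * d_sp, axis=1)
        for k in range(len(centers)):       # recompute cluster centres
            m = labels == k
            if m.any():
                centers[k] = pix[m].mean(0)
    return labels.reshape(h, w)

rng = np.random.default_rng(0)
seg = simple_slic(rng.random((32, 32, 3)), n_segments=16)
print(np.unique(seg).size)  # number of superpixels actually produced
```

In practice the resulting segment map defines the graph nodes of S3; adjacency between segments gives the graph edges.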
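The self-feature and neighbourhood-difference feature of claim 6 can be illustrated as follows. The colour-moment computation (mean, variance, skewness) and the mean absolute difference to neighbours are straightforward readings of the claim text, while the function names and the concatenation layout are assumptions; the GLCM texture cue is omitted for brevity:

```python
import numpy as np

def color_moments(pixels):
    """First three colour moments per channel (mean, variance, skewness),
    the colour cue named in claim 6. `pixels` is (n_pixels, n_channels)."""
    mean = pixels.mean(axis=0)
    var = pixels.var(axis=0)
    centred = pixels - mean
    skew = (centred ** 3).mean(axis=0) / (var ** 1.5 + 1e-8)
    return np.concatenate([mean, var, skew])

def context_features(self_feats, neighbors):
    """Concatenate each node's self-feature with its neighbourhood feature,
    here taken as the mean absolute difference to its neighbours."""
    out = []
    for i, f in enumerate(self_feats):
        diffs = [np.abs(f - self_feats[j]) for j in neighbors[i]]
        out.append(np.concatenate([f, np.mean(diffs, axis=0)]))
    return np.array(out)
```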
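The label propagation of claim 7 can be sketched with stacked graph-convolution layers over the superpixel graph. The symmetric adjacency normalisation used here is the standard GCN formulation and an assumption about the patent's exact layer design:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: add self-loops, symmetrically normalise
    the adjacency, then apply a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def propagate_labels(A, X, hidden_Ws, W_out):
    """Stack graph-convolution layers over node features X and emit
    per-node class probabilities via a softmax output head."""
    H = X
    for W in hidden_Ws:
        H = gcn_layer(A, H, W)
    logits = H @ W_out
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Thresholding or arg-maxing the per-node probabilities and painting them back onto each superpixel's pixels yields the preliminary pixel-level pseudo labels of S3.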

Description

Sparse label enhancement method for local consistency guidance

Technical Field

The invention relates to the field of computer vision, and in particular to a local consistency guided sparse label enhancement method, a computer-readable storage medium, and an electronic device.

Background

With the rapid development of intelligent traffic systems and autonomous driving technology, accurate road condition detection is becoming increasingly important. Deep-learning-based methods have been widely used for road drivable region detection, but most rely on fully supervised learning and require a large amount of accurate pixel-level annotation data. Such dense labeling is not only time-consuming and laborious but also costly, owing to problems such as aliasing and the unclear boundaries of road drivable areas. Although weakly supervised learning methods based on sparse labels such as points, lines, and boxes can reduce the labeling cost to a certain extent, they still face the following challenges in road drivable region detection tasks: (1) insufficient utilization of the sparse annotation information, as reliable information is difficult to extract fully from the limited annotations; (2) unstable pseudo-label quality, since pseudo labels generated by conventional methods may contain considerable noise that degrades model performance when used for fully supervised training; and (3) a model optimization dilemma, because noise is easily introduced during training, making model convergence difficult and detection accuracy poor.
Aiming at these technical problems, the invention provides a local consistency guided sparse label enhancement method, which propagates the sparse labels to the whole image through effective utilization of the sparse annotation information and finally generates high-quality pixel-level pseudo labels, thereby remarkably improving the detection precision and robustness of a subsequent fully supervised network. The method can significantly reduce the labeling cost while achieving a detection effect comparable to that of fully supervised methods, and has good application prospects.

Disclosure of Invention

The invention provides a local consistency guided sparse label enhancement method that aims to solve the dependence of existing fully supervised networks on dense labels. The core idea is to fuse context enhancement features on a superpixel graph structure and realize effective expansion of the sparse labels in combination with a local consistency guided weakly supervised training strategy. Specifically, a superpixel graph based on feature similarity is constructed, with superpixels as graph nodes and context enhancement features as graph edges; a graph convolutional network (GCN) is introduced to propagate and enhance the sparse labels; and a joint loss function is designed that combines supervision of the labeled areas and a local consistency constraint with deterministic supervision of the unlabeled areas, so that sparse labels and reliable pseudo-label information are fully utilized and high-quality pixel-level pseudo labels are gradually generated. The method can remarkably improve label utilization efficiency under sparse labeling conditions, thereby improving detection performance in weakly supervised scenarios.
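The joint loss described above can be sketched in NumPy under some stated assumptions: the labeled-area term is a partial cross-entropy, the deterministic term is an entropy penalty on unlabeled nodes, and the local consistency term is a joint-weighted L1 disagreement between neighbouring nodes. The names `lam`, `alpha`, `beta`, `sigma_f`, `sigma_s` and the exact normalisations are illustrative, not taken from the patent:

```python
import numpy as np

def partial_ce(probs, labels, labeled_idx, eps=1e-8):
    """Partial cross-entropy: supervise only the labeled superpixel nodes."""
    p = probs[labeled_idx, labels[labeled_idx]]
    return float(-np.mean(np.log(p + eps)))

def certainty_loss(probs, unlabeled_idx, eps=1e-8):
    """Deterministic term: penalise high prediction entropy on unlabeled nodes."""
    p = probs[unlabeled_idx]
    return float(-np.mean(np.sum(p * np.log(p + eps), axis=1)))

def local_consistency(probs, feats, coords, neighbors,
                      sigma_f=1.0, sigma_s=1.0, eps=1e-8):
    """Joint-weighted L1 disagreement between each node and its neighbours:
    Gaussian weights on feature and spatial distance, times a similarity
    score exp(-d), normalised per node (eps avoids division by zero)."""
    total = 0.0
    for i, nbrs in neighbors.items():
        num, den = 0.0, eps
        for j in nbrs:
            w_f = np.exp(-np.sum((feats[i] - feats[j]) ** 2) / (2 * sigma_f ** 2))
            w_s = np.exp(-np.sum((coords[i] - coords[j]) ** 2) / (2 * sigma_s ** 2))
            s = np.exp(-np.linalg.norm(feats[i] - feats[j]))  # similarity score
            num += w_f * w_s * s * np.sum(np.abs(probs[i] - probs[j]))
            den += w_f * w_s * s
        total += num / den
    return total / len(neighbors)

def joint_loss(probs, labels, labeled_idx, unlabeled_idx, feats, coords,
               neighbors, lam=1.0, alpha=0.5, beta=0.5):
    """Labeled-area supervision plus weighted unlabeled-area supervision."""
    return partial_ce(probs, labels, labeled_idx) + lam * (
        alpha * certainty_loss(probs, unlabeled_idx)
        + beta * local_consistency(probs, feats, coords, neighbors))
```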
In order to solve the technical problems in the prior art, the invention adopts the following technical scheme: a local consistency guided sparse label enhancement method comprising the following steps. S1, label the data set in a sparse labeling mode, wherein only a small number of pixels or areas are selected from the image to serve as labeled samples and the remaining areas are kept unlabeled. S2, construct context enhancement features: first apply smoothing preprocessing to the input image annotated in S1; second, construct an adjacency relation among superpixels based on the feature similarity among pixels to form a superpixel graph; finally, extract the self-feature of each superpixel node and compute the context difference feature between each node and its neighborhood superpixels, forming the node context enhancement feature representation for subsequent graph convolution learning. S3, construct a label enhancement network: receive the features constructed in S2, build a graph convolutional network with superpixels as graph nodes and their context enhancement features as graph edges, and propagate sparse label information with the graph convolutional network so that labels spread from labeled superpixels to unlabeled superpixels, generating preliminary pixel-level pseudo labels. S4, establish a local consistency guided weakly supervised training strategy, namely training the label enhancement network constructed in S3.