CN-121982306-A - Self-adaptive hybrid prototype learning method for real-time semantic segmentation


Abstract

The invention discloses a self-adaptive hybrid prototype learning method for real-time semantic segmentation, belonging to the technical field of prototype learning. It addresses the problem that static prototypes adapt poorly to dynamic changes in the scene. The method selects, for each class, high-confidence pixel features based on an initial segmentation prediction map and weights them to generate a dynamic prototype of each class for the current input image; fuses the dynamic prototypes with learnable global class prototypes through a learnable parameter in an adaptive weighted manner to obtain class guide prototypes; computes a cosine similarity map between the feature map and the class guide prototypes; and finally scales the similarity map by a learnable scale parameter and adds it to the initial segmentation prediction map through a residual connection to output the final segmentation result. The method can be combined with various real-time semantic segmentation algorithms, and it improves the model's adaptability to scene changes and its segmentation accuracy while adding only a small amount of computation and a small number of parameters.

Inventors

LIU GANG
Lv Shuxian
FAN XIAOLIANG
Tong Zhaoya
JI XIANG
GAO ZHANYANG
CAO SHUNING
YAN JUNGANG

Assignees

Henan University of Science and Technology (河南科技大学)

Dates

Publication Date: 20260505
Application Date: 20260119

Claims (10)

1. An adaptive hybrid prototype learning method for real-time semantic segmentation, comprising the steps of: S1, extracting features from an input image to obtain a feature map, and generating an initial segmentation prediction map from the feature map through a semantic segmentation head; S2, refining the feature map through a feature refinement module to obtain a refined feature map, wherein the feature refinement module consists of a convolution layer, a batch normalization layer and an activation function layer connected in series; S3, based on the initial segmentation prediction map, selecting for each semantic class the positions of the top $K$ pixels with the highest prediction confidence, together with their normalized confidence values; S4, based on the obtained top $K$ pixel positions, extracting the feature vector at each position from the refined feature map together with the corresponding confidence value, and, for each semantic class, using the confidence values of its top $K$ pixels as weights to compute a weighted sum of the $K$ feature vectors of that class, thereby generating a dynamic prototype for every class, the dynamic prototype representing the center of the feature distribution of the corresponding class in the current image; S5, introducing a set of learnable global class prototypes; S6, performing adaptive weighted fusion of the dynamic prototypes and the learnable global class prototypes through a learnable fusion coefficient $\alpha$ to generate the final class guide prototypes; S7, computing the cosine similarity between the feature at each pixel position of the feature map from step S1 and each of the class guide prototypes, and generating a similarity map in which this similarity serves as the distance measure between pixels and classes; S8, multiplying the similarity map by a learnable scale parameter $\gamma$ and then adding it to the initial segmentation prediction map through a residual connection to obtain the final segmentation result.
2. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 1, wherein in step S4 the dynamic prototype of class $c$ is calculated by the following formula: $\mathbf{p}^{d}_{c} = \sum_{i \in \mathcal{I}_{c}} w_{c,i}\,\mathbf{f}_{i}$, where $\mathbf{p}^{d}_{c}$ denotes the dynamic prototype of class $c$, $\mathcal{I}_{c}$ denotes the index set of the top $K$ pixels of class $c$, $w_{c,i}$ denotes the confidence value that the pixel at index $i$ belongs to class $c$, and $\mathbf{f}_{i} \in \mathbb{R}^{D}$ denotes the feature vector of the pixel at index $i$ in the refined feature map, with dimension $D$; the dynamic prototypes of all classes are combined in class order to form a dynamic prototype set.
3. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 1, wherein in step S5 the learnable global class prototypes form a parameter matrix that is randomly initialized and iteratively optimized by a back-propagation algorithm during model training, each row vector of the matrix being the general representation of one class.
4. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 2, wherein in step S6 the adaptive weighted fusion is implemented by the following formula: $\mathbf{P} = \alpha\,\mathbf{P}^{g} + (1-\alpha)\,\mathbf{P}^{d}$, where $\mathbf{P}$ denotes the class guide prototypes, $\alpha$ denotes the learnable fusion coefficient, $\mathbf{P}^{g}$ denotes the learnable global class prototypes, and $\mathbf{P}^{d}$ denotes the dynamic prototypes.
5. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 1, wherein in step S7 the cosine similarity is calculated as: $S_{c}(h,w) = \dfrac{\langle \mathbf{f}_{h,w}, \mathbf{p}_{c} \rangle}{\lVert \mathbf{f}_{h,w} \rVert_{2}\,\lVert \mathbf{p}_{c} \rVert_{2}}$, where $\langle\cdot,\cdot\rangle$ denotes the vector inner product, $\lVert\cdot\rVert_{2}$ denotes the L2 norm of a vector, $\mathbf{f}_{h,w}$ denotes the feature vector of the pixel at position $(h,w)$, $\mathbf{p}_{c}$ denotes the class guide prototype of class $c$, and $S_{c}(h,w)$ denotes the similarity between that feature vector and the guide prototype; the larger $S_{c}(h,w)$, the higher the probability that the pixel at position $(h,w)$ belongs to class $c$.
6. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 5, wherein in step S8 the initial segmentation prediction map and the similarity map are fused as: $\mathbf{Y} = \mathbf{Y}_{0} + \gamma\,\mathbf{S}$, where $\mathbf{Y}$ denotes the final output segmentation result map, $\mathbf{Y}_{0}$ denotes the initial segmentation prediction map, $\mathbf{S}$ denotes the similarity map, and $\gamma$ is a learnable scale parameter.
7. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 1, wherein the learnable fusion coefficient $\alpha$ and the learnable scale parameter $\gamma$ are iteratively optimized by a back-propagation algorithm during model training.
8. The adaptive hybrid prototype learning method according to claim 1, wherein in step S2 the convolution layer is a 1×1 convolution layer and the activation function layer is a … activation function layer.
9. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 1, wherein in step S3, selecting the top $K$ pixel positions with the highest prediction confidence comprises: screening the initial segmentation prediction map for each class to obtain the $K$ pixels with the highest prediction confidence, and recording the index positions of these pixels in the feature map together with their confidence values after normalization by the channel-wise … function.
10. The adaptive hybrid prototype learning method for real-time semantic segmentation according to claim 1, wherein the method is integrated into a real-time semantic segmentation algorithm as a general-purpose module placed after the semantic segmentation head.
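The forward pass of steps S2 to S8 can be sketched in NumPy for a single image as below. This is a minimal, non-authoritative sketch, not the inventors' implementation, and it rests on several assumptions: the refined feature map keeps the channel count $D$ of the S1 feature map, so the prototypes can be compared with S1 features in S7; batch normalization is approximated by normalizing each channel over the current map without affine terms; ReLU is assumed for the activation of claim 8 and Softmax for the channel-wise normalization of claim 9; the fusion uses the convex form $\alpha\mathbf{P}^{g} + (1-\alpha)\mathbf{P}^{d}$; and all function names and the `params` keys (`W`, `b`, `P_global`, `alpha`, `gamma`) are hypothetical. Training $\alpha$, $\gamma$ and the global prototypes by back-propagation (claim 7) would need an autodiff framework and is not shown.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def refine(feat, weight, bias, eps=1e-5):
    """S2: 1x1 conv -> normalization -> ReLU (ReLU is an assumed activation).

    feat: (D, H, W); weight: (D, D); bias: (D,). A 1x1 convolution is a
    per-pixel linear map over channels. Each channel is then normalized with
    the statistics of this single map (no affine terms), a stand-in for BN.
    """
    x = np.einsum("oc,chw->ohw", weight, feat) + bias[:, None, None]
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return np.maximum((x - mean) / np.sqrt(var + eps), 0.0)


def topk_confidences(logits, k):
    """S3: per class, flat pixel indices and confidences of the k most confident pixels."""
    n_cls = logits.shape[0]
    conf = softmax(logits, axis=0).reshape(n_cls, -1)  # (C, H*W), Softmax assumed
    idx = np.argsort(-conf, axis=1)[:, :k]             # (C, k), highest first
    w = np.take_along_axis(conf, idx, axis=1)          # (C, k)
    return idx, w


def dynamic_prototypes(refined, idx, w):
    """S4: p_c = sum over i in I_c of w[c, i] * f_i, stacked in class order -> (C, D)."""
    d = refined.shape[0]
    flat = refined.reshape(d, -1).T                    # (H*W, D)
    return np.einsum("ck,ckd->cd", w, flat[idx])       # flat[idx]: (C, k, D)


def fuse(global_protos, dyn_protos, alpha):
    """S6: convex fusion alpha * P_g + (1 - alpha) * P_d (assumed form)."""
    return alpha * global_protos + (1.0 - alpha) * dyn_protos


def cosine_similarity_map(feat, protos, eps=1e-8):
    """S7: S[c, h, w] = <f_hw, p_c> / (||f_hw|| * ||p_c||)."""
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + eps)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    return np.einsum("cd,dhw->chw", p, f)


def hybrid_prototype_head(feat, logits, params, k):
    """S2-S8 for one image. feat: (D, H, W) S1 features; logits: (C, H, W)."""
    refined = refine(feat, params["W"], params["b"])
    idx, w = topk_confidences(logits, k)
    p_dyn = dynamic_prototypes(refined, idx, w)
    protos = fuse(params["P_global"], p_dyn, params["alpha"])
    sim = cosine_similarity_map(feat, protos)          # compared with S1 features
    return logits + params["gamma"] * sim              # S8: residual connection


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, C, H, W = 16, 4, 8, 8
    params = {
        "W": 0.1 * rng.standard_normal((D, D)),
        "b": np.zeros(D),
        "P_global": rng.standard_normal((C, D)),  # learnable during training
        "alpha": 0.5,                              # learnable during training
        "gamma": 0.1,                              # learnable during training
    }
    feat = rng.standard_normal((D, H, W))
    logits = rng.standard_normal((C, H, W))
    out = hybrid_prototype_head(feat, logits, params, k=5)
    print(out.shape)  # (4, 8, 8)
```

Because cosine similarity lies in $[-1, 1]$, the scale parameter $\gamma$ bounds how far the prototype branch can shift the initial logits; with $\gamma = 0$ the sketch returns the segmentation head's output unchanged, which fits the plug-in placement described in claim 10.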

Description

Technical Field

The invention relates to the technical field of prototype learning, and in particular to a self-adaptive hybrid prototype learning method for real-time semantic segmentation.

Background

Real-time semantic segmentation is a core task of computer vision. It aims to classify every pixel of an image accurately while meeting real-time processing requirements, and it is a key technology for environment perception and scene understanding. It plays an irreplaceable role in fields such as intelligent driving, night surveillance, military reconnaissance and industrial inspection, especially in applications that rely on infrared imaging. Methods based on deep convolutional neural networks are currently the dominant technical route in this field. However, most existing real-time semantic segmentation algorithms rely on discriminative classifiers that predict each pixel independently, which makes it difficult to adequately model the feature distribution structure and the semantic similarity within a semantic class. Consequently, in complex scenes where appearance varies widely within a class or interference is present, class confusion, blurred boundaries and degraded generalization readily occur.

In recent years, prototype learning has been introduced into real-time semantic segmentation. By learning a representative feature prototype for each semantic class, it converts the traditional pixel classification problem into a distance metric problem: finding the nearest prototype in feature space. Such metric-based models better capture intra-class semantic similarity and offer a new technical path toward more robust semantic segmentation.
However, most existing prototype learning methods rely on static prototypes derived from the entire training set. A static prototype is essentially a global average of class features, so it struggles to capture and adapt to the dynamic appearance changes of the same class across specific scenes. For example, under varying illumination, viewing angles, distance scales or partial occlusion, the visual characteristics of a target may change significantly, and because of its inherent rigidity a static prototype loses much of its representativeness and discriminative power in such scenes, which degrades segmentation performance. In infrared imaging applications such as night surveillance and military reconnaissance, these limitations are further amplified:

First, low contrast and weak texture: in infrared images the contrast between target and background is low and texture information is scarce, so target edges are blurred and easily confused with background clutter, and a static prototype can hardly represent class information accurately from such weak features.

Second, active interference: in military applications, targets are often accompanied by heat-source interference, which may be locally similar to real target parts in feature space; this easily triggers false activations of the model and introduces a large amount of noise.

Third, drastic changes in target scale and pose: the same type of target part (such as an aircraft nose or a propeller) varies in scale and shape across distances and viewing angles. As a fixed vector, a static prototype cannot adapt to these changes, so small-scale parts and parts in non-standard poses are segmented poorly.
Fourth, real-time constraints: the application scenarios require the algorithm to complete processing with limited computational resources and within strict latency limits. This rules out prototype optimization and matching mechanisms that are overly complex or time-consuming, and calls for a balance between model efficiency and adaptability.

In summary, existing real-time semantic segmentation methods, and static-prototype models in particular, hit a clear performance bottleneck when facing the dynamic changes of real scenes, especially the low contrast, active interference and variable scale and pose encountered in infrared image segmentation. The field therefore urgently needs a new method that overcomes the limitations of static prototypes under strict real-time constraints and adaptively optimizes the prototype representation, thereby improving its quality and discriminative power.

Disclosure of Invention

The invention aims to provide a self-adaptive hybrid prototype learning method for r