CN-116958583-B - Salient object detection method based on semantic information guidance
Abstract
The invention relates to the technical field of image processing, in particular to a salient object detection method based on semantic information guidance. The method comprises: constructing a salient object detection model based on semantic information guidance; dividing the pictures in a saliency image dataset into a training set, a validation set and a test set for the model; training the constructed model; and inputting the test set into the trained model to obtain four evaluation indexes. When the four evaluation indexes meet practical application requirements, the corresponding model is used for salient object detection on images; otherwise, the learning rate is adjusted and the model is retrained until its four evaluation indexes meet practical application requirements. The constructed salient object detection model can segment salient objects completely, preserve accurate details and improve the overall feature extraction capability.
Inventors
- LI YONGJUN
- LUO JINCHENG
- LI BO
- LI CHAOYUE
- ZHANG XINRU
- CHEN JINZHIMIN
- LIANG YONG
- CHEN JING
Assignees
- Henan University (河南大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2023-07-27
Claims (5)
- 1. A salient object detection method based on semantic information guidance, characterized by comprising the following steps: constructing a salient object detection model based on semantic information guidance; establishing a dataset of RGB saliency images; cropping all pictures in the dataset to the same size and taking all the cropped pictures as the saliency image dataset; dividing the pictures in the saliency image dataset into a training set, a validation set and a test set; training the constructed salient object detection model; inputting the test set into the trained salient object detection model and testing its performance to obtain four evaluation indexes; when the four evaluation indexes meet practical application requirements, using the corresponding salient object detection model for salient object detection on images; otherwise, adjusting the learning rate used when training the salient object detection model and retraining it until its four evaluation indexes meet practical application requirements;
the constructing of the salient object detection model based on semantic information guidance comprises the following steps:
step 1.1, inputting an RGB image into the PVTv backbone network and extracting image features at four stages to generate feature representations of the salient image at the first, second, third and highest layers;
step 1.2, introducing an attention module at the highest layer, the attention module comprising a channel attention module and a spatial attention module, extracting the rich semantic information of the highest-layer features, and generating the highest-layer feature map;
step 1.3, constructing a high-level feature guiding module, sending the generated feature representations of the first-, second- and third-layer salient images together with the semantic information extracted in step 1.2 into the high-level feature guiding module as input, enhancing the low-level feature representations, and generating three levels of salient feature maps;
step 1.4, constructing an adaptive feature fusion module, the input of which comprises the three levels of salient feature maps generated by the high-level feature guiding module and the high-level features output from the highest layer, and generating coarse salient features by adding learnable weight coefficients in the fusion process;
step 1.5, constructing a top-down correlation aggregation module, guiding the coarse salient features generated by the adaptive feature fusion module with the feature maps generated in step 1.1, and improving the accuracy of salient feature positions and local details;
wherein step 1.3 further comprises:
step 1.3.1, inputting the highest-layer features $f_4$ and the features $f_n$ of the other layers into the module, where $n$ denotes the layer number and $n \in \{1, 2, 3\}$;
step 1.3.2, applying global adaptive pooling to the highest-layer features $f_4$ to capture the global information of the entire feature map, as shown in equation (1):
$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} f_4^{c}(i, j)$ (1)
wherein $z_c$ represents the channel statistic of the highest-layer features $f_4$; $c$ indexes the $C$ channels; $i$ and $j$ are respectively the spatial coordinates within each layer; $H$ and $W$ represent the height and width of the highest-layer feature map; the subscript 4 denotes the highest layer;
step 1.3.3, multiplying the obtained semantic information element-wise with the low-level feature representation, summing over the channel dimension to obtain a correlation matrix, generating a spatial probability coefficient through a Sigmoid function, and multiplying the probability coefficient with the low-level feature representation to obtain the guided low-level feature, as shown in equation (2):
$F_n = f_n \odot \sigma\!\left(\mathrm{Conv}\left(\sum_{c=1}^{C} z_c \odot f_n^{c}\right)\right)$ (2)
wherein $F_n$ is the guided low-level feature, $\mathrm{Conv}$ is a convolution layer, $\sigma$ is the Sigmoid operation, $\odot$ represents element-wise multiplication, and $f_n$ is the $n$-th layer feature.
- 2. The salient object detection method based on semantic information guidance according to claim 1, wherein the constructing of the adaptive feature fusion module, the input of which comprises the high-level feature output from the highest layer and the three levels of salient feature maps generated by the high-level feature guiding module, and which generates coarse salient features by adding learnable weight coefficients in the fusion process, comprises: step 1.4.1, inputting the high-level feature output from the highest layer and the three levels of salient feature maps generated by the high-level feature guiding module into the adaptive feature fusion module; step 1.4.2, in the fusion process, multiplying the input features respectively by the adaptive weight coefficients $\alpha$ and $\beta$, wherein $\alpha$ and $\beta$ are learnable, to obtain the fused salient features, as shown in equation (3):
$F_c = \alpha \cdot \mathrm{Up}(F_{high}) \oplus \beta \cdot F_n$ (3)
wherein $F_c$ is the fused coarse salient feature, $F_{high}$ represents the high-level feature output from the previous layer in the fusion process, $\mathrm{Up}$ denotes an up-sampling operation, $\oplus$ represents element-wise addition, and $F_n$ is the lower-layer feature obtained by this process.
- 3. The salient object detection method based on semantic information guidance according to claim 1, wherein the constructing of the top-down correlation aggregation module, which guides the coarse salient features generated by the adaptive feature fusion module with the feature maps generated in step 1.1 and improves the accuracy of salient feature positions and local details, comprises: step 1.5.1, passing the input coarse salient feature $F$ through three different convolution layers to obtain the feature mappings $Q$, $K$ and $V$, where $Q, K, V \in \mathbb{R}^{C \times H \times W}$; step 1.5.2, applying a Sigmoid operation to the feature maps generated in step 1.1 and multiplying the result with $Q$, $K$ and $V$ to construct the $Q$, $K$ and $V$ matrices, where $Q, K, V \in \mathbb{R}^{C \times H \times W}$; step 1.5.3, reshaping the $Q$, $K$ and $V$ matrices into $\mathbb{R}^{C \times N}$, where $N = H \times W$ is the number of pixels, then multiplying the transpose $Q^{T}$ of $Q$ with $K$ and obtaining the attention $S$ through a softmax layer, as shown in equation (4):
$s_{ji} = \frac{\exp(Q_i \cdot K_j)}{\sum_{i=1}^{N} \exp(Q_i \cdot K_j)}$ (4)
wherein $s_{ji}$ represents the correlation between the $i$-th position and the $j$-th position; $\exp$ is the exponential function with the natural constant $e$ as its base; $Q_i$ is the element value at the $i$-th position of the transposed $Q$ matrix; $K_j$ is the element value at the $j$-th position of the $K$ matrix; step 1.5.4, matrix-multiplying the obtained feature mapping $V$ with $S$, reshaping the result back into $\mathbb{R}^{C \times H \times W}$, multiplying it by a scale parameter $\gamma$, and performing an element-by-element summation with the feature $F$ to obtain the final output $F_{out}$.
- 4. The salient object detection method based on semantic information guidance according to claim 1, wherein the training of the salient object detection model comprises: step 3.1, setting the training parameters, wherein the training batch size is set to batchsize = 8, the learning rate is initially set to lr = 0.0001, and the number of training iterations is set to epoch = 200; step 3.2, feeding the established saliency image dataset into the salient object detection model and training it according to the parameters set in step 3.1; step 3.3, performing stochastic gradient descent on the salient object detection model trained in step 3.2 with the Adam stochastic optimization algorithm to continuously optimize the loss function, and determining the optimal weights according to the loss trend of cross-validation between the training set and the validation set until the loss change gradually approaches a stable state.
- 5. The salient object detection method based on semantic information guidance according to claim 1, wherein inputting the test set into the trained salient object detection model and testing its performance to obtain four evaluation indexes comprises: inputting the test set into the trained salient object detection model and testing its performance to obtain the four evaluation indexes MAE, F-measure, E-measure and S-measure.
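The high-level feature guiding module of claim 1 (steps 1.3.2-1.3.3) can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: the convolution layer of equation (2) is omitted, the function name and the assumption that the two features share a channel count are the editor's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def high_level_guidance(f_high, f_low):
    """Guide a low-level feature map with semantics from the highest-level
    feature, following steps 1.3.2-1.3.3 of claim 1 (conv layer omitted).

    f_high: (C, Hh, Wh) highest-level feature
    f_low:  (C, H, W)   low-level feature (same channel count C assumed)
    """
    # Step 1.3.2: global adaptive (average) pooling -> channel statistics z, eq. (1)
    z = f_high.mean(axis=(1, 2))                    # (C,)
    # Step 1.3.3: element-wise multiply z with the low-level feature,
    # then sum over the channel dimension -> spatial correlation matrix
    corr = (z[:, None, None] * f_low).sum(axis=0)   # (H, W)
    # Sigmoid turns the correlation into a spatial probability coefficient
    p = sigmoid(corr)                               # (H, W)
    # Multiply the probability with the low-level feature -> guided feature, eq. (2)
    return f_low * p[None, :, :]                    # (C, H, W)
```

With all-ones inputs, every spatial position receives the same probability sigmoid(C), so the gating suppresses nothing and scales uniformly; with real features, positions that correlate with the global semantic vector are emphasized.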
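The weighted fusion of claim 2 (equation (3)) amounts to scaling each branch by a learnable coefficient before element-wise addition. A minimal sketch, assuming nearest-neighbour 2x upsampling stands in for Up(.) and with editor-chosen names:

```python
import numpy as np

def adaptive_fuse(f_high, f_low, w_high, w_low):
    """Weighted fusion of an upsampled higher-level feature with a lower-level
    feature, in the spirit of eq. (3) of claim 2; weight names are illustrative.

    f_high: (C, H, W)   higher-level feature
    f_low:  (C, 2H, 2W) lower-level feature from the guiding module
    """
    # Nearest-neighbour 2x upsampling stands in for the Up(.) operation
    up = np.repeat(np.repeat(f_high, 2, axis=1), 2, axis=2)
    # Learnable weights scale each branch before element-wise addition
    return w_high * up + w_low * f_low
```

In a trained model, w_high and w_low would be parameters updated by backpropagation rather than fixed scalars.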
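The correlation aggregation of claim 3 is a spatial self-attention: softmax over Q^T K, aggregation of V, a learnable scale, and a residual connection. A sketch assuming the Q, K, V maps are already reshaped to C x N (the three convolutions and the Sigmoid gating of step 1.5.2 are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correlation_aggregation(f, q, k, v, gamma):
    """Self-attention over spatial positions, following steps 1.5.3-1.5.4.

    q, k, v: (C, N) feature maps reshaped to C x N (N = H*W pixels)
    f:       (C, N) the coarse salient feature; gamma: learnable scalar scale
    """
    # Attention S = softmax(Q^T K), normalised over positions i as in eq. (4)
    s = softmax(q.T @ k, axis=0)        # (N, N)
    # Aggregate V by the attention, scale by gamma, residual-add the input
    return gamma * (v @ s) + f          # (C, N)
```

With gamma = 0 the module reduces to the identity on f, which is why initialising the scale near zero lets the attention branch be learned gradually.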
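The training of claim 4 uses Adam with lr = 0.0001. One Adam parameter update can be sketched as follows (the outer loop over batchsize = 8 batches and epoch = 200 iterations, and the saliency loss itself, are omitted; this is the standard Adam rule, not code from the patent):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the learning rate of claim 4 (lr = 0.0001).

    w: parameter value, grad: its gradient, m/v: running moment estimates,
    t: 1-based step count. Returns the updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

On the first step the bias-corrected moments make the update magnitude approximately lr regardless of the gradient scale, which is what makes the small initial rate of 0.0001 workable.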
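Two of the four evaluation indexes of claim 5 are simple to state. MAE is the mean absolute difference between prediction and ground truth; F-measure combines precision and recall at a threshold, conventionally with beta^2 = 0.3. The sketch below uses these standard definitions (the patent does not spell them out, and E-measure and S-measure are omitted as they are more involved):

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and ground truth in [0, 1]."""
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """Thresholded F-measure with the conventional beta^2 = 0.3 weighting."""
    binary = pred >= thresh
    positive = gt > 0.5
    tp = np.logical_and(binary, positive).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(positive.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
```

A perfect prediction yields MAE = 0 and F-measure = 1; in practice the F-measure is often reported as its maximum or mean over many thresholds.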
Description
Salient object detection method based on semantic information guidance
Technical Field
The invention relates to the technical field of image processing, in particular to a salient object detection method based on semantic information guidance.
Background
Salient object detection is a popular problem in the field of computer vision. It aims to simulate the behaviour of the human visual system by automatically identifying and locating the most salient target areas in an image. Salient object detection has wide application value in many practical settings, such as semantic segmentation, object tracking and image retrieval. Traditional salient object detection methods are mainly based on manually designed features or heuristic rules; they usually require manually annotated training samples of salient objects, which consumes a great deal of manpower and time, becomes very difficult on large-scale datasets, and adapts poorly to complex and diverse scenarios and tasks. With the development of deep learning technology, more and more researchers have begun to adopt deep learning-based methods. These methods can automatically learn features from large amounts of data and are generally more accurate and efficient than conventional methods. Chinese patent CN111209918B discloses an image saliency detection method based on multi-graph model priors and short-connection network optimization for salient object detection. The method first uses colour and position information to compute a corresponding KNN graph model and a K-regular graph model for each input RGB image, then fuses the images at the pixel level, and finally optimizes the initial saliency map with a short-connection network to obtain the final saliency map of the original image.
He Wei and Pan Chen, in the paper "Attention-guiding network for saliency target detection", Journal of Image and Graphics (China), 2022, vol. 27, no. 4, pp. 1176-1190, disclose a new saliency detection model, AGNet, which combines a channel attention mechanism and a spatial attention mechanism to selectively and gradually aggregate deep and shallow feature information, better handles the transmission and aggregation of features at different layers, and avoids the influence of fused redundant background information on the saliency map.
Disclosure of Invention
In order to solve the technical problem that the overall feature extraction capability of current salient object detection models is limited, the invention aims to provide a salient object detection method based on semantic information guidance. The adopted technical scheme is as follows: constructing a salient object detection model based on semantic information guidance; establishing a dataset of RGB saliency images; cropping all pictures in the dataset to the same size and taking all the cropped pictures as the saliency image dataset; training the constructed salient object detection model; inputting the test set into the trained salient object detection model and testing its performance to obtain four evaluation indexes; when the four evaluation indexes meet practical application requirements, using the corresponding salient object detection model for salient object detection on images; otherwise, adjusting the learning rate used when training the salient object detection model and retraining it until its four evaluation indexes meet practical application requirements.
Further, the constructing of the salient object detection model based on semantic information guidance comprises: step 1.1, inputting an RGB image into the PVTv backbone network and extracting image features at four stages to generate feature representations of the salient image at the first, second, third and highest layers; step 1.2, introducing an attention module at the highest layer, the attention module comprising a channel attention module and a spatial attention module, extracting the rich semantic information of the highest-layer features, and generating the highest-layer feature map; step 1.3, constructing a high-level feature guiding module, sending the generated feature representations of the first-, second- and third-layer salient images together with the semantic information extracted in step 1.2 into the high-level feature guiding module as input, enhancing the low-level feature representations, and generating three levels of salient feature maps; step 1.4, constructing an adaptive feature fusion module, the input of which comprises the three levels of salient feature maps generated by the high-level feature guiding module and the high-level features output from the highest layer, and generating coarse salient features by adding learnable weight coefficients in the fusion process; and