CN-115937541-B - Target detection method, device, equipment and storage medium based on dynamic threshold

Abstract

The application provides a target detection method, device, equipment and storage medium based on a dynamic threshold. The method comprises: inputting an image to be detected into a target detection network, and processing the image with a backbone network of the target detection network to obtain a first feature map; inputting the first feature map into a preset dynamic threshold prediction branch, and using the dynamic threshold prediction branch to fuse information from a classification branch and a regression branch so as to predict a dynamic threshold corresponding to each prediction frame of the image to be detected; sorting the prediction frames based on the dynamic threshold and traversing them according to the sorting result; removing a subsequent prediction frame when the intersection over union (IoU) of the subsequent prediction frame and the current prediction frame is higher than the dynamic threshold of the subsequent prediction frame; and taking the prediction frames remaining after the traversal as the target detection result. The application can dynamically adjust the non-maximum suppression threshold, avoid missed detection of targets, and improve the accuracy of target detection.

Inventors

Request for anonymity

Assignees

深圳须弥云图空间科技有限公司

Dates

Publication Date: 20260508
Application Date: 20221223

Claims (9)

1. A method for detecting a target based on a dynamic threshold, comprising: inputting an image to be detected into a target detection network, and processing the image to be detected by using a backbone network of the target detection network to obtain a first feature map; inputting the first feature map into a preset dynamic threshold prediction branch, and predicting a dynamic threshold corresponding to a prediction frame of the image to be detected by using the dynamic threshold prediction branch to fuse information from a classification branch and a regression branch; sorting the prediction frames based on the classification probability scores corresponding to the prediction frames, traversing them according to the sorting result, removing a subsequent prediction frame when the intersection over union (IoU) of the subsequent prediction frame and the current prediction frame is higher than the dynamic threshold of the subsequent prediction frame, and taking the prediction frames remaining after the traversal as the target detection result; wherein, after the dynamic threshold corresponding to the prediction frame of the image to be detected is predicted, the method further comprises: determining a first dynamic threshold pseudo tag corresponding to each anchor frame according to preset anchor frames, and taking the maximum overlapping value of the target frame corresponding to each anchor frame as a second dynamic threshold pseudo tag of that anchor frame; and predicting the dynamic threshold of the anchor frame by using the dynamic threshold prediction branch, establishing a loss function based on the first dynamic threshold pseudo tag, the second dynamic threshold pseudo tag and the dynamic threshold of the anchor frame, and training the target detection network by using the loss function.
2. The method according to claim 1, wherein predicting the dynamic threshold corresponding to the prediction frame of the image to be detected by using the dynamic threshold prediction branch to fuse information from the classification branch and the regression branch comprises: fusing the first feature map with the feature map output by the classification branch and applying convolution to obtain a second feature map, and applying convolution directly to the first feature map to obtain a third feature map; stacking the second feature map and the third feature map, and applying convolution to obtain a fourth feature map; fusing the fourth feature map with the feature map output by the regression branch and applying convolution to obtain a fifth feature map, and applying convolution directly to the fourth feature map to obtain a sixth feature map; stacking the fifth feature map and the sixth feature map, and applying convolution to obtain a seventh feature map; and applying convolution to the seventh feature map to obtain the dynamic threshold corresponding to each prediction frame in the image to be detected.
3. The method according to claim 1, wherein determining, according to the preset anchor frames, the first dynamic threshold pseudo tag corresponding to each of the anchor frames comprises: determining a generation pool corresponding to each target frame according to the preset anchor frames and the mapping relation between the target frames and the anchor frames; inputting a training image into the target detection network, and outputting the prediction frame obtained by regression from the anchor frame together with the other prediction frames corresponding to the anchor frame in the generation pool; and calculating the intersection over union between the prediction frame and each of the other prediction frames, and taking the largest intersection over union as the first dynamic threshold pseudo tag corresponding to the anchor frame.
4. The method according to claim 1, wherein the method further comprises: determining all target frames corresponding to the training image, calculating the intersection over union of each target frame with every other target frame, taking the other target frame with the maximum intersection over union as the maximum overlapping object of the target frame, and taking the value of that maximum intersection over union as the maximum overlapping value.
5. The method of claim 3, wherein determining the generation pool corresponding to each of the target frames comprises: taking the anchor frames corresponding to a target frame as the generation pool of that target frame according to the mapping relation between the target frames and the anchor frames, and taking the number of those anchor frames as the size of the generation pool.
6. A method according to claim 3, characterized in that the loss function is expressed as: [formula missing from the source text] wherein the symbols of the formula represent, respectively, a constant, the dynamic threshold of the anchor frame, the first dynamic threshold pseudo tag, and the second dynamic threshold pseudo tag.
7. An object detection device based on a dynamic threshold, comprising: an input module configured to input an image to be detected into a target detection network, and to process the image to be detected by using a backbone network of the target detection network to obtain a first feature map; a prediction module configured to input the first feature map into a preset dynamic threshold prediction branch, and to predict a dynamic threshold corresponding to a prediction frame of the image to be detected by using the dynamic threshold prediction branch to fuse information from a classification branch and a regression branch; a traversing module configured to sort the prediction frames based on the classification probability scores corresponding to the prediction frames, traverse them according to the sorting result, remove a subsequent prediction frame when the intersection over union of the subsequent prediction frame and the current prediction frame is higher than the dynamic threshold of the subsequent prediction frame, and take the prediction frames remaining after the traversal as the target detection result; and a training module configured to, after the dynamic threshold corresponding to the prediction frame of the image to be detected is predicted, determine a first dynamic threshold pseudo tag corresponding to each anchor frame according to preset anchor frames, take the maximum overlapping value of the target frame corresponding to each anchor frame as a second dynamic threshold pseudo tag of that anchor frame, predict the dynamic threshold of the anchor frame by using the dynamic threshold prediction branch, establish a loss function based on the first dynamic threshold pseudo tag, the second dynamic threshold pseudo tag and the dynamic threshold of the anchor frame, and train the target detection network by using the loss function.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 6 when the program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 6.
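
The pseudo tags recited in claims 1 and 3 to 5 can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not stated in the claims: boxes are `(x1, y1, x2, y2)` tuples, every anchor frame is mapped to exactly one target frame through a hypothetical `anchor_to_target` list, and a pseudo tag defaults to `0.0` when there is no other frame to compare against. All function names are illustrative. The loss function of claim 6 is not sketched because its formula is not reproduced in this text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def max_overlap_values(targets):
    """Claim 4: largest IoU of each target frame with any other target frame."""
    return [
        max((iou(t, o) for j, o in enumerate(targets) if j != i), default=0.0)
        for i, t in enumerate(targets)
    ]


def generation_pools(anchor_to_target, num_targets):
    """Claim 5: the pool of a target frame is the list of anchors mapped to it."""
    pools = [[] for _ in range(num_targets)]
    for anchor_idx, target_idx in enumerate(anchor_to_target):
        pools[target_idx].append(anchor_idx)
    return pools


def first_pseudo_tags(pred_boxes, pools):
    """Claim 3: max IoU between an anchor's regressed prediction frame and the
    other prediction frames regressed from the same generation pool."""
    tags = [0.0] * len(pred_boxes)
    for pool in pools:
        for a in pool:
            tags[a] = max(
                (iou(pred_boxes[a], pred_boxes[b]) for b in pool if b != a),
                default=0.0,  # assumption: a pool of size one yields 0.0
            )
    return tags


def second_pseudo_tags(anchor_to_target, targets):
    """Claim 1: each anchor inherits the max overlap value of its target frame."""
    overlap = max_overlap_values(targets)
    return [overlap[t] for t in anchor_to_target]


# Hypothetical data: two overlapping targets, three anchors (two on target 0).
targets = [(0, 0, 10, 10), (2.5, 0, 12.5, 10)]
anchor_to_target = [0, 0, 1]
pred_boxes = [(0, 0, 10, 10), (0.5, 0, 10.5, 10), (2.5, 0, 12.5, 10)]

pools = generation_pools(anchor_to_target, len(targets))
first = first_pseudo_tags(pred_boxes, pools)
second = second_pseudo_tags(anchor_to_target, targets)
```

In this sketch the first pseudo tag reflects how tightly the prediction frames of one target cluster together, while the second reflects how crowded the ground-truth targets are.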

Description

Target detection method, device, equipment and storage medium based on dynamic threshold

Technical Field

The present application relates to the field of computer technologies, and in particular to a method, an apparatus, a device, and a storage medium for detecting a target based on a dynamic threshold.

Background

Non-maximum suppression (NMS) is a common post-processing technique in mainstream multi-object detection algorithms. In general, a deep-learning-based object detection network densely outputs many detection frames and their scores on an image. NMS sorts the detection frames by score and eliminates overlapping detection frames on the same target according to the intersection over union (IoU) between the detection frames, so that each target keeps only the detection frame with the highest score and redundant predictions are reduced.

In existing target detection algorithms, the NMS threshold is fixed, whether standard NMS or the improved soft-NMS is used. As a result, targets that genuinely overlap in the image can be suppressed and missed. In other words, NMS in existing target detection algorithms easily causes missed detections, and detection is inaccurate in dense scenes and occlusion scenes.

Disclosure of Invention

In view of the above, the embodiments of the present application provide a method, an apparatus, a device, and a storage medium for detecting a target based on a dynamic threshold, so as to solve the prior-art problems that targets are missed and that detection is inaccurate in dense scenes and occlusion scenes.
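
The fixed-threshold behaviour described in the Background can be illustrated with a minimal pure-Python sketch of classic greedy NMS. The box format `(x1, y1, x2, y2)`, the function names and the example boxes and scores are assumptions chosen for illustration; they do not come from the application.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fixed_threshold_nms(boxes, scores, iou_threshold=0.5):
    """Classic greedy NMS with one global threshold.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical crowded scene: box 1 is a near-duplicate of box 0, while
# box 2 is a different object that overlaps box 0 with an IoU of 0.6.
boxes = [(0, 0, 10, 10), (0.5, 0, 10.5, 10), (2.5, 0, 12.5, 10)]
scores = [0.9, 0.8, 0.7]

kept_strict = fixed_threshold_nms(boxes, scores, iou_threshold=0.5)
kept_loose = fixed_threshold_nms(boxes, scores, iou_threshold=0.7)
```

With a threshold of 0.5 the second object (box 2) is suppressed together with the duplicate, which is the missed-detection case the Background refers to. Raising the global threshold to 0.7 recovers it in this example, but the same looser threshold would then apply to every region of the image, including sparse ones where it may let duplicates through.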
A first aspect of the embodiments of the application provides a target detection method based on a dynamic threshold. The method comprises: inputting an image to be detected into a target detection network, and processing the image with a backbone network of the target detection network to obtain a first feature map; inputting the first feature map into a preset dynamic threshold prediction branch, and using the dynamic threshold prediction branch to fuse information from a classification branch and a regression branch so as to predict a dynamic threshold corresponding to each prediction frame of the image to be detected; sorting the prediction frames based on the dynamic threshold and traversing them according to the sorting result; removing a subsequent prediction frame when the intersection over union of the subsequent prediction frame and the current prediction frame is higher than the dynamic threshold of the subsequent prediction frame; and taking the prediction frames remaining after the traversal as a target detection result.
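
The traversal step of the first aspect can be sketched in pure Python as follows. This is an illustrative sketch under assumptions, not the application's implementation: boxes are `(x1, y1, x2, y2)` tuples, the per-box dynamic thresholds are supplied as a plain list rather than predicted by a network, and the frames are sorted by classification score as recited in claim 1. The function names and example values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dynamic_threshold_nms(boxes, scores, thresholds):
    """Greedy NMS in which every box carries its own suppression threshold.

    A subsequent (lower-ranked) box is removed when its IoU with the current
    box exceeds the dynamic threshold of the subsequent box itself.
    Returns the indices of the remaining boxes in traversal order.
    """
    # Assumption: sort by classification score, following claim 1.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    removed = set()
    for pos, cur in enumerate(order):
        if cur in removed:
            continue  # a removed box never suppresses others
        for nxt in order[pos + 1:]:
            if nxt not in removed and iou(boxes[cur], boxes[nxt]) > thresholds[nxt]:
                removed.add(nxt)
    return [i for i in order if i not in removed]


# Hypothetical crowded scene: box 1 duplicates box 0 (IoU about 0.905),
# box 2 is a separate object overlapping box 0 with an IoU of 0.6.
boxes = [(0, 0, 10, 10), (0.5, 0, 10.5, 10), (2.5, 0, 12.5, 10)]
scores = [0.9, 0.8, 0.7]

crowded = dynamic_threshold_nms(boxes, scores, thresholds=[0.65, 0.65, 0.65])
sparse = dynamic_threshold_nms(boxes, scores, thresholds=[0.5, 0.5, 0.5])
```

When the thresholds for this region are around 0.65, the near-duplicate is still removed while the genuinely overlapping object survives; with thresholds of 0.5 both are removed. Because each threshold belongs to a box rather than to the whole image, crowded and sparse regions of the same image can be treated differently.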
A second aspect of the embodiments of the application provides a target detection device based on a dynamic threshold, comprising an input module, a prediction module and a traversing module. The input module is configured to input an image to be detected into a target detection network and to process the image with a backbone network of the target detection network to obtain a first feature map. The prediction module is configured to input the first feature map into a preset dynamic threshold prediction branch and to use the dynamic threshold prediction branch to fuse information from a classification branch and a regression branch so as to predict a dynamic threshold corresponding to each prediction frame of the image to be detected. The traversing module is configured to sort the prediction frames based on the dynamic threshold, traverse them according to the sorting result, remove a subsequent prediction frame when the intersection over union of the subsequent prediction frame and the current prediction frame is higher than the dynamic threshold of the subsequent prediction frame, and take the prediction frames remaining after the traversal as a target detection result.
In a third aspect of the embodiments of the present application, there is provided an electronic device including a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
At least one of the above technical solutions adopted by the embodiments of the application can achieve the following beneficial effects: The method comprises inputting an image to be detected into a target detection network, processing the image with a backbone network of the target detection network to obtain a first feature map, inputting the first feature map into a preset dynamic threshold prediction branch, using the dynamic threshold prediction branch to fuse information of a classification branch and a regressio