EP-4742175-A1 - IDENTIFYING A POTENTIAL FALSE POSITIVE DETECTION BOX


Abstract

The present disclosure relates to anchor-based object detection systems and methods that identify a potential false positive detection among three detection boxes in a same image frame. Each detection box has a predicted IoU score representing the confidence that it captures an object. The system determines (S502) that a second box overlaps at least partly with both a first and a third box, where the second box is positioned between them. It identifies (S504) the second box as a potential false positive if it determines (S506) that the IoU score of the second box is below a set threshold and if corresponding reference points within the three boxes are determined (S508) to be substantially aligned, with the second box's reference point close to an alignment line defined by the first and third boxes.

Inventors

  • HASSELBERG, Emanuel
  • Ärlebäck, Richard
  • ERIKSSON, Jonatan

Assignees

  • Axis AB

Dates

Publication Date
20260513
Application Date
20241111

Claims (15)

  1. A method (500) for identifying a potential false positive detection box in a set of three detection boxes (202, 204, 206) within an anchor-based object detection system (402), wherein each of the three detection boxes is detected in a same image frame (200), wherein each of the three detection boxes is associated with a respective predicted intersection over union, IoU, score, wherein the predicted IoU score indicates a confidence of the anchor-based object detection system that the detection box represents an object, the method comprising: determining (S502) that a second detection box (206) overlaps at least partly with both a first detection box (202) and a third detection box (204), wherein the second detection box is located between the first detection box and the third detection box in the image frame; identifying (S504) that the second detection box is a potential false positive detection box by: determining (S506) that the predicted IoU score of the second detection box is lower than a first threshold score; and determining (S508) a first reference point (208) in the first detection box, a corresponding second reference point (210) in the second detection box and a corresponding third reference point (212) in the third detection box, and determining that the first, second, and third reference points are substantially aligned in the image frame, such that the second reference point is within a threshold distance from an alignment defined by the first and third reference points.
  2. The method of claim 1, wherein determining that the first, second, and third reference points are substantially aligned comprises: determining a first vector (214) between a first pair of reference points selected from the first, second and third reference points, and a second vector (216) between a second, different, pair of reference points selected from the first, second and third reference points, wherein the first, second, and third reference points are substantially aligned in the image frame if an absolute value of the cosine of the angle (θ) between the first and second vectors is less than a threshold from 1.
  3. The method of claim 1, wherein determining that the first, second, and third reference points are substantially aligned comprises determining that the second reference point lies within a threshold distance from the line formed between the first and third reference points.
  4. The method of any one of claims 1-3, wherein identifying that the second detection box is a potential false positive detection is further performed by: determining (S510) that the predicted IoU score of the second detection box is at least a second threshold score lower than the predicted IoU scores of each of the first detection box and the third detection box.
  5. The method of any one of claims 1-4, wherein each of the three detection boxes is associated with a predicted object class, wherein identifying that the second detection box is a potential false positive detection is further performed by: determining (S512) that the predicted object class associated with each of the first, second and third detection boxes is the same.
  6. The method of any one of claims 1-5, wherein the first, second and third reference points are the midpoint of the top edge of the first, second and third detection boxes, respectively.
  7. The method of any one of claims 1-5, wherein the first, second and third reference points are the centre point of the first, second and third detection boxes, respectively.
  8. The method of any one of claims 1-6, further comprising: assigning a lower probability to the second detection box for association with an object track in an object tracking system (406), compared to probabilities assigned to the first and third detection boxes.
  9. The method of claim 8, wherein assigning a lower probability comprises assigning a higher cost for association of the second detection box with the object track, compared to the cost for association of the first detection box or the third detection box with the object track.
  10. The method of any one of claims 8-9, wherein assigning a lower probability comprises assigning the second detection box to a lower-priority partition of detection boxes for association with the object track and assigning the first and third detection boxes to a higher-priority partition of detection boxes for association with the object track, wherein the partitions are processed sequentially to associate tracks in the object tracking system.
  11. The method of any one of claims 1-10, further comprising: filtering out the second detection box from a set of detection boxes in the first image frame that are marked as potential new object tracks in an object tracking system.
  12. The method of any one of claims 1-11, further comprising counting the first and third detection boxes as confirmed objects in an object counting system (408) and counting the second detection box as an uncertain object in the object counting system.
  13. The method of any one of claims 1-7, further comprising: filtering out the second detection box from an initial set of detection boxes that includes the first, second, and third detection boxes, and using the remaining set of detection boxes in a downstream analysis system (404).
  14. A non-transitory computer-readable storage medium having stored thereon instructions for implementing the method according to any one of claims 1-13 when executed on one or more devices having processing capabilities.
  15. An anchor-based object detecting system (402) configured for identifying a potential false positive detection box in a set of three detection boxes (202, 204, 206), wherein each of the three detection boxes is detected in a same image frame (200), wherein each of the three detection boxes is associated with a respective predicted intersection over union, IoU, score, wherein the predicted IoU score indicates a confidence of the anchor-based object detecting system that the detection box represents an object, the anchor-based object detecting system configured for: determining (S502) that a second detection box (206) overlaps at least partly with both a first detection box (202) and a third detection box (204), wherein the second detection box is located between the first detection box and the third detection box in the image frame; identifying (S504) that the second detection box is a potential false positive detection box by: determining (S506) that the IoU score of the second detection box is lower than a first threshold score; and determining (S508) a first reference point (208) in the first detection box, a corresponding second reference point (210) in the second detection box and a corresponding third reference point (212) in the third detection box, and determining that the first, second, and third reference points are substantially aligned in the image frame, such that the second reference point is within a threshold distance from an alignment defined by the first and third reference points.
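The test recited in claims 1, 2 and 6 can be illustrated with a minimal Python sketch. This is not the claimed implementation: the threshold values, function names, and the choice of top-edge midpoints as reference points are illustrative assumptions, and boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples.

```python
import math

def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes overlap at least partly."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def top_midpoint(box):
    """Reference point: midpoint of the top edge (cf. claim 6)."""
    x1, y1, x2, _ = box
    return ((x1 + x2) / 2.0, y1)

def is_potential_false_positive(first, second, third, iou_score_second,
                                iou_threshold=0.5, cos_threshold=0.05):
    """Sketch of claims 1-2: flag the middle box when its predicted IoU
    score is low and the three reference points are nearly collinear.
    Threshold values are illustrative, not taken from the disclosure."""
    # S502: the second box must overlap both the first and third boxes.
    if not (overlaps(second, first) and overlaps(second, third)):
        return False
    # S506: predicted IoU score of the second box below the first threshold.
    if iou_score_second >= iou_threshold:
        return False
    # S508: alignment test via the cosine of the angle between the two
    # vectors (214, 216) formed by the reference points (claim 2).
    p1, p2, p3 = (top_midpoint(b) for b in (first, second, third))
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    v2 = (p3[0] - p2[0], p3[1] - p2[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return True  # degenerate case: coincident points, treat as aligned
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return abs(abs(cos_theta) - 1.0) < cos_threshold
```

Claim 3's variant (distance of the second reference point from the line through the first and third points) would replace the cosine test with a point-to-line distance check; both capture the same collinearity condition.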

Description

Technical Field

The present disclosure relates to object detection, and in particular to a method, system and software for identifying a potential false positive detection box within an anchor-based object detection system.

Background

In modern object detection systems such as Single Shot Detectors (SSD) and YOLO (You Only Look Once), anchor boxes are fundamental for detecting objects across an image. These anchor boxes are predefined and typically cover the image at various scales and aspect ratios to detect objects of different sizes and shapes. During training, the object detection system learns to adjust these anchor boxes to better fit objects by encoding those that have a high Intersection over Union (IoU) score, representing the overlap between the anchor box and the ground truth object. An anchor box with a significant IoU overlap is assigned to that object for training purposes.

A significant issue may arise when multiple objects are located close to each other, or when anchor boxes are sparsely distributed across the image. In such situations, more than one object can have a similar IoU with a particular anchor box, leading to ambiguous assignments during training. This ambiguity can cause a phenomenon referred to as in-between detections. When two or more objects share similar IoU scores with the same anchor box, the object detection system may inconsistently assign the anchor box to different objects during training. This results in an in-between detection: a false positive or ambiguous detection, i.e., an erroneous detection box positioned between the real objects. These in-between detections negatively affect the performance of the object detection system by introducing false positives. There is thus a need for improvements in this context.

Summary

In view of the above, solving or at least reducing one or several of the drawbacks discussed above would be beneficial, as set forth in the attached independent patent claims.
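The ambiguous anchor assignment described in the Background above can be sketched as follows. This is a hypothetical illustration, not part of the disclosure: the function name, the precomputed per-object IoU scores, and the tie margin are all illustrative assumptions.

```python
def assign_anchor(iou_by_object, margin=0.05):
    """Given precomputed IoU scores between one anchor box and each
    ground-truth object, pick the best match and flag near-ties that
    cause ambiguous assignments during training."""
    ranked = sorted(iou_by_object.items(), key=lambda kv: kv[1], reverse=True)
    best_obj, best_iou = ranked[0]
    # Two objects with nearly equal IoU make the assignment ambiguous,
    # which is the root cause of in-between detections.
    ambiguous = len(ranked) > 1 and best_iou - ranked[1][1] < margin
    return best_obj, ambiguous
```

For example, an anchor with IoU 0.62 against one object and 0.60 against a neighbouring object would be flagged as ambiguous, whereas 0.80 versus 0.30 would not.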
According to a first aspect of the present invention, there is provided a method for identifying a potential false positive detection box in a set of three detection boxes within an anchor-based object detection system, wherein each of the three detection boxes is detected in a same image frame, wherein each of the three detection boxes is associated with a respective predicted intersection over union, IoU, score, wherein the predicted IoU score indicates a confidence of the anchor-based object detection system that the detection box represents an object, the method comprising: determining that a second detection box overlaps at least partly with both a first detection box and a third detection box, wherein the second detection box is located between the first detection box and the third detection box in the image frame; identifying that the second detection box is a potential false positive detection box by: determining that the predicted IoU score of the second detection box is lower than a first threshold score; and determining a first reference point in the first detection box, a corresponding second reference point in the second detection box and a corresponding third reference point in the third detection box, and determining that the first, second, and third reference points are substantially aligned in the image frame, such that the second reference point is within a threshold distance from an alignment defined by the first and third reference points. This disclosure addresses the problem of in-between detections caused by ambiguous anchor box assignments during training of anchor-based object detection systems, particularly in situations where the anchor boxes are sparsely distributed. The techniques described herein aim to enhance detection accuracy by identifying potential false positive detection boxes that arise from this ambiguity. Specifically, the method focuses on identifying detection boxes that may fall between real objects.
The method introduces strategies for managing ambiguous detection boxes while minimizing computational impact, thereby allowing object detection systems to maintain reliable performance even under hardware constraints or when anchor boxes are distributed sparsely. By identifying these ambiguous detection boxes, object detection systems are better equipped to handle false positives, improving both accuracy and efficiency. The "predicted IoU score" in object detection systems like SSD or YOLO refers to a measure predicted by the object detection system/model that indicates how well a proposed detection (a bounding box) is likely to overlap with an actual object in the image. IoU (Intersection over Union) traditionally refers to the ratio of the overlapping area between the predicted bounding box and the ground truth box divided by the area of their union. However, the predicted IoU score in this context serves as a confidence measure, predicting how likely it is that the bounding box generated by the object detection model overlaps with an actual object in the image.
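The traditional IoU ratio described above (overlap area divided by union area) can be computed directly; the sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)` tuples and is provided only to make the definition concrete.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For instance, two 2x2 boxes overlapping in a 1x1 region have IoU 1/7, while identical boxes score 1.0 and disjoint boxes score 0.0. The predicted IoU score discussed in the text is the model's estimate of such an overlap, used as a confidence value rather than computed against a ground truth box.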