CN-121982065-A - Multi-target tracking method with self-adaptive threshold and buffer association

CN121982065A

Abstract

The invention provides a multi-target tracking method with adaptive threshold and buffer association, relating to the technical field of computer vision. The method comprises: calculating an adaptive confidence threshold for each frame; dividing the detection boxes of each frame into high-score detection boxes and low-score detection boxes; generating an initial active track set; performing two-stage matching between the detection boxes of each frame and the tracks in the initial active track set; generating an active track set for each frame based on the matching result; and generating a target tracking result for each frame based on its active track set. The invention introduces an adaptive confidence threshold mechanism and a buffered intersection-over-union (BIoU) matching strategy. The threshold changes in real time with the scene, which avoids the limitations of a fixed threshold. With BIoU, the boundary of a low-score detection box is moderately expanded, so that the resulting buffer box still overlaps sufficiently with the track's predicted box and the match succeeds.

Inventors

CHEN DONGMING
WANG DI
Li Maihao
WANG DONGQI
Chang Tianyu

Assignees

Northeastern University (东北大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-28

Claims (10)

1. A multi-target tracking method with adaptive threshold and buffer association, comprising the steps of: acquiring each frame in a video sequence, and performing target detection on each frame with a preset target detector to obtain a detection box set for each frame, wherein the detection box set comprises a plurality of detection boxes and each detection box has an original confidence score; calculating an adaptive confidence threshold for each frame; dividing the detection boxes of each frame into high-score detection boxes and low-score detection boxes according to the adaptive confidence threshold; generating an initial active track set for each frame based on the high-score detection boxes, the low-score detection boxes, and the previous frame; performing two-stage matching between the detection boxes of each frame and the tracks in the initial active track set to obtain a matching result; generating an active track set for each frame based on the matching result, and assigning track IDs to the tracks in the active track set based on a track ID allocation rule; and generating a target tracking result for each frame based on the active track set of that frame.
2. The multi-target tracking method with adaptive threshold and buffer association according to claim 1, wherein the adaptive confidence threshold of each frame is calculated as follows: for any frame in the video sequence, sorting the original confidence scores of all detection boxes of the frame in descending order to obtain a sorted sequence; calculating the first-order difference between each pair of adjacent original confidence scores in the sorted sequence, and concatenating all the first-order differences to obtain a difference sequence; and obtaining the index of the largest first-order difference in the difference sequence, and taking the confidence score corresponding to that index as the adaptive confidence threshold of the frame.
3. The multi-target tracking method with adaptive threshold and buffer association according to claim 2, wherein the detection boxes of each frame are divided into high-score detection boxes and low-score detection boxes according to the adaptive confidence threshold as follows: the detection boxes of any frame in the video sequence are divided, according to the adaptive confidence threshold of the frame, into a high-score detection box set $D_{high}$ and a low-score detection box set $D_{low}$, wherein each high-score detection box satisfies $s_i^{high} \ge \tau$, where $s_i^{high}$ is the original confidence score of the $i$-th detection box in $D_{high}$; each low-score detection box satisfies $s_j^{low} < \tau$, where $s_j^{low}$ is the original confidence score of the $j$-th detection box in $D_{low}$; and $\tau$ is the adaptive confidence threshold of the frame.
4. The multi-target tracking method with adaptive threshold and buffer association according to claim 1, wherein the initial active track set of each frame is generated as follows: for the 1st frame, initializing the high-score detection boxes of the 1st frame as matched tracks to generate the initial active track set; for the $k$-th frame, constructing the initial active track set from the active track set of the $(k-1)$-th frame, wherein the initial active track set of the $k$-th frame comprises the tracks matched in the $(k-1)$-th frame, the tracks generated from the unmatched high-score detection boxes in the $(k-1)$-th frame, and the temporary tracks whose loss count does not exceed the maximum allowed number of lost frames; and using a Kalman filter to calculate, for each frame, the predicted state in that frame of the tracks in its initial active track set, specifically: describing the target motion with a linear constant-velocity motion model; taking the state vector $x_{k-1}$ of all tracks of the $(k-1)$-th frame and calculating, through the state transition matrix $F$, the predicted state vector $\hat{x}_{k|k-1}$ of all tracks of the $k$-th frame; and taking the covariance matrix $P_{k-1}$ of all tracks of the $(k-1)$-th frame and calculating the predicted covariance matrix $P_{k|k-1}$ of all tracks of the $k$-th frame.
5. The multi-target tracking method with adaptive threshold and buffer association according to claim 1, wherein the two-stage matching comprises: a first stage of matching the detection boxes in the high-score detection box set $D_{high}$ of the $k$-th frame against the predicted states in the $k$-th frame of the tracks in the initial active track set, to obtain a first valid match set, an unmatched high-score detection box set, and a first unmatched track set; and a second stage of matching the detection boxes in the low-score detection box set $D_{low}$ of the $k$-th frame against the predicted states in the $k$-th frame of the tracks in the first unmatched track set, to obtain a second valid match set, an unmatched low-score detection box set, and a second unmatched track set.
6. The multi-target tracking method with adaptive threshold and buffer association according to claim 5, wherein the first stage is performed as follows: constructing a first cost matrix $C^{(1)}$ based on the initial active track set of the $k$-th frame and the high-score detection box set $D_{high}$; taking the predicted detection box $\hat{B}_i$ of the $i$-th track in the initial active track set and the $j$-th high-score detection box $D_j$ in $D_{high}$ as a matching pair, the element in row $i$ and column $j$ of the first cost matrix $C^{(1)}$ is given by: $C^{(1)}_{ij} = 1 - \mathrm{IoU}(\hat{B}_i, D_j)$; $\mathrm{IoU}(\hat{B}_i, D_j) = \frac{\mathrm{Area}(\hat{B}_i \cap D_j)}{\mathrm{Area}(\hat{B}_i \cup D_j)}$; where $\hat{B}_i$ is the predicted detection box in the $k$-th frame of the $i$-th track in the initial active track set, obtained by Kalman filter prediction, $D_j$ is the $j$-th high-score detection box in $D_{high}$, and $\mathrm{Area}(\cdot)$ denotes the area of a detection box; setting an IoU threshold $\tau_{IoU}$, retaining the matching pairs whose IoU is not lower than $\tau_{IoU}$ and for which the sum of all elements of the first cost matrix is smaller than a preset threshold, and moving the tracks whose predicted detection boxes are rejected into the first unmatched track set $T_{u1}$; using the Hungarian algorithm to match the detection boxes in $D_{high}$ against the predicted states of the tracks in the initial active track set, to obtain the first valid match set $M_1$, the unmatched high-score detection box set $D_{high}^{u}$, and the first unmatched track set $T_{u1}$; and updating, by Kalman filtering, the predicted state vector and the predicted covariance matrix in the $k$-th frame of the tracks in the first valid match set $M_1$, to obtain an updated state vector $x_k$ and an updated covariance matrix $P_k$.
7. The multi-target tracking method with adaptive threshold and buffer association according to claim 6, wherein the second stage is performed as follows: matching the first unmatched track set $T_{u1}$ against the low-score detection box set $D_{low}$; setting a scale coefficient $b$ and expanding each low-score detection box $D_j^{low}$ in $D_{low}$ to obtain a buffer box $D_j^{b}$, which is used to calculate the buffered IoU; establishing a stepwise buffer strategy by setting a plurality of scale coefficients $b_1$ and $b_2$, wherein the low-score detection boxes are first expanded by the scale coefficient $b_1$ and preliminarily matched, and the low-score detection boxes still unmatched after the preliminary matching are expanded by the scale coefficient $b_2$ and matched again; constructing a second cost matrix $C^{(2)}$ based on the first unmatched track set $T_{u1}$ and the low-score detection box set $D_{low}$; taking the predicted detection box $\hat{B}_i$ of the $i$-th track in $T_{u1}$ and the $j$-th low-score detection box $D_j^{low}$ in $D_{low}$ as a matching pair, the element in row $i$ and column $j$ of the second cost matrix $C^{(2)}$ is given by: $C^{(2)}_{ij} = 1 - \mathrm{BIoU}(\hat{B}_i, D_j^{b})$; where $\mathrm{BIoU}(\hat{B}_i, D_j^{b})$ is the buffered IoU; using the Hungarian algorithm to match the low-score detection boxes in $D_{low}$ against the predicted states of the tracks in the first unmatched track set $T_{u1}$, to obtain a second valid match set $M_2$, an unmatched low-score detection box set $D_{low}^{u}$, and a second unmatched track set $T_{u2}$; setting a BIoU threshold $\tau_{BIoU}$ and retaining only the matching pairs whose buffered IoU is not lower than $\tau_{BIoU}$; and updating, by Kalman filtering, the predicted state vector and the predicted covariance matrix in the $k$-th frame of the tracks in the second valid match set $M_2$, and resetting the loss count of the tracks in $M_2$ to 0.
8. The multi-target tracking method with adaptive threshold and buffer association according to claim 7, wherein the active track set of each frame is generated based on the matching result as follows: initializing each unmatched high-score detection box in the unmatched high-score detection box set $D_{high}^{u}$ independently as a new track and adding it to the active track set; incrementing by 1 the loss count of each unmatched track in the second unmatched track set $T_{u2}$; setting a maximum allowed loss count $N_{max}$, and, if the loss count of an unmatched track in $T_{u2}$ exceeds $N_{max}$, deleting that track, and otherwise adding it to the active track set as a temporary track; and adding the tracks in the first valid match set $M_1$ and the second valid match set $M_2$ to the active track set; whereby the active track set $T_k$ of the $k$-th frame comprises the tracks in the first valid match set $M_1$, the tracks in the second valid match set $M_2$, the temporary tracks retained from the second unmatched track set $T_{u2}$, and the new tracks generated from the unmatched high-score detection boxes.
9. The multi-target tracking method with adaptive threshold and buffer association according to claim 8, wherein the track ID allocation rule comprises: for a track successfully matched in the $k$-th frame, continuing to use the original track ID that the track had in the $(k-1)$-th frame, so as to keep the track identity consistent; and assigning a new unique track ID to each new track.
10. The multi-target tracking method with adaptive threshold and buffer association according to claim 9, wherein the target tracking result comprises the track IDs of all tracks in the $k$-th frame and the bounding box information of each track in the $k$-th frame, the bounding box information being the position and size of the target in the current frame.
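The track maintenance and ID allocation rules of claims 8 and 9 can be illustrated by the following minimal Python sketch. It is not the patented implementation: the names `Track`, `TrackManager`, `max_lost` and `update` are hypothetical, the matching itself is assumed to have been done elsewhere, and a matched track simply takes over the detection box instead of a Kalman-updated state.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Track:
    track_id: int
    box: Box
    lost_count: int = 0  # consecutive frames without a match


class TrackManager:
    """Hypothetical helper that keeps the active track set between frames."""

    def __init__(self, max_lost: int = 30) -> None:
        self.max_lost = max_lost          # maximum allowed loss count
        self._next_id = count(1)          # source of new unique track IDs
        self.active: Dict[int, Track] = {}

    def update(
        self,
        matches: Sequence[Tuple[int, Box]],   # (track ID, matched detection box)
        unmatched_high: Sequence[Box],        # unmatched high-score detections
        unmatched_track_ids: Sequence[int],   # tracks left unmatched after both stages
    ) -> List[Track]:
        next_active: Dict[int, Track] = {}

        # Matched tracks keep their original ID and have their loss count reset.
        for track_id, box in matches:
            track = self.active[track_id]
            track.box = box  # simplification: a Kalman update would go here
            track.lost_count = 0
            next_active[track_id] = track

        # Unmatched tracks survive as temporary tracks until they exceed max_lost.
        for track_id in unmatched_track_ids:
            track = self.active[track_id]
            track.lost_count += 1
            if track.lost_count <= self.max_lost:
                next_active[track_id] = track

        # Each unmatched high-score detection starts a new track with a new ID.
        for box in unmatched_high:
            track_id = next(self._next_id)
            next_active[track_id] = Track(track_id, box)

        self.active = next_active
        return list(next_active.values())
```

For example, with `max_lost=1`, a track that stays unmatched for two consecutive frames is deleted, while any new high-score detection receives the next unused ID.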

Description

Multi-target tracking method with self-adaptive threshold and buffer association

Technical Field

The invention belongs to the field of computer vision, and in particular relates to a multi-target tracking method with adaptive threshold and buffer association.

Background

Multi-object tracking (MOT) is a key technology in the field of computer vision. Its goal is to detect and associate multiple objects in a video sequence so as to keep each object's identity consistent and generate its motion trajectory. The tracking-by-detection (TBD) paradigm is currently the mainstream framework for multi-target tracking: a target detector first locates the targets in each frame, and a data association algorithm then associates the targets across frames. Within the TBD paradigm, the ByteTrack algorithm is a representative work. Other multi-object tracking methods also exist, such as DeepSORT. DeepSORT relies not only on the position and motion information of the detection boxes but also on appearance features of the objects extracted by a ReID network. Although DeepSORT performs well at preserving object identity, extracting and matching ReID features introduces additional computational burden: appearance feature extraction typically requires complex deep learning models that consume considerable computing resources, especially when processing high-resolution images or high-frame-rate video, and the feature matching process itself requires additional time and computation. In contrast, ByteTrack and its improved variants aim to improve performance without significantly increasing computational cost.
However, the ByteTrack algorithm relies on a preset fixed threshold to separate high-confidence boxes from low-confidence boxes, and this one-size-fits-all threshold strategy is particularly inadequate in dynamically changing scenes. In the second association stage of ByteTrack, low-confidence boxes are matched against the tracks that were not matched to high-confidence boxes in the first association. These low-confidence detection boxes, however, tend to correspond to objects that are occluded, fast-moving, or of poor image quality, so the shape, size, and aspect ratio of the bounding box output by the detector may be significantly distorted and deviate substantially from the true contour of the object. These two drawbacks are interrelated. An unsuitable fixed threshold may misclassify low-quality detection boxes, and even if the boxes are classified correctly and enter the second-stage matching, the limitations of the standard IoU may prevent them from being associated with the correct trajectory. Together, these drawbacks limit the performance of existing MOT systems in uncontrolled real-world environments.
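The limitation of the standard IoU described above can be made concrete with a small numerical sketch in Python, comparing the IoU with a buffered IoU (BIoU) for a low-score detection that has drifted away from the track's predicted box. The function names and the example coordinates are hypothetical, and the expansion rule (growing the box by `b` times its width and height on each side) is an assumption made for illustration rather than the exact rule of the invention.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    """Area of an axis-aligned box; zero if the box is empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def expand(box: Box, b: float) -> Box:
    """Grow a box by b * width horizontally and b * height vertically per side.

    Assumption: this expansion rule is illustrative only.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - b * w, y1 - b * h, x2 + b * w, y2 + b * h)


def biou(predicted: Box, detection: Box, b: float) -> float:
    """Buffered IoU: IoU of the predicted box and the expanded detection box."""
    return iou(predicted, expand(detection, b))


predicted = (100.0, 100.0, 150.0, 200.0)  # track box predicted for this frame
detection = (152.0, 100.0, 202.0, 200.0)  # low-score detection, shifted right

plain = iou(predicted, detection)             # boxes do not overlap at all
buffered = biou(predicted, detection, b=0.3)  # buffer box reaches the prediction
print(plain, buffered)
```

In this example the two boxes are disjoint, so the standard IoU is zero and no IoU threshold could accept the pair, whereas the buffered IoU is positive and the pair remains a candidate for association.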
Disclosure of Invention

In order to overcome the defects of the prior art, in a first aspect, the invention provides a multi-target tracking method with adaptive threshold and buffer association, which comprises the following steps: acquiring each frame in a video sequence, and performing target detection on each frame with a preset target detector to obtain a detection box set for each frame, wherein the detection box set comprises a plurality of detection boxes and each detection box has an original confidence score; calculating an adaptive confidence threshold for each frame; dividing the detection boxes of each frame into high-score detection boxes and low-score detection boxes according to the adaptive confidence threshold; generating an initial active track set for each frame based on the high-score detection boxes, the low-score detection boxes, and the previous frame; performing two-stage matching between the detection boxes of each frame and the tracks in the initial active track set to obtain a matching result; generating an active track set for each frame based on the matching result, and assigning track IDs to the tracks in the active track set based on a track ID allocation rule; and generating a target tracking result for each frame based on the active track set of that frame.
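The threshold calculation and box division steps of the method above, as detailed in claims 2 and 3, can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the first-order difference is taken as each score minus the next lower score so that the threshold is the score just above the largest gap, and the `fallback` value used when a frame has fewer than two detections is not specified by the source.

```python
from typing import List, Sequence, Tuple


def adaptive_threshold(scores: Sequence[float], fallback: float = 0.5) -> float:
    """Return the confidence score located at the largest gap of the sorted scores."""
    if len(scores) < 2:
        # Assumption: the source does not cover this case, so use a fixed value.
        return fallback
    ranked = sorted(scores, reverse=True)  # descending order
    # First-order differences between adjacent scores in the sorted sequence.
    diffs = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Index of the largest difference (the first one if several are equal).
    gap_index = max(range(len(diffs)), key=diffs.__getitem__)
    return ranked[gap_index]


def split_detections(
    scores: Sequence[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return the indices of high-score (>= threshold) and low-score (< threshold) boxes."""
    high = [i for i, s in enumerate(scores) if s >= threshold]
    low = [i for i, s in enumerate(scores) if s < threshold]
    return high, low


# Hypothetical confidence scores of the detection boxes in one frame.
frame_scores = [0.92, 0.35, 0.88, 0.30, 0.81, 0.12]
tau = adaptive_threshold(frame_scores)
high_idx, low_idx = split_detections(frame_scores, tau)
print(tau, high_idx, low_idx)
```

For these scores the largest gap lies between 0.81 and 0.35, so the threshold is 0.81 and the three boxes scoring 0.92, 0.88 and 0.81 form the high-score set. Because the threshold is recomputed per frame, a frame with uniformly weaker detections would yield a lower threshold instead of losing all its high-score boxes.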
Further, the adaptive confidence threshold of each frame is calculated as follows: for any frame in the video sequence, sorting the original confidence scores of all detection boxes of the frame in descending order to obtain a sorted sequence; calculating the first-order difference between each pair of adjacent original confidence scores in the sorted sequence, and concatenating all the first-order differences to obtain a difference sequence; and obtaining the index of the largest first-order difference in the difference sequence, and taking the confidence sco