CN-117242489-B - Target tracking method and device, electronic equipment and computer readable medium
Abstract
The disclosure provides a target tracking method and device, electronic equipment and a computer readable medium, and belongs to the technical field of computers. The method comprises: acquiring a video stream of a preset acquisition area; performing target detection on a t-th video frame of the video stream to determine a first candidate frame of at least one candidate target in the t-th video frame, wherein t is an integer, 1 < t ≤ T, and T is the number of video frames in the video stream; classifying the candidate targets into at least one matching level according to the confidence levels of the candidate targets and the first intersection ratios (intersection-over-union, IoU) among their first candidate frames; respectively matching the candidate targets of the at least one matching level according to a tracker set of the video stream in a (t-1) state and a preset matching strategy, to determine a target tracking result of the t-th video frame; and determining a tracking track of each target in the video stream according to the target tracking results of the T video frames of the video stream.
Inventors
- WANG ZHEN
- LI FEI
- NIE DING
Assignees
- BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20220414
Claims (19)
- 1. A target tracking method, comprising: acquiring a video stream of a preset acquisition area; performing target detection on a t-th video frame of the video stream to determine a first candidate frame of at least one candidate target in the t-th video frame, wherein t is an integer, 1 < t ≤ T, and T is the number of video frames in the video stream; classifying the candidate targets according to the confidence levels of the candidate targets and the first intersection ratios among the first candidate frames of the candidate targets, and determining candidate targets of at least one matching level; respectively matching the candidate targets of the at least one matching level according to a tracker set of the video stream in a (t-1) state and a preset matching strategy, and determining a target tracking result of the t-th video frame, wherein the target tracking result comprises the identification and the position of each target in the t-th video frame; and determining a tracking track of each target in the video stream according to the target tracking results of the T video frames of the video stream, wherein the tracking track comprises the video frames corresponding to the target and the positions of the target in those video frames; wherein the video stream comprises video streams of a plurality of acquisition areas, and after the determining of the tracking track of a target in the video stream, the method further comprises: grouping targets according to the geographic positions of the plurality of acquisition areas and the tracking tracks of the targets in the video streams of the plurality of acquisition areas, through the spatial correlation among the tracking tracks, to obtain a plurality of spatial groups, wherein each spatial group comprises at least one target and the tracking track of that target; for any spatial group, grouping the targets in the spatial group according to the start time and the end time of the tracking track of each target in the spatial group and the passing-duration interval between the acquisition areas where the targets are located, to obtain a plurality of time sub-groups; clustering the tracking tracks of the targets in each time sub-group respectively to obtain a track matching result of the spatial group, wherein the track matching result comprises tracking tracks belonging to the same target in the video streams of different acquisition areas; and determining the track matching results of the video streams of the plurality of acquisition areas according to the track matching results of the plurality of spatial groups.
- 2. The method of claim 1, wherein, for any candidate target, the classifying of the candidate targets according to the confidence levels of the candidate targets and the first intersection ratios among the first candidate frames of the candidate targets, and determining candidate targets of at least one matching level, comprises: determining the candidate target as a first candidate target of a first matching level when the confidence level of the candidate target is greater than or equal to a first confidence threshold and the maximum value of the first intersection ratios of the candidate target is less than or equal to a first intersection ratio threshold; determining the candidate target as a second candidate target of a second matching level when the confidence level of the candidate target is less than the first confidence threshold and greater than or equal to a second confidence threshold, and the maximum value of the first intersection ratios of the candidate target is less than or equal to the first intersection ratio threshold; and determining the candidate target as a third candidate target of a third matching level when the confidence level of the candidate target is less than the first confidence threshold and greater than or equal to the second confidence threshold, and the maximum value of the first intersection ratios of the candidate target is greater than the first intersection ratio threshold and less than or equal to a second intersection ratio threshold; wherein the first confidence threshold is greater than the second confidence threshold, and the first intersection ratio threshold is less than the second intersection ratio threshold.
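The three-level classification in claim 2 can be sketched in Python as follows. This is a minimal illustration only: the concrete threshold values (0.6 and 0.1 for confidence, 0.3 and 0.7 for the intersection ratio) are assumptions, since the claim leaves them unspecified.

```python
def classify_candidates(boxes, scores, conf_hi=0.6, conf_lo=0.1,
                        iou_lo=0.3, iou_hi=0.7):
    """Split detections into the matching levels of claim 2.

    boxes: list of (x1, y1, x2, y2) candidate frames.
    scores: detection confidences, one per box.
    Threshold values are illustrative assumptions, not from the patent.
    """
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    levels = {1: [], 2: [], 3: []}
    for i, (box, score) in enumerate(zip(boxes, scores)):
        # Maximum intersection ratio against every *other* candidate frame.
        max_iou = max((iou(box, b) for j, b in enumerate(boxes) if j != i),
                      default=0.0)
        if score >= conf_hi and max_iou <= iou_lo:
            levels[1].append(i)   # first level: confident, unoccluded
        elif conf_lo <= score < conf_hi and max_iou <= iou_lo:
            levels[2].append(i)   # second level: low confidence, unoccluded
        elif conf_lo <= score < conf_hi and iou_lo < max_iou <= iou_hi:
            levels[3].append(i)   # third level: low confidence, crowded
    return levels
```

Candidates that fall outside all three conditions (e.g. very low confidence, or overlap above the second intersection ratio threshold) are simply discarded here, which the claims leave open.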
- 3. The method of claim 2, wherein the tracker set in the (t-1) state comprises at least one of a first tracker of a first historical target successfully matched in the (t-1)-th video frame and a second tracker of a second historical target unsuccessfully matched in the (t-1)-th video frame, wherein each tracker comprises the identification, position and characteristic information of the tracked target, a single-target prediction mode of the target, and prediction parameters; and the respectively matching the candidate targets of the at least one matching level according to the tracker set of the video stream in the (t-1) state and a preset matching strategy to determine a target tracking result of the t-th video frame comprises: extracting features of the first candidate target to obtain first characteristic information of the first candidate target; determining, according to the first tracker of the first historical target and the corresponding single-target prediction mode, a first prediction frame of the first historical target in the t-th video frame; determining a first loss matrix between the first historical target and the first candidate target according to the first characteristic information of the first candidate target, the first candidate frame of the first candidate target and the first prediction frame of the first historical target in the t-th video frame; determining a first matching result between the first historical targets and the first candidate targets according to the first loss matrix, wherein the first matching result comprises at least one of a successfully matched first candidate target, an unsuccessfully matched first candidate target and an unsuccessfully matched first historical target in the first matching level; and, for a first candidate target successfully matched in the first matching result, determining the first candidate target as a target in the t-th video frame, wherein the identification of the target is the identification of the matched first historical target, and the position of the target is the position of the first candidate frame of the target.
- 4. The method of claim 3, wherein the determining of the first loss matrix for matching between the first historical target and the first candidate target according to the first characteristic information of the first candidate target, the first candidate frame of the first candidate target, and the first prediction frame of the first historical target in the t-th video frame comprises: determining a first feature distance matrix according to the first characteristic information of the first candidate target and the characteristic information of the first historical target; determining a first intersection ratio distance matrix according to the first candidate frame of the first candidate target and the first prediction frame of the first historical target in the t-th video frame; and determining the first loss matrix according to the first feature distance matrix and the first intersection ratio distance matrix.
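Claim 4's fusion of a feature distance matrix and an intersection ratio distance matrix into one loss matrix can be sketched as below. The cosine feature distance, the 1 − IoU distance, and the equal 0.5/0.5 weighting are illustrative assumptions, and a greedy assignment stands in for whatever matching algorithm (typically Hungarian) an implementation would use on the loss matrix:

```python
def box_iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def cosine_distance(u, v):
    # 1 - cosine similarity between two appearance feature vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def first_loss_matrix(cand_feats, cand_boxes, hist_feats, pred_boxes, w=0.5):
    # Loss = w * feature distance + (1 - w) * intersection ratio distance.
    # Rows: candidate targets; columns: historical targets.
    return [[w * cosine_distance(cf, hf) + (1 - w) * (1.0 - box_iou(cb, pb))
             for hf, pb in zip(hist_feats, pred_boxes)]
            for cf, cb in zip(cand_feats, cand_boxes)]

def greedy_match(loss, max_cost=0.8):
    # Greedy assignment: repeatedly take the cheapest unmatched pair.
    pairs, used_r, used_c = [], set(), set()
    for cost, i, j in sorted((c, i, j)
                             for i, row in enumerate(loss)
                             for j, c in enumerate(row)):
        if cost <= max_cost and i not in used_r and j not in used_c:
            pairs.append((i, j))
            used_r.add(i)
            used_c.add(j)
    return pairs
```

Rows left unmatched correspond to the unsuccessfully matched first candidate targets of the first matching result, and unmatched columns to the unsuccessfully matched first historical targets.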
- 5. The method of claim 4, wherein the prediction modes of the first historical targets comprise at least one single-target prediction mode, and the first intersection ratio distance matrix correspondingly comprises at least one intersection ratio distance matrix.
- 6. The method of claim 3, wherein the respectively matching the candidate targets of the at least one matching level according to the tracker set of the video stream in the (t-1) state and a preset matching strategy to determine a target tracking result of the t-th video frame further comprises: predicting, through a Kalman prediction model according to the prediction parameters of a third historical target, a second prediction frame of the third historical target in the t-th video frame, wherein the third historical target is a first historical target unsuccessfully matched in the first matching result; determining a second intersection ratio distance matrix according to the first candidate frame of a fourth candidate target and the second prediction frame of the third historical target, wherein the fourth candidate target is a first candidate target unsuccessfully matched in the first matching result; determining a third intersection ratio distance matrix according to the first candidate frame of the fourth candidate target and the first prediction frame of the third historical target in the t-th video frame; determining a second loss matrix according to the second intersection ratio distance matrix and the third intersection ratio distance matrix; determining a second matching result between the fourth candidate target and the third historical target according to the second loss matrix, wherein the second matching result comprises at least one of a successfully matched fourth candidate target, an unsuccessfully matched fourth candidate target and an unsuccessfully matched third historical target; and, for a fourth candidate target successfully matched in the second matching result, determining the fourth candidate target as a target in the t-th video frame, wherein the identification of the target is the identification of the matched third historical target, and the position of the target is the position of the first candidate frame of the target.
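Claim 6 re-matches the leftovers of the first round using a box predicted by a Kalman prediction model. A minimal stand-in for that predictor, keeping only the state-extrapolation step (box plus per-frame velocity) and none of the covariance bookkeeping a real Kalman filter maintains, might look like this:

```python
class ConstantVelocityPredictor:
    """Illustrative stand-in for the Kalman prediction model of claim 6.

    It extrapolates the last observed box by the last observed per-frame
    motion; a full Kalman filter would additionally maintain and update
    state uncertainty. Purely a sketch, not the patented implementation.
    """

    def __init__(self, box):
        self.box = box                    # last observed (x1, y1, x2, y2)
        self.vel = (0.0, 0.0, 0.0, 0.0)   # per-frame coordinate deltas

    def predict(self):
        # Second prediction frame: extrapolate one frame ahead.
        return tuple(c + v for c, v in zip(self.box, self.vel))

    def update(self, box):
        # A matched detection refreshes the position and velocity estimate.
        self.vel = tuple(n - o for n, o in zip(box, self.box))
        self.box = box
```

After each successful match, `update()` refreshes the tracker's prediction parameters; `predict()` then supplies the second prediction frame from which the second intersection ratio distance matrix is built.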
- 7. The method of claim 6, wherein the respectively matching the candidate targets of the at least one matching level according to the tracker set of the video stream in the (t-1) state and a preset matching strategy to determine a target tracking result of the t-th video frame further comprises: determining a second feature distance matrix according to the first characteristic information of a fifth candidate target and the characteristic information of the second historical target, wherein the fifth candidate target is a fourth candidate target unsuccessfully matched in the second matching result; determining a third matching result between the fifth candidate target and the second historical target according to the second feature distance matrix, wherein the third matching result comprises at least one of a successfully matched fifth candidate target, an unsuccessfully matched fifth candidate target and an unsuccessfully matched second historical target; and, for a fifth candidate target successfully matched in the third matching result, determining the fifth candidate target as a target in the t-th video frame, wherein the identification of the target is the identification of the matched second historical target, and the position of the target is the position of the first candidate frame of the target.
- 8. The method of claim 7, wherein the respectively matching the candidate targets of the at least one matching level according to the tracker set of the video stream in the (t-1) state and a preset matching strategy to determine a target tracking result of the t-th video frame further comprises: predicting, through a Kalman prediction model according to the prediction parameters of a fourth historical target, a third prediction frame of the fourth historical target in the t-th video frame, wherein the fourth historical target comprises a third historical target unsuccessfully matched in the second matching result and a second historical target unsuccessfully matched in the third matching result; determining a fourth prediction frame of the fourth historical target in the t-th video frame according to the single-target prediction mode of the fourth historical target; determining a third loss matrix according to the first candidate frame of a sixth candidate target and the third prediction frame and the fourth prediction frame of the fourth historical target, wherein the sixth candidate target comprises the second candidate target and the third candidate target; determining a fourth matching result between the sixth candidate target and the fourth historical target according to the third loss matrix, wherein the fourth matching result comprises at least one of a successfully matched sixth candidate target, an unsuccessfully matched sixth candidate target and an unsuccessfully matched fourth historical target; and, for a sixth candidate target successfully matched in the fourth matching result, determining the sixth candidate target as a target in the t-th video frame, wherein the identification of the target is the identification of the matched fourth historical target, and the position of the target is the position of the first candidate frame of the target.
- 9. The method of claim 7, wherein the respectively matching the candidate targets of the at least one matching level according to the tracker set of the video stream in the (t-1) state and a preset matching strategy to determine a target tracking result of the t-th video frame further comprises: for a fifth candidate target unsuccessfully matched in the third matching result, determining the fifth candidate target as a target in the t-th video frame, wherein the identification of the target is a newly created identification, and the position of the target is the position of the first candidate frame of the target; and wherein, after the determining of the target tracking result of the t-th video frame, the method further comprises: creating a first tracker of the target in the tracker set, wherein the first tracker of the target comprises the identification, position and characteristic information of the target, a single-target prediction mode, and prediction parameters.
- 10. The method of claim 8, wherein, after the determining of the target tracking result of the t-th video frame, the method further comprises: determining a seventh candidate target, wherein the seventh candidate target comprises a first candidate target successfully matched in the first matching result, a fourth candidate target successfully matched in the second matching result, a fifth candidate target successfully matched in the third matching result and a sixth candidate target successfully matched in the fourth matching result; and updating the tracker of the seventh candidate target according to the position of the seventh candidate target.
- 11. The method of claim 10, wherein the updating of the tracker of the seventh candidate target comprises: updating the prediction parameters of a Kalman prediction model in the tracker of the seventh candidate target according to the position of the seventh candidate target; determining, for the single-target prediction mode of the seventh candidate target, a second intersection ratio between the candidate frame of the seventh candidate target and the prediction frame determined by the single-target prediction mode; initializing the prediction parameters corresponding to the single-target prediction mode when the second intersection ratio is greater than a third intersection ratio threshold; and updating the prediction parameters corresponding to the single-target prediction mode according to the position of the seventh candidate target when the second intersection ratio is less than or equal to the third intersection ratio threshold.
- 12. The method of claim 11, wherein the updating of the tracker of the seventh candidate target further comprises: for a first candidate target among the seventh candidate targets, updating the characteristic information in the tracker of the first candidate target according to the first characteristic information of the first candidate target; performing feature extraction on a second candidate target among the seventh candidate targets to obtain second characteristic information of the second candidate target; and updating the characteristic information in the tracker of the second candidate target according to the second characteristic information.
- 13. The method of claim 8, wherein, after the determining of the target tracking result of the t-th video frame, the method further comprises: for a fourth historical target unsuccessfully matched in the fourth matching result, updating the prediction parameters of the Kalman prediction model in the tracker of the fourth historical target according to the position of the fourth prediction frame of the fourth historical target.
- 14. The method of claim 1, wherein the grouping of the targets through the spatial correlation among the tracking tracks, according to the geographic positions of the plurality of acquisition areas and the tracking tracks of the targets in the video streams of the plurality of acquisition areas, comprises: determining a first spatial group and a second spatial group according to the geographic positions of a first acquisition area and a second acquisition area and the tracking tracks of the targets in the video streams of the first acquisition area and the second acquisition area; wherein the first acquisition area and the second acquisition area are any two adjacent acquisition areas among the plurality of acquisition areas, the first spatial group comprises targets entering the second acquisition area from the first acquisition area, and the second spatial group comprises targets entering the first acquisition area from the second acquisition area.
- 15. The method of claim 14, wherein, before the grouping of the targets in the spatial group to obtain a plurality of time sub-groups, the method further comprises: determining a passing-duration interval between the first acquisition area and the second acquisition area according to the distance between the first acquisition area and the second acquisition area, the highest speed limit, the lowest speed limit and the average vehicle speed; and the grouping of the targets in the spatial group according to the start time and the end time of the tracking track of each target in the spatial group and the passing-duration interval between the acquisition areas where the targets are located, to obtain a plurality of time sub-groups, comprises: for the first spatial group corresponding to the first acquisition area and the second acquisition area, classifying the corresponding targets into the same time sub-group when the end time of a target's tracking track in the first acquisition area precedes the start time of its tracking track in the second acquisition area and the time difference between the end time and the start time falls within the passing-duration interval.
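The passing-duration test of claim 15 can be sketched as follows. The claim does not fix how the interval is derived from the speed limits and the average vehicle speed, so the bounds below (distance divided by the highest and lowest speed limits) are an illustrative assumption:

```python
def passing_interval(distance_m, v_max, v_min):
    """Passing-duration interval between two adjacent acquisition areas.

    Assumed form: the fastest transit travels at the highest speed limit
    and the slowest at the lowest. The claim also mentions the average
    vehicle speed, which an implementation could use to tighten or center
    these bounds; that refinement is omitted here.
    """
    return (distance_m / v_max, distance_m / v_min)

def in_same_time_subgroup(end_a, start_b, interval):
    # Claim 15: the track must end in the first area before it starts in
    # the second area, and the gap must fall within the passing interval.
    lo, hi = interval
    return end_a < start_b and lo <= (start_b - end_a) <= hi
```

For example, with a 1 km gap and speed limits of 20 m/s and 10 m/s, two track segments 60 s apart would land in the same time sub-group, while segments 200 s apart would not.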
- 16. The method of claim 1, wherein the clustering of the tracking tracks of the targets in each time sub-group to obtain the track matching result of the spatial group comprises: clustering the tracking tracks of the targets in each time sub-group in a hierarchical clustering manner to obtain the track matching result of the spatial group.
- 17. A target tracking device, comprising: a video stream acquisition module, configured to acquire a video stream of a preset acquisition area; a target detection module, configured to perform target detection on a t-th video frame of the video stream and determine a first candidate frame of at least one candidate target in the t-th video frame, wherein t is an integer, 1 < t ≤ T, and T is the number of video frames in the video stream; a target grading module, configured to classify the candidate targets according to the confidence levels of the candidate targets and the first intersection ratios among the first candidate frames of the candidate targets, and determine candidate targets of at least one matching level; a target matching module, configured to respectively match the candidate targets of the at least one matching level according to a tracker set of the video stream in a (t-1) state and a preset matching strategy, and determine a target tracking result of the t-th video frame, wherein the target tracking result comprises the identification and the position of each target in the t-th video frame; and a track determining module, configured to determine a tracking track of each target in the video stream according to the target tracking results of the T video frames of the video stream, wherein the tracking track comprises the video frames corresponding to the target and the positions of the target in those video frames; wherein the video stream comprises video streams of a plurality of acquisition areas, and the device further comprises a spatial grouping module, configured to: after the tracking track of a target in the video stream is determined, group the targets according to the geographic positions of the plurality of acquisition areas and the tracking tracks of the targets in the video streams of the plurality of acquisition areas, through the spatial correlation among the tracking tracks, to obtain a plurality of spatial groups, wherein each spatial group comprises at least one target and the tracking track of that target; for any spatial group, group the targets in the spatial group according to the start time and the end time of the tracking track of each target in the spatial group and the passing-duration interval between the acquisition areas where the targets are located, to obtain a plurality of time sub-groups; cluster the tracking tracks of the targets in each time sub-group respectively to obtain a track matching result of the spatial group, wherein the track matching result comprises tracking tracks belonging to the same target in the video streams of different acquisition areas; and determine the track matching results of the video streams of the plurality of acquisition areas according to the track matching results of the plurality of spatial groups.
- 18. An electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target tracking method of any one of claims 1 to 16.
- 19. A non-transitory computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the target tracking method according to any one of claims 1 to 16.
Description
Target tracking method and device, electronic equipment and computer readable medium Technical Field The disclosure belongs to the technical field of computers, and in particular relates to a target tracking method and device, electronic equipment and a non-transitory computer readable medium. Background Target detection and tracking is an important task in computer vision: it estimates information about targets (e.g. a vehicle, a person or an object) in the field of view from sensor input data, and detects and tracks a single target or multiple targets in the field of view. In application scenarios such as smart cities and traffic analysis, the effect of target detection and tracking is particularly important. Target tracking methods in the related art are easily affected by viewing angles, lighting and crowded targets, carry a high risk of target loss, and achieve a poor tracking effect. Disclosure of Invention The present disclosure aims to solve at least one of the technical problems in the related art, and provides a target tracking method and apparatus, an electronic device, and a non-transitory computer readable medium.
In a first aspect, an embodiment of the present disclosure provides a target tracking method, including: acquiring a video stream of a preset acquisition area; performing target detection on a t-th video frame of the video stream to determine a first candidate frame of at least one candidate target in the t-th video frame, wherein t is an integer, 1 < t ≤ T, and T is the number of video frames in the video stream; classifying the candidate targets according to the confidence levels of the candidate targets and the first intersection ratios among the first candidate frames of the candidate targets, and determining candidate targets of at least one matching level; respectively matching the candidate targets of the at least one matching level according to a tracker set of the video stream in a (t-1) state and a preset matching strategy, and determining a target tracking result of the t-th video frame, wherein the target tracking result comprises the identification and the position of each target in the t-th video frame; and determining a tracking track of each target in the video stream according to the target tracking results of the T video frames of the video stream, wherein the tracking track comprises the video frames corresponding to the target and the positions of the target in those video frames.
In a second aspect, embodiments of the present disclosure provide a target tracking apparatus, comprising: a video stream acquisition module, configured to acquire a video stream of a preset acquisition area; a target detection module, configured to perform target detection on a t-th video frame of the video stream and determine a first candidate frame of at least one candidate target in the t-th video frame, wherein t is an integer, 1 < t ≤ T, and T is the number of video frames in the video stream; a target grading module, configured to classify the candidate targets according to the confidence levels of the candidate targets and the first intersection ratios among the first candidate frames of the candidate targets, and determine candidate targets of at least one matching level; a target matching module, configured to respectively match the candidate targets of the at least one matching level according to a tracker set of the video stream in a (t-1) state and a preset matching strategy, and determine a target tracking result of the t-th video frame, wherein the target tracking result comprises the identification and the position of each target in the t-th video frame; and a track determining module, configured to determine a tracking track of each target in the video stream according to the target tracking results of the T video frames of the video stream, wherein the tracking track comprises the video frames corresponding to the target and the positions of the target in those video frames. In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target tracking method described above.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described target tracking method. Drawings Figs. 1a and 1b are schematic diagrams of the same object in different situations. Fig. 2 is a flow chart of a target tracking method of an embodiment of the present disclosure. Fig. 3 is a schematic diagram of target grading according to an embodiment of the present disclosure. Fig