CN-121999013-A - Multi-camera multi-target tracking method, device, equipment and storage medium
Abstract
The application provides a multi-camera multi-target tracking method, device, equipment and storage medium, and relates to the technical field of visual tracking. According to the method, in a multi-camera multi-target tracking scene, when a target object can no longer be continuously tracked in the first video stream acquired by an initial camera, the second video streams of the associated cameras are decoded in parallel, which improves image data processing efficiency. The image frame data are input into the image processing model corresponding to each associated camera for feature extraction and mask prediction, enabling efficient target identification and positioning. The current image features are compared and matched against the target template features to rapidly screen out image frames containing the target object, improving recognition efficiency. Target tracking information is then extracted from the image frame data of the associated camera in which the target object is detected, and a target tracking track is generated, so that continuous cross-camera tracking of the target object is realized, the integrity and accuracy of the track are ensured, and the tracking efficiency and tracking continuity of multi-camera multi-target tracking are effectively improved.
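The parallel-decoding step summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the camera names and the stub decoder are hypothetical, and a real system would run a hardware or software video decoder per stream.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_stream(camera_id, num_frames=3):
    # Stand-in for a real per-camera decoder; each "frame" here is
    # just a labelled tuple rather than decoded pixel data.
    return [(camera_id, i) for i in range(num_frames)]

def decode_all(camera_ids):
    # One decoding task per associated camera, run in parallel so that
    # a slow stream does not block the others.
    with ThreadPoolExecutor(max_workers=len(camera_ids)) as pool:
        results = list(pool.map(decode_stream, camera_ids))
    return dict(zip(camera_ids, results))

frames_by_camera = decode_all(["cam_A", "cam_B", "cam_C"])
```

Decoding each associated camera's stream in its own task is what lets the per-camera image processing models consume frames without waiting on the slowest stream.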
Inventors
- Yu Ruoning
Assignees
- Beijing Sophgo Technology Co., Ltd. (北京算能科技有限公司)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2025-12-31
Claims (10)
- 1. A multi-camera multi-target tracking method, the method comprising: extracting target template features of each target object from image frames of a first video stream acquired by an initial camera when the target object can no longer be continuously tracked in the first video stream; decoding image frames in parallel from a second video stream acquired by at least one associated camera associated with the initial camera to obtain image frame data corresponding to each associated camera; inputting the image frame data corresponding to each associated camera into an image processing model corresponding to that associated camera to perform feature extraction and mask prediction, so as to obtain current image features of the current image frame of each associated camera; performing feature comparison and matching between the current image features of the current image frame of each associated camera and the target template features, and outputting a detection result of the target object in the current image frame of each associated camera; and, when the detection result shows that the target object exists in the current image frame of an associated camera, extracting target tracking information of the target object from the image frame data corresponding to that associated camera, and generating a target tracking track of the target object.
- 2. The multi-camera multi-target tracking method according to claim 1, wherein the target tracking information includes target feature information and mask information of the target object; and the extracting target tracking information of the target object from the image frame data corresponding to the associated camera and generating a target tracking track of the target object includes: extracting target tracking information of the target object in each image frame of the image frame data of the associated camera based on the image processing model; extracting a target tracking frame of the target object in each image frame based on the mask information; and generating a target tracking track of the target object based on the target center points of the target tracking frames in at least two image frames.
- 3. The multi-camera multi-target tracking method of claim 2, wherein the target tracking information further includes current position information; and the extracting, based on the image processing model, target tracking information of the target object in each image frame of the image frame data of the associated camera further includes: predicting next position information of the target object based on the current position information and current motion information of the target object; and querying the next position information against the acquisition area range of each associated camera, and determining a target associated camera corresponding to the next position information, so as to track the target object in the second video stream corresponding to the target associated camera.
- 4. The multi-camera multi-target tracking method of claim 2, wherein the target template features of the target object are stored in a shared repository; and the extracting, based on the image processing model, target tracking information of the target object in each image frame of the image frame data of the associated camera further includes: caching the target feature information into the shared repository, iteratively updating the target template features of the target object in the shared repository, and taking the most recently extracted target feature information as the target template features of the target object.
- 5. The multi-camera multi-target tracking method according to claim 1, wherein the performing feature comparison and matching between the current image features of the current image frame of each associated camera and the target template features, and outputting the detection result of the target object in the current image frame of each associated camera, includes: calculating the feature similarity between the current image features of the current image frame of each associated camera and the target template features; and, when the feature similarity corresponding to any current image frame is greater than or equal to a preset similarity threshold, determining that the detection result is that the target object exists in that current image frame.
- 6. The multi-camera multi-target tracking method according to claim 1, wherein when the detection result indicates that the target object exists in the current image frame of the associated camera, the extracting target tracking information of the target object from the image frame data corresponding to the associated camera and generating a target tracking track of the target object further comprises: if no target object is detected in the first video stream corresponding to the initial camera and in the second video stream corresponding to each associated camera, and the duration reaches a first duration threshold, marking the tracking state of the target object as a first state; and if no target object is detected in the first video stream corresponding to the initial camera and in the second video stream corresponding to each associated camera, and the duration reaches a second duration threshold, marking the tracking state of the target object as a second state; wherein the second duration threshold is greater than the first duration threshold.
- 7. The multi-camera multi-target tracking method of claim 1, wherein the extracting target template features of each target object from the image frames of the first video stream comprises: performing real-time inference and segmentation on a selected target according to a target selection instruction from a user side, and generating a target mask corresponding to the selected target; extracting feature information of the selected target; and caching the target mask corresponding to the selected target together with the feature information as the target template features.
- 8. A multi-camera multi-target tracking device, comprising: a template feature extraction module, used for extracting target template features of each target object from an image frame of a first video stream acquired by an initial camera when the target object can no longer be continuously tracked; an image frame decoding module, configured to decode image frames in parallel from a second video stream acquired by at least one associated camera associated with the initial camera, to obtain image frame data corresponding to each of the associated cameras; an image frame feature extraction module, used for inputting the image frame data corresponding to each associated camera into an image processing model corresponding to that associated camera to perform feature extraction and mask prediction, so as to obtain the current image features of the current image frame of each associated camera; a target object detection module, used for performing feature comparison and matching between the current image features of the current image frame of each associated camera and the target template features, and outputting the detection result of the target object in the current image frame of each associated camera; and a track generation module, used for extracting target tracking information of the target object from the image frame data corresponding to the associated camera when the detection result indicates that the target object exists in the current image frame of the associated camera, and generating a target tracking track of the target object.
- 9. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the multi-camera multi-target tracking method of any of claims 1 to 7.
- 10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the multi-camera multi-target tracking method according to any of claims 1 to 7.
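To make claims 2 and 5 concrete, the following is a minimal, dependency-free sketch of the feature comparison step (cosine similarity against a preset threshold) and of deriving a track from the centres of mask bounding boxes. The threshold value, the toy feature vectors, and the list-of-lists mask representation are illustrative assumptions, not part of the claims.

```python
import math

def cosine_sim(a, b):
    # Feature similarity between a current image feature and a template feature.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def target_present(current_feat, template_feat, threshold=0.8):
    # Claim 5: the target is deemed present iff similarity >= a preset threshold.
    return cosine_sim(current_feat, template_feat) >= threshold

def mask_center(mask):
    # Claim 2: bounding box ("target tracking frame") of the predicted
    # mask, then its centre point; mask is a 2-D list of 0/1 values.
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)

# A track is the sequence of centres over at least two matched frames.
frame1 = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
track = [mask_center(m) for m in (frame1, frame2)]
```

In a real system the features would come from the per-camera image processing model rather than being hand-written vectors, and the masks from its mask-prediction head.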
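Claim 6's two-stage loss handling can likewise be sketched as a simple state function. The concrete threshold values and state names below are assumptions for illustration only; the claim requires only that the second duration threshold exceed the first.

```python
def tracking_state(lost_seconds, first_threshold=5.0, second_threshold=30.0):
    # No detection on the initial camera's stream or on any associated
    # camera's stream for a sustained period escalates the state.
    if lost_seconds >= second_threshold:
        return "second_state"   # e.g. the track is considered ended
    if lost_seconds >= first_threshold:
        return "first_state"    # e.g. the target is temporarily lost
    return "tracking"
```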
Description
Multi-camera multi-target tracking method, device, equipment and storage medium
Technical Field
The present application relates to the field of visual tracking technologies, and in particular to a multi-camera multi-target tracking method, device, apparatus, and storage medium.
Background
Multi-target tracking (MOT) technology is of great importance in fields such as intelligent security, intelligent traffic, and robot navigation, but it faces increasingly complex real-world scenes, especially wide-area multi-camera coverage and the high-concurrency, real-time requirements of edge computing environments, which pose several challenges for the prior art. First, traditional tracking methods often rely on bounding-box detection; when a target is occluded, moves across viewing angles, or changes markedly in appearance, problems such as target loss and ID switching easily occur. Second, conventional tracking methods lack universality and adaptability, and must be optimized separately for each specific application scenario (e.g., surveillance or industrial inspection). Third, in a multi-camera scene, a target may move between different cameras, or be occluded and briefly leave the field of view at a single camera's viewing angle, so conventional single-camera tracking methods cannot track it continuously. Existing target re-identification (Re-ID) methods depend on simple visual feature matching, are easily affected by factors such as viewing-angle changes, illumination conditions, and target poses, and lack robust cross-camera association capability, making complex and changeable environments difficult to handle.
In addition, although mainstream vision foundation models such as SAM2 excel in the image and video segmentation field, their huge parameter counts and complex Vision Transformer (ViT) encoder structures lead to high computational requirements and slow inference, which limits their wide application in edge-side scenes with the high-concurrency, real-time requirements of multiple cameras. Therefore, how to improve multi-target tracking efficiency in multi-camera scenarios is a technical problem to be solved.
Disclosure of Invention
The application provides a multi-camera multi-target tracking method, device, equipment and storage medium, and aims to improve multi-target tracking efficiency in a multi-camera scene.
In a first aspect, the present application provides a multi-camera multi-target tracking method, including the steps of: extracting target template features of each target object from image frames of a first video stream acquired by an initial camera when the target object can no longer be continuously tracked in the first video stream; decoding image frames in parallel from a second video stream acquired by at least one associated camera associated with the initial camera to obtain image frame data corresponding to each associated camera; inputting the image frame data corresponding to each associated camera into an image processing model corresponding to that associated camera to perform feature extraction and mask prediction, so as to obtain current image features of the current image frame of each associated camera; performing feature comparison and matching between the current image features of the current image frame of each associated camera and the target template features, and outputting a detection result of the target object in the current image frame of each associated camera; and, when the detection result shows that the target object exists in the current image frame of an associated camera, extracting target tracking information of the target object from the image frame data corresponding to that associated camera, and generating a target tracking track of the target object.
In a second aspect, the present application also provides a multi-camera multi-target tracking device, comprising: a template feature extraction module, used for extracting target template features of each target object from an image frame of a first video stream acquired by an initial camera when the target object can no longer be continuously tracked; an image frame decoding module, configured to decode image frames in parallel from a second video stream acquired by at least one associated camera associated with the initial camera, to obtain image frame data corresponding to each of the associated cameras; an image frame feature extraction module, used for inputting the image frame data corresponding to each associated camera into an image processing model corresponding to that associated camera to perform feature extraction and mask prediction, so as to obtain the current image features of the current image frame of each associated camera; a target object detection module, used for