EP-4261777-B1 - METHOD, COMPUTER PROGRAM, DEVICE AND SYSTEM FOR MONITORING A TARGET OBJECT


Inventors

METGE, BAPTISTE

Dates

Publication Date: 20260513
Application Date: 20220411

Claims (12)

A method (100;200) for tracking a target object in an image stream captured by a camera (402₁-402ₙ) at a capture frequency, Fc, said method (100;200) comprising several iterations of a tracking phase (110), implemented individually for several images, called processed images, of said image stream, and comprising the following steps: - detecting (114) at least one object, and its position, in the processed image, and - identifying (116-124) said target object among the at least one object detected in said processed image; said tracking phase (110) being carried out at a detection frequency, Fs, lower than said capture frequency Fc, so that two images processed during two successive iterations of the tracking phase are separated by at least one non-processed image to which said tracking phase is not applied; the step of identifying a target object in a processed image comprising the following steps: - for each object detected in said processed image, calculating (116) a spatial distance between the position of said object and the position of the target object detected on a previously processed image, - spatial filtering (116; 118) of the objects based on said calculated distances and a predetermined spatial distance threshold value, SDS, - calculating (120; 122) an appearance distance between a visual signature of the target object detected on the previously processed image and a visual signature of each object retained after the filtering step, and - identifying (124) the target object based on said appearance distances; said SDS value being proportional to the width of the target object on said processed image, such that: SDS = K * L, where L is the width of the target object and K is a multiplier coefficient.
The method (100;200) according to the preceding claim, characterized in that it comprises a step (140) of estimating the position of the target object at a time located between the capture times of two processed images during two successive iterations of the tracking phase, based upon the positions of said target object in said processed images.
The method (100;200) according to any one of the preceding claims, characterized in that the tracking phase (110) is implemented for one image every N images, where N≥2, and preferably N≥20, so that two successive iterations of the tracking phase (110) are applied to two images separated, over time, by at least one image, and in particular by N images, which are not processed.
The method (100; 200) according to any one of claims 1 or 2, characterized in that the tracking phase is carried out for each image captured every DUR seconds.
The method (200) according to any one of the preceding claims, characterized in that the image stream is captured prior to the first iteration of the tracking phase (110) so that the target object is not tracked in real time.
The method (100) according to any one of claims 1 to 4, characterized in that it is implemented to carry out real-time tracking of the target object, said method (100) further comprising a step (106) of transmitting each processed image from the camera (402₁-402ₙ) to a tracking device (300; 406).
The method (100) according to the preceding claim, characterized in that the step (106) of transmitting a processed image from the camera (402₁-402ₙ) to the tracking device (300; 406) is carried out at the request of said tracking device (300; 406).
The method (100) according to any one of claims 6 or 7, characterized in that the camera (402₁-402ₙ) is arranged to capture only the processed images.
The method (100; 200) according to any one of the preceding claims, characterized in that the detection step (114) is carried out by an artificial intelligence model, and in particular by a neural network, previously trained to detect the presence of an object in an image.
A computer program comprising executable instructions, which, when they are executed by a computer apparatus, implement all the steps of the tracking method (100;200) according to any one of the preceding claims.
A device (300;406) for tracking a target object comprising means configured to implement all the steps of the tracking method (100;200) according to any one of claims 1 to 9.
A system (400) for tracking a target object comprising: - at least one camera (402₁-402ₙ) with an image stream capture frequency, Fc, and - a device (300;406) according to the preceding claim.
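As a reading aid only, the identification step of claim 1 and the position estimate of claim 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the dictionary layout, the default value of K, the use of the target width from the previously processed image as L, the choice of cosine distance, the "smallest appearance distance wins" rule, and the linear interpolation are all illustrative choices that the claims do not fix.

```python
import math


def spatial_distance(p, q):
    """Euclidean distance between two (x, y) image positions, in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def cosine_distance(u, v):
    """Cosine distance between two signature vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: a null signature is treated as dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def identify_target(detections, target, k=2.0):
    """Pick the detection that most plausibly is the target, or None.

    detections: list of dicts with 'position' (x, y) and 'signature' (floats),
                one per object detected in the current processed image.
    target:     dict with 'position', 'signature' and 'width', taken from the
                previously processed image (hypothetical layout).
    k:          multiplier coefficient K; the default is illustrative only.
    """
    sds = k * target["width"]  # SDS = K * L
    # Spatial filtering: keep only objects within SDS of the last known position.
    retained = [
        d for d in detections
        if spatial_distance(d["position"], target["position"]) <= sds
    ]
    if not retained:
        return None
    # Assumption: the retained object with the smallest appearance distance
    # to the target signature is taken to be the target.
    return min(
        retained,
        key=lambda d: cosine_distance(d["signature"], target["signature"]),
    )


def interpolate_position(p0, t0, p1, t1, t):
    """Linearly estimate the position at time t, with t0 <= t <= t1.

    p0 and p1 are the target positions in two successively processed images
    captured at times t0 and t1. Linear motion is an assumption.
    """
    if not t0 < t1:
        raise ValueError("t0 must be strictly earlier than t1")
    if not t0 <= t <= t1:
        raise ValueError("t must lie between t0 and t1")
    alpha = (t - t0) / (t1 - t0)
    return (p0[0] + alpha * (p1[0] - p0[0]), p0[1] + alpha * (p1[1] - p0[1]))
```

The spatial filter runs first because it is cheap and removes objects that look like the target but are too far away to be reached between two processed images; the costlier signature comparison is then applied only to the objects that remain.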

Description

The present invention relates to a method for tracking a target object in images taken by at least one camera. It also relates to a computer program, a device, and a system implementing such a method.
The field of the invention is generally that of tracking objects from images captured by cameras, in particular in real time.

State of the art

Cities are increasingly equipping themselves with CCTV cameras, the number of which is growing faster than the number of human operators. Tracking a target object, such as a person or a vehicle, is very difficult for an operator. If the operator is disturbed, they may lose track of the target object, and finding it again can be particularly tedious. Tracking multiple target objects simultaneously is even more challenging.
Tracking solutions based on deep learning models, particularly re-identification models, are known. These solutions process the images of a stream from one or more cameras in order to identify the target object, and more generally all moving objects, within each image of that stream. Typically, each image in the stream is first processed by an object detector, such as one implementing the RESNET50 model, to identify at least one object and its position within the image. Then, each object is identified by comparing its visual signature with those of objects identified in previously processed images. Thus, the same target object can be identified in all the images in which it appears, and a trajectory, or tracklet, of the object can be determined by tracking its movement within each image, and therefore within the imaged scene.
The documents US2022/004768A1, EP3839816A1 and CA3156840A1 describe different approaches for object tracking based on image analysis.
However, these solutions are complex, energy-intensive, and require significant computing resources. These drawbacks limit their deployment in the real world.
One object of the present invention is to remedy at least one of the drawbacks of the prior art. Another object of the invention is to provide an object tracking solution that is less energy-intensive and requires fewer computing resources, while offering performance similar or even identical to that of current solutions.

Description of the invention

The invention proposes to achieve at least one of the aforementioned objects by a method of tracking a target object in a stream of images captured by a camera at a capture frequency, Fc, said method comprising several iterations of a tracking phase implemented individually for several images, called processed images, of said stream of images, and comprising the following steps: detection in the processed image of at least one object and its position, and identification of said target object among the at least one object detected in said processed image; characterized in that said tracking phase is carried out at a detection frequency, Fs, lower than said capture frequency Fc, so that two images processed during two successive iterations of the tracking phase are separated by at least one unprocessed image to which said tracking phase is not applied.
Thus, the invention proposes to track a target object by processing only a portion of the images in the image stream captured by the camera. In other words, the invention forgoes processing all the images in the image stream and instead processes only certain images of that stream. The proposed solution is therefore less energy-intensive and requires fewer computing resources than current solutions, which process all the images in an image stream. Furthermore, the inventor has observed that processing only a portion of the images does not significantly reduce tracking performance.
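To make the relationship between the capture frequency Fc and the detection frequency Fs concrete, the selection of processed images can be sketched as below. This is only an illustration under assumptions: the function name, the rounding of Fc / Fs to an integer stride, and the choice of starting at the first image are not taken from the text, which only requires that Fs be lower than Fc and that at least one unprocessed image separate two processed images.

```python
def processed_frame_indices(total_frames, capture_hz, detection_hz):
    """Return the indices of the images to which the tracking phase is applied.

    capture_hz   plays the role of Fc, detection_hz the role of Fs.
    The stride is an assumed mapping from (Fc, Fs) to "one image every N".
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if not 0 < detection_hz < capture_hz:
        raise ValueError("detection frequency must be positive and below capture frequency")
    # A stride of at least 2 guarantees one unprocessed image between two
    # processed ones; when Fs is very close to Fc, the effective detection
    # frequency is therefore lower than the requested one.
    stride = max(2, round(capture_hz / detection_hz))
    return list(range(0, total_frames, stride))


if __name__ == "__main__":
    # Illustrative numbers only: 100 images captured at 25 Hz, detection at 1 Hz.
    indices = processed_frame_indices(100, 25.0, 1.0)
    print(indices)             # indices of the processed images
    print(len(indices) / 100)  # fraction of the stream that is processed
```

With these illustrative values, only 4 of the 100 captured images go through detection and identification, which is where the reduction in computing load comes from.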
Indeed, the inventor has noted that the capture frequency of current cameras is such that the movement of a target object can be accurately and reliably deduced and tracked from only a portion of the images forming the stream captured by the camera. This observation holds all the more when the target object moves slowly, as humans do.
In this application, "object" or "target object" means any type of object, such as a human, an animal, a car, etc.
In this application, "tracklet" means a set of at least one image, or image area, belonging to the same object and captured by a camera.
As is known, the appearance distance between two images can be calculated by generating a digital signature for each of the images, for example by an artificial intelligence model such as a neural network, and then calculating the distance, Euclidean or cosine, between these two digital signatures.
By "camera" is meant any type of image acquisition device, such as an RGB, LIDAR, thermal or 3D camera, etc.
A "processed image" refers to an image in the image stream to which the tracking phase has been applied. In contrast, an "unprocessed image" refers to an image in the image stream to whic