KR-20260062384-A - METHOD AND DEVICE FOR TRACKING MULTIPLE OBJECTS PHOTOGRAPHED BASED ON MULTIPLE CAMERAS
Abstract
An object tracking method performed by a processor according to one embodiment comprises: a step of extracting first object information for a first object included in a first frame image corresponding to a first time point in a first video captured based on a first camera; a step of extracting second object information for a second object included in a second frame image corresponding to a second time point later than the first time point in the first video; a step of extracting third object information for a third object included in a third frame image corresponding to a third time point in a second video captured based on a second camera capturing a field of view that partially overlaps with the first camera; and a step of generating a first embedding vector corresponding to an appearance feature of the first object, a second embedding vector corresponding to an appearance feature of the second object, and a third embedding vector corresponding to the third object based on applying an artificial intelligence model to the first object information, the second object information, and the third object information. The method further includes a step of tracking the first object on the first video by connecting the first object information and the second object information based on a first distance between the first embedding vector and the third embedding vector and a second distance between the second embedding vector and the third embedding vector.
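The cross-camera association the abstract describes can be sketched in a few lines of Python. The Euclidean distance metric, the threshold value, and the function names below are illustrative assumptions, not details taken from the patent: the idea is that two detections on camera 1 are linked when both lie close, in embedding space, to the same detection seen by the overlapping camera 2.

```python
import math

def embedding_distance(a, b):
    """Euclidean distance between two appearance embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def connect_tracklets(first_emb, second_emb, third_emb, threshold=0.5):
    """Connect the first and second detections on camera 1 when both are
    within a threshold distance of the same detection (third_emb) observed
    by the overlapping camera 2. Threshold value is an assumption."""
    first_distance = embedding_distance(first_emb, third_emb)
    second_distance = embedding_distance(second_emb, third_emb)
    return first_distance <= threshold and second_distance <= threshold
```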
Inventors
- 김형원
- 퐁푸 닌
Assignees
- 주식회사 엠시스랩
- 충북대학교 산학협력단
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2024-10-29
Claims (12)
- An object tracking method performed by a processor, the method comprising: a step of extracting first object information for a first object included in a first frame image corresponding to a first time point in a first video captured based on a first camera; a step of extracting second object information for a second object included in a second frame image corresponding to a second time point later than the first time point in the first video; a step of extracting third object information for a third object included in a third frame image corresponding to a third time point in a second video captured based on a second camera capturing a field of view that partially overlaps with the first camera; a step of generating, based on applying an artificial intelligence model to the first object information, the second object information, and the third object information, a first embedding vector corresponding to an appearance feature of the first object, a second embedding vector corresponding to an appearance feature of the second object, and a third embedding vector corresponding to the third object; and a step of tracking the first object on the first video by connecting the first object information and the second object information based on a first distance between the first embedding vector and the third embedding vector and a second distance between the second embedding vector and the third embedding vector.
- The object tracking method of claim 1, wherein the step of tracking the first object on the first video comprises: a step of determining whether each of the first distance and the second distance is less than or equal to a predetermined threshold distance; a step of generating, based on a result of the determination, global object information by connecting the first object information and the second object information, the global object information including a first unique identifier corresponding to the first object on the first video and a second unique identifier corresponding to the second object; and a step of tracking the first object on the first video based on the global object information.
- The object tracking method of claim 1, wherein the step of tracking the first object on the first video comprises: a step of calculating a first appearance similarity between the first object information and the third object information based on the first embedding vector and the third embedding vector; a step of calculating the first distance based on the first appearance similarity and a predetermined first position similarity between the first object information and the third object information; a step of calculating a second appearance similarity between the second object information and the third object information based on the second embedding vector and the third embedding vector; a step of calculating the second distance based on the second appearance similarity and a predetermined second position similarity between the second object information and the third object information; and when the first distance and the second distance are less than or equal to a predetermined threshold distance, a step of tracking the first object on the first video by connecting the first object information and the second object information.
- The object tracking method of claim 1, wherein the step of extracting the second object information comprises: a step of extracting the second object information based on a case where the first object is occluded in a frame image corresponding to at least one time point between the first time point and the second time point in the first video.
- The object tracking method of claim 1, wherein the step of extracting the first object information and the second object information comprises: a step of extracting the first object information by applying a first object tracking algorithm to the first frame image; and a step of extracting the second object information based on applying the first object tracking algorithm to the second frame image, wherein the step of extracting the third object information comprises a step of extracting the third object information based on applying a second object tracking algorithm to the third frame image, and wherein the first object tracking algorithm and the second object tracking algorithm are different.
- The object tracking method of claim 1, wherein the step of generating the first embedding vector, the second embedding vector, and the third embedding vector comprises: a step of generating the first embedding vector, the second embedding vector, and the third embedding vector based on applying the artificial intelligence model, which includes an appearance feature encoder trained on a predetermined dataset, to the first object information, the second object information, and the third object information.
- The object tracking method of claim 1, further comprising: a step of calculating an overlap ratio between a second bounding box included in the second object information and additional bounding boxes corresponding to other objects on the second frame image; a step of calculating a confidence score corresponding to a probability that the second object is included in the second bounding box; and a step of updating the first embedding vector to the second embedding vector based on a case where the overlap ratio is less than or equal to a predetermined threshold ratio and the confidence score is greater than or equal to a predetermined threshold confidence.
- The object tracking method of claim 1, further comprising: a step of calculating an intersection over union (IoU) value between a first bounding box included in the first object information and a second bounding box included in the second object information; a step of calculating a co-occurrence count indicating the number of times the first bounding box and the second bounding box are simultaneously included on the first video; and a step of connecting the first object information and the second object information based on the IoU value and the co-occurrence count.
- The object tracking method of claim 1, further comprising: when the first object is occluded at an intermediate time point between the first time point and the second time point in the first video, a step of generating an intermediate bounding box corresponding to the first object at the intermediate time point based on the first object information, the second object information, the first time point, and the second time point; and a step of connecting the first object information and the second object information based on the intermediate bounding box.
- The object tracking method of claim 1, wherein the third time point is within a predetermined threshold time difference from each of the first time point and the second time point.
- A computer-readable storage medium storing one or more computer programs comprising instructions for performing the method of any one of claims 1 to 10.
- An object tracking device comprising: a first camera that captures a first video; a second camera that captures a second video with a field of view that partially overlaps with the first camera; a processor that extracts first object information for a first object included in a first frame image corresponding to a first time point in the first video, extracts second object information for a second object included in a second frame image corresponding to a second time point later than the first time point, extracts third object information for a third object included in a third frame image corresponding to a third time point in the second video, generates, based on applying an artificial intelligence model to the first object information, the second object information, and the third object information, a first embedding vector corresponding to an appearance feature of the first object, a second embedding vector corresponding to an appearance feature of the second object, and a third embedding vector corresponding to the third object, and tracks the first object on the first video by connecting the first object information and the second object information based on a first distance between the first embedding vector and the third embedding vector and a second distance between the second embedding vector and the third embedding vector; and a memory that stores instructions executed by the processor.
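Two of the claimed operations lend themselves to short sketches: the intersection-over-union value of claim 8, and the intermediate bounding box of claim 9 for an occluded time point. The linear interpolation used below is a common realization of such an intermediate box, not necessarily the patent's method, and all function names and box formats are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) bounding boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def intermediate_box(box_t1, box_t2, t1, t2, t_mid):
    """Estimate a bounding box for an occluded intermediate time point by
    linearly interpolating between the boxes observed at t1 and t2
    (an assumed strategy; the patent does not fix the interpolation)."""
    alpha = (t_mid - t1) / (t2 - t1)
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(box_t1, box_t2))
```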
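The gating condition in claim 7, which decides when a track's stored appearance embedding is refreshed with a newer one, can be sketched as follows. The default threshold values and the function name are illustrative assumptions; the patent only requires that the overlap ratio be at most a threshold and the confidence score at least a threshold.

```python
def should_update_embedding(overlap_ratio, confidence_score,
                            threshold_ratio=0.3, threshold_confidence=0.6):
    """Replace the stored first embedding with the second embedding only
    when the second bounding box barely overlaps other objects' boxes
    (little occlusion) and the detector is confident the object is inside.
    The 0.3 / 0.6 defaults are assumptions, not values from the patent."""
    return (overlap_ratio <= threshold_ratio
            and confidence_score >= threshold_confidence)
```

Gating the update this way keeps occluded or low-confidence crops from corrupting the appearance model that the cross-camera distance comparison depends on.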
Description
Method and device for tracking multiple objects photographed based on multiple cameras

A technology for tracking multiple objects captured by multiple cameras is disclosed below. Multi-object tracking (MOT) is a technology that localizes objects within a video and assigns a unique identifier to each of them. MOT is a major area of research in the fields of object detection and autonomous driving. Most 2D MOT methods operate within the limited field of view of a single camera. Consequently, if an object is temporarily occluded within that limited field of view, tracking of the object may be lost. For example, in single-camera MOT methods, if an object being tracked is occluded and then localized again, it may be assigned a different identifier than before the occlusion. Autonomous vehicles or advanced driver assistance systems (ADAS), however, can utilize multiple cameras to perform MOT on surrounding objects. The fields of view of multiple cameras installed in a vehicle may partially overlap. In areas where the fields of view overlap, even if an object temporarily disappears from or is occluded in one camera, the object can still be identified by another camera. Therefore, the following discloses a technology that improves the accuracy of MOT in autonomous driving or ADAS systems through multiple cameras.

FIG. 1 illustrates an object tracking device according to one embodiment. FIG. 2 is a flowchart of an object tracking method according to one embodiment. FIG. 3 is a schematic block diagram illustrating the operation of an object tracking device according to one embodiment. FIG. 4 is a diagram illustrating a method by which an electronic device calculates an overlap ratio when bounding boxes corresponding to multiple objects overlap on a single frame. FIG. 5 is a diagram illustrating a multi-camera tracklet connection operation and a lost tracklet recovery operation performed by an electronic device according to one embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, or substitutions included in the technical concept described by the embodiments. Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted solely for the purpose of distinguishing one component from another. For example, a first component may be named a second component, and similarly, the second component may be named the first component. When a component is described as being "connected" to another component, it may be directly connected to or coupled with that other component, or intervening components may be present. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" are intended to specify the existence of the described features, numbers, steps, actions, components, parts, or combinations thereof, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art.
Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this specification. Hereinafter, embodiments will be described in detail with reference to the attached drawings. In the description with reference to the attached drawings, identical components are given the same reference numeral regardless of the drawing number, and redundant descriptions thereof are omitted.

FIG. 1 illustrates an object tracking device according to one embodiment. An object tracking device (hereinafter, electronic device (100)) according to one embodiment can track an object included in a video and/or a plurality of images. For example, the video may include a plurality of frame images corresponding to individual time points. For example, the video may include a first frame image corresponding to a first time point and a second frame image corresponding to a second time point. In this case, the first time point represents a time point prior to the second time point. The electronic device (100) can recognize an object in the plurality of frame images corresponding to each time point. The electronic device (100) can extract object information for the objects recognized in each of the plurality