KR-20260064240-A - Method and Computing Device for Object Position Tracking Based on Monocular Camera in Indoor Environment, and Recording Medium Thereof

KR20260064240A

Abstract

The present invention relates to digital twin technology for an indoor environment that tracks an object's movement path and estimates its pose by generating a Bird's Eye View (BEV) through 3D point cloud generation and floor surface detection, without camera calibration or satellite imagery, and that calculates a homography transformation matrix between the BEV and the camera image so that the position information of objects detected by multiple cameras is represented on a single integrated map.

Inventors

  • 김성호
  • 김도녕
  • 윤영균

Assignees

  • Yeungnam University Industry-Academic Cooperation Foundation
  • PHA Co., Ltd.

Dates

Publication Date
2026-05-07
Application Date
2024-10-31

Claims (11)

  1. An indoor object position tracking method comprising: generating a 3D point cloud from images acquired by a plurality of monocular cameras; detecting a floor surface from the 3D point cloud and generating a Bird's Eye View (BEV) map; calculating a homography transformation matrix between an image of an indoor camera and the BEV map; and converting the position of an object detected by a monocular camera into coordinates on the BEV map using the homography transformation matrix.
  2. The method of claim 1, wherein the floor surface detecting detects a normal vector and a position of the floor surface from the 3D point cloud by combining a RANSAC algorithm with Singular Value Decomposition (SVD).
  3. The method of claim 1, wherein calculating the homography transformation matrix comprises: masking a floor region in the monocular camera image; detecting the four polygons having the largest area in the masked region; and calculating the homography transformation matrix between the detected polygons and corresponding points on the BEV map.
  4. The method of claim 1, further comprising: detecting an object in the monocular camera image using an object detection model; and extracting coordinates of a bottom portion of the detected object, wherein the position of the object is transformed based on the bottom-portion coordinates.
  5. The method of claim 1, further comprising: displaying a movement path of the object as a heatmap on the BEV map; and projecting a pose estimation result of the object onto the BEV map.
  6. A recording medium storing a computer-readable program coded to perform the indoor object position tracking method of any one of claims 1 to 5.
  7. A computing device comprising: a memory storing a program coded so that a computer can perform an indoor object position tracking method; and a processor that executes the program, wherein the indoor object position tracking method comprises: generating a 3D point cloud from images acquired by a plurality of monocular cameras; detecting a floor surface from the 3D point cloud and generating a Bird's Eye View (BEV) map; calculating a homography transformation matrix between an image of an indoor camera and the BEV map; and converting the position of an object detected by a monocular camera into coordinates on the BEV map using the homography transformation matrix.
  8. The computing device of claim 7, wherein the floor surface detecting detects a normal vector and a position of the floor surface from the 3D point cloud by combining a RANSAC algorithm with Singular Value Decomposition (SVD).
  9. The computing device of claim 7, wherein calculating the homography transformation matrix comprises: masking a floor region in the monocular camera image; detecting the four polygons having the largest area in the masked region; and calculating the homography transformation matrix between the detected polygons and corresponding points on the BEV map.
  10. The computing device of claim 7, wherein the indoor object position tracking method further comprises: detecting an object in the monocular camera image using an object detection model; and extracting coordinates of a bottom portion of the detected object, and wherein the position of the object is transformed based on the bottom-portion coordinates.
  11. The computing device of claim 7, wherein the indoor object position tracking method further comprises: displaying a movement path of the object as a heatmap on the BEV map; and projecting a pose estimation result of the object onto the BEV map.
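Claims 2 and 8 describe floor detection as a combination of RANSAC and SVD. A minimal numpy sketch of that combination is given below; `fit_floor_plane` and all parameter values are hypothetical illustrations, not taken from the patent. RANSAC proposes candidate planes from random 3-point samples and keeps the one with the most inliers, and SVD then refines the plane from the inlier set, with the smallest singular vector serving as the floor normal.

```python
import numpy as np

def fit_floor_plane(points, n_iters=200, threshold=0.02, seed=0):
    """RANSAC plane fit on an (N, 3) point cloud, refined by SVD.

    Returns (normal, centroid) of the estimated floor plane.
    Hypothetical helper; parameter names are illustrative only.
    """
    rng = np.random.default_rng(seed)
    best_inliers, best_count = None, 0
    for _ in range(n_iters):
        # Sample 3 points and form a candidate plane.
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        normal /= norm
        # Point-to-plane distances; count inliers within the threshold.
        dists = np.abs((points - sample[0]) @ normal)
        inliers = dists < threshold
        if inliers.sum() > best_count:
            best_count, best_inliers = inliers.sum(), inliers
    # Refinement: SVD of the centred inliers. The right singular vector
    # with the smallest singular value is the plane normal.
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid)
    return vt[-1], centroid
```

In this sketch the RANSAC stage provides robustness to off-floor points (furniture, people), while the SVD refinement averages out per-point noise across all inliers rather than relying on a single 3-point sample.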

Description

Method and Computing Device for Object Position Tracking Based on Monocular Camera in Indoor Environment, and Recording Medium Thereof

The present invention relates to digital twin technology for an indoor environment that tracks an object's movement path and estimates its pose by generating a Bird's Eye View (BEV) through 3D point cloud generation and floor surface detection, without camera calibration or satellite imagery, and that calculates a homography transformation matrix between the BEV and the camera image so that the position information of objects detected by multiple cameras is represented on a single integrated map.

Demand for camera-based systems for object position tracking and path analysis is increasing in various fields, including virtual reality, autonomous driving, automated monitoring systems, and the medical sector. Accordingly, significant research has been conducted on object recognition and tracking in video footage from autonomous vehicles and traffic surveillance cameras. For outdoor environments, 3D-net proposed a methodology for applying real data by combining satellite imagery and traffic surveillance camera images. Specifically, 3D-net introduced an automatically calibrated satellite-ground inverse perspective mapping (SG-IPM) method that calculates the homography transformation between the satellite imagery's Bird's Eye View (BEV) and the traffic camera's viewpoint using matching points derived from the traffic camera's location information. However, this approach cannot be applied to indoor environments, where acquiring a BEV from satellite imagery is impossible. In the field of multi-view detection, various methods have been proposed that improve recognition rates by integrating multi-view features into spatial information after BEV transformation, using multi-view images with overlapping regions and given camera parameters.
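The homography-based mapping discussed above, estimating a planar transformation between a camera view and a BEV from point correspondences and then projecting detected positions onto the BEV, can be sketched as follows. This is a generic Direct Linear Transform (DLT) illustration in numpy; the helper names `solve_homography` and `to_bev` are hypothetical and this is not the SG-IPM or the patent's exact procedure.

```python
import numpy as np

def solve_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src (homogeneous)
    from four point correspondences via the DLT: stack two linear
    constraints per correspondence and take the null space by SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the arbitrary scale

def to_bev(H, pixel):
    """Project an image pixel (e.g. an object's bottom-centre point)
    onto BEV-map coordinates with homography H."""
    p = H @ np.array([pixel[0], pixel[1], 1.0])
    return p[:2] / p[2]  # dehomogenise
```

With correspondences taken from a known floor region (in the patent's claims, the four largest polygons in the masked floor area and their counterparts on the BEV map), no intrinsic camera parameters are needed: the plane-to-plane mapping alone determines where a ground-contact point lands on the BEV.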
However, as noted in these papers, the requirements on camera parameters make such methods difficult to apply to real-world data because of calibration issues. Furthermore, while these methods improve object localization and recognition, they fail to provide spatial background information. Recently, several re-identification technologies that integrate Large Language Models (LLMs) for object description have been introduced in the field of object tracking. These technologies aim to improve tracking performance by combining the visual features of objects with linguistic descriptions. However, they do not solve the fundamental problem of acquiring position information in indoor environments. In particular, research on integrating non-overlapping images and extracting position information in indoor environments has been limited. Most existing methods require camera calibration or presuppose environments where a BEV can be acquired, such as from satellite imagery, which has limited their application in indoor settings. Furthermore, integrating multi-view camera images requires overlapping areas between cameras or precise camera parameters, which has restricted their use in real-world environments. Therefore, there is a need for a new technology capable of tracking the position and determining the path of objects in indoor environments without camera calibration or pre-acquired aerial imagery.

FIG. 1 is a flowchart of an indoor object position tracking method according to one embodiment of the present invention. FIG. 2 is a block diagram showing the data processing process. FIG. 3 is a block diagram showing the schematic configuration of the computing device.

Embodiments of the present invention will be described in detail below with reference to the drawings.
However, detailed descriptions of known functions or configurations that could obscure the essence of the present invention are omitted from the following description and the attached drawings. Throughout the specification, when a part is said to 'comprise' a component, this means that, unless specifically stated otherwise, it does not exclude other components but may further include them. In addition, terms such as first, second, etc. may be used to describe various components, but the components should not be limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly, a second component may be named a first component. The terms used herein are intended merely to describe specific embodiments and are not intended to limit the invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprising" or "having" are intended to specify the existence of the described features, numbers, steps