KR-20260062215-A - method of providing intelligent and selective surveillance with queries of natural language by use of vectorized information

Abstract

The present invention relates to intelligent selective monitoring technology based on natural language queries using vectorization. Object images and frame images of CCTV footage are vectorized by an AI image embedding neural network model and stored in a database; a user's natural language query about a target object or situation is vectorized by an AI natural language embedding model; and the database is then searched, based on vector similarity, for object images or frame images that match the query. Because vectors serve as an intermediate representation in which natural language and images can be compared for similarity, the invention allows intelligent selective monitoring to be implemented flexibly in a variety of situations. Furthermore, because no new neural network model has to be developed to detect a new object or situation, the maintenance efficiency of the intelligent selective monitoring system is improved.
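
The matching criterion summarized above can be written compactly. This is only a sketch under assumed notation: the symbols f_img, f_txt, d, and τ are introduced here for illustration and do not appear in the original, and the dot product is used because it is the similarity measure named in claim 4.

```latex
\[
v_i = f_{\mathrm{img}}(x_i) \in \mathbb{R}^{d}, \qquad
q = f_{\mathrm{txt}}(t) \in \mathbb{R}^{d}, \qquad
s(q, v_i) = q \cdot v_i = \sum_{k=1}^{d} q_k \, v_{i,k}
\]
\[
R(t) = \{\, x_i \;:\; s(q, v_i) \ge \tau \,\}
\]
```

Here x_i is an object image or frame image, t is the natural language query, τ is the preset threshold, and R(t) is the set of images returned for the query. If the embeddings are normalized to unit length, the dot product equals the cosine similarity.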

Inventors

전준형
김윤성
윤성탁
노형철
김도윤
이성진

Assignees

이노뎁 주식회사

Dates

Publication Date: 20260507
Application Date: 20241028

Claims (5)

A natural language query-based intelligent selective control method using vector information, performed by a selective control device (400) to implement intelligent selective control based on vector similarity between frame images and object images of CCTV footage and a user's natural language query, the method comprising: a step of generating object vector embedding data by vectorizing each object image detected in the CCTV footage using an image embedding neural network model; a step of storing the object vector embedding data in a vector database (410); a step of receiving a natural language query about a search target from the user; a step of generating natural language vector embedding data by vectorizing the natural language query using a natural language embedding neural network model; a step of calculating the vector similarity between the object vector embedding data stored in the vector database (410) and the natural language vector embedding data; and a step of outputting the object vector embedding data whose vector similarity is greater than or equal to a preset threshold as a selective control image for the natural language query.
The method of claim 1, further comprising: a step of generating scene vector embedding data by vectorizing individual frame images of the CCTV footage using an image embedding neural network model; a step of storing the scene vector embedding data in the vector database (410); a step of calculating the vector similarity between the scene vector embedding data stored in the vector database (410) and the natural language vector embedding data; and a step of outputting the scene vector embedding data whose vector similarity is greater than or equal to a preset threshold as a selective control image for the natural language query.
The method of claim 1, wherein the object vector embedding data is generated by inputting each object image detected in the CCTV footage into an image embedding neural network model corresponding to the object class of that object image and vectorizing it.
The method of claim 1 or claim 2, wherein the vector similarity is calculated by a vector dot product.
A computer program stored on a computer-readable storage medium to execute, on a computer, the natural language query-based intelligent selective control method using vector information according to any one of claims 1 to 3.
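
The steps recited in claims 1 and 4 can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the names `VectorDatabase`, `dot`, `selective_control`, and `toy_embed_text` are hypothetical, the database is a plain in-memory list, and the embeddings are hand-written toy vectors standing in for the outputs of image and natural language embedding models, which the claims do not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Vector dot product, the similarity measure named in claim 4."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


@dataclass
class VectorDatabase:
    """Hypothetical in-memory stand-in for the vector database (410)."""
    entries: List[Tuple[str, Vector]] = field(default_factory=list)

    def store(self, image_id: str, embedding: Vector) -> None:
        """Store one image embedding under an image identifier."""
        self.entries.append((image_id, embedding))

    def search(self, query: Vector, threshold: float) -> List[Tuple[str, float]]:
        """Return (image_id, similarity) pairs with similarity >= threshold, best first."""
        scored = [(image_id, dot(query, emb)) for image_id, emb in self.entries]
        hits = [(image_id, s) for image_id, s in scored if s >= threshold]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


def selective_control(
    db: VectorDatabase,
    embed_text: Callable[[str], Vector],
    query_text: str,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Vectorize the natural language query, then search by vector similarity."""
    return db.search(embed_text(query_text), threshold)


# Toy embeddings (assumption: a real system would obtain these from image and
# text embedding models that map into one shared vector space).
db = VectorDatabase()
db.store("cam1_obj_001", [0.9, 0.1, 0.0])
db.store("cam1_obj_002", [0.0, 0.2, 0.9])
db.store("cam2_obj_003", [0.8, 0.3, 0.1])

TOY_TEXT_EMBEDDINGS = {"person wearing red": [1.0, 0.0, 0.0]}


def toy_embed_text(query: str) -> Vector:
    """Hypothetical text embedder: a lookup table instead of a neural network."""
    return TOY_TEXT_EMBEDDINGS[query]


results = selective_control(db, toy_embed_text, "person wearing red", threshold=0.5)
for image_id, similarity in results:
    print(image_id, similarity)
```

Scene vector embedding data for whole frames (claim 2) could be stored and searched in the same structure. Handling a new search target only requires a new query string, since no entry in the database has to be recomputed.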

Description

Intelligent selective surveillance method based on natural language queries using vectorized information

The present invention generally relates to intelligent selective monitoring technology that uses artificial intelligence to analyze CCTV footage in a video monitoring system. In particular, the present invention relates to intelligent selective monitoring technology based on natural language queries using vectorization, configured to vectorize object images and frame images of CCTV footage using an AI image embedding neural network model and store them in a database, to vectorize a user's natural language query regarding a target object or situation through an AI natural language embedding model, and then to search the database, based on vector similarity, for object images or frame images that match the natural language query.

Recently, it has become common to install CCTV-based video surveillance systems for crime prevention, accident prevention, and securing evidence after an incident. [Fig. 1] is a general configuration diagram of a CCTV video surveillance system. Referring to [Fig. 1], the CCTV video surveillance system comprises a plurality of CCTV cameras (10), a plurality of client devices (50), a video control device (100), a storage device (200), and a video analysis device (300). The CCTV cameras (10) are installed at multiple locations and provide footage of their respective locations to the video control device (100) in real time. The video control device (100) provides the CCTV footage to control operators for monitoring, and the storage device (200) stores the CCTV footage temporarily or long-term for later verification. The video control device (100) also transmits the CCTV footage to the video analysis device (300) and instructs it to perform video analysis. The video analysis device (300) analyzes the CCTV footage in real time or retrospectively.
The control operator checks the CCTV footage on a client device (50) and performs various operations for video control.

Traditionally, control center operators have monitored the CCTV footage themselves to assess the situation. The drawbacks of this approach are that errors become more likely when operators lack proficiency and that personnel are severely short relative to the number of cameras.

To address these shortcomings, there are active attempts to analyze CCTV footage using neural networks. By inputting CCTV video into a pre-trained neural network model, functions such as object detection, anomaly identification, object search, and object tracking are performed.

[Fig. 2] is a general configuration diagram of a neural network-based intelligent selective control system, and [Fig. 3] is a general conceptual diagram of the intelligent selective control process performed by a selective control device (400). Referring to [Fig. 2], the intelligent selective control system comprises a plurality of CCTV cameras (10), a plurality of client devices (50), a video control device (100), a storage device (200), a video analysis device (300), and a selective control device (400). The selective control device (400) uses neural network models (NNM) to perform object detection and object grouping, which recognize objects of interest in CCTV footage; attribute recognition and attribute-based search, which extract and use the features of objects; action recognition, which recognizes human actions in the video; and same-person re-identification, which searches for the same person across footage from multiple CCTV cameras. The selective control device (400) performs an object detection process by inputting a series of frame images obtained from the CCTV video into an object detection neural network model to extract multiple objects of interest.
The selective control device (400) then performs an object grouping process by inputting these objects of interest into an object tracker, which finds identical objects across the series of frames of the CCTV video and groups them, thereby forming multiple groups of objects of interest. Next, the selective control device (400) performs neural network-based attribute recognition, behavior recognition, and same-person re-identification on the groups of objects of interest. Inference for these operations is distributed among neural network models suited to each purpose (object detection neural network model, attribute recognition neural network model, pose estimation neural network model, behavior recognition neural network model, Re-ID neural network model, etc.).

To achieve the purpose of video surveillance, the objects or situations that an intelligent selective surveillance system must detect in CCTV footage vary depending on the location, viewing angle, situation, tim