KR-102963018-B1 - The Method and System That Detect Highly Reliable Events Using A Hybrid Artificial Neural Network Model

Abstract

The present invention relates to a high-reliability event detection method and system using a hybrid artificial neural network model, wherein the method detects an event object in an acquired image frame, generates a bounding box for the detected event object and tracks it, determines the validity of the event object based on spatial parameters derived from the bounding box, generates a detection image by merging the bounding box area and surrounding area of the valid event object, and detects whether an event has occurred based on the result output by inputting the detection image into an artificial neural network-based VLM.
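
The validity determination summarized above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that the size criterion is an allowed range for the area ratio between consecutive bounding boxes, that the feature point of a box is its center, and that a feature point "corresponds" when it lies inside the current bounding box. The names `Box`, `size_change_ok`, `feature_points_ok`, and `is_valid_event_object`, as well as the thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)

    def contains(self, point: tuple[float, float]) -> bool:
        px, py = point
        return self.x1 <= px <= self.x2 and self.y1 <= py <= self.y2


def size_change_ok(track: Sequence[Box], min_ratio: float, max_ratio: float) -> bool:
    """First condition (assumed form): the area ratio between every pair of
    consecutive boxes stays within [min_ratio, max_ratio]."""
    for prev, curr in zip(track, track[1:]):
        if prev.area <= 0.0:
            return False
        if not (min_ratio <= curr.area / prev.area <= max_ratio):
            return False
    return True


def feature_points_ok(track: Sequence[Box]) -> bool:
    """Second condition (assumed form): the center of every earlier box
    lies inside the most recent box."""
    *previous, current = track
    return all(current.contains(box.center) for box in previous)


def is_valid_event_object(track: Sequence[Box], n: int = 3,
                          min_ratio: float = 0.5, max_ratio: float = 2.0) -> bool:
    """Valid only if both conditions hold over the last n time points."""
    if n < 1 or len(track) < n:
        return False  # not enough history to verify consistency
    window = list(track[-n:])
    return size_change_ok(window, min_ratio, max_ratio) and feature_points_ok(window)


steady = [Box(100, 100, 200, 200), Box(98, 98, 204, 204), Box(96, 97, 208, 206)]
jumpy = [Box(100, 100, 200, 200), Box(400, 400, 410, 410), Box(100, 100, 200, 200)]
print(is_valid_event_object(steady), is_valid_event_object(jumpy))
```

Requiring both conditions means that a tracked object is passed on to the later stages only when its size and its position are both consistent over time. The thresholds here are placeholders and would need tuning for any real event type.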

Inventors

신요섭
이세훈
김민식

Assignees

(주)원모어시큐리티

Dates

Publication Date: 2026-05-11
Application Date: 2025-07-02

Claims (10)

1. A high-reliability event detection method using a hybrid artificial neural network model, performed in a computing system comprising one or more processors and one or more memories, the method comprising: an object detection step of detecting an event object in an acquired image frame through a trained artificial neural network-based object detection model; an object tracking step of generating a bounding box for the detected event object and tracking it; an object validity determination step of verifying the spatiotemporal consistency of the event object, wherein the event object is determined to be valid when both a first condition, in which the change in bounding box size between N consecutive time points (N is a natural number) corresponds to a preset change criterion, and a second condition, in which the position coordinates of feature points within the bounding box correspond to a corresponding change criterion, are satisfied; a detection image generation step of generating a first peripheral area that expands by a first ratio set to be inversely proportional to the proportion of the image frame occupied by the bounding box of the event object determined to be valid, a second peripheral area that shrinks inward by applying a second ratio, set according to the first ratio, to the first peripheral area, and a third peripheral area that expands outward, and merging each of these with the bounding box to generate a plurality of detection images of different sizes; and an event detection step of inputting the detection images into an artificial neural network-based VLM (Vision-Language Model) and detecting whether an event has occurred based on the result output from the VLM.
2. The method of claim 1, wherein the object validity determination step derives spatial parameters including the size change between the bounding box of the event object detected in the image frame at the current time point and the bounding box of the event object detected in the image frame at the previous time point, and determines the validity of the event object based on whether the spatial parameters fall within a preset change criterion.
3. The method of claim 1, wherein the object validity determination step derives spatial parameters including whether the position coordinates of feature points derived from each of the bounding boxes of the event object detected in the image frames at a preset number N of consecutive previous time points (N is a natural number) correspond to the bounding box of the event object detected in the image frame at the current time point, and determines the validity of the event object based on the spatial parameters.
4. The method of claim 1, wherein the object validity determination step determines validity separately for each of two or more different spatial parameters, and the event detection step inputs the detection image generated from the event object into the VLM when the validity determination results for all of the two or more different spatial parameters are valid.
5. (Deleted)
6. (Deleted)
7. (Deleted)
8. The method of claim 1, wherein the event detection step inputs into the VLM each of the plurality of detection images, which are generated in different sizes by merging the bounding box area with each of a plurality of surrounding areas set to different sizes, and finally determines that an event has occurred in the image frame when the results output from the VLM indicate an event occurrence in all of the detection images.
9. A computing system comprising one or more processors and one or more memories and performing a high-reliability event detection method using a hybrid artificial neural network model, the computing system comprising: an object detection unit that detects an event object in an acquired image frame through a trained artificial neural network-based object detection model; an object tracking unit that generates a bounding box for the detected event object and tracks it; an object validity determination unit that, in order to verify the spatiotemporal consistency of the event object, determines the event object to be valid when both a first condition, in which the change in bounding box size between N consecutive time points (N is a natural number) corresponds to a preset change criterion, and a second condition, in which the position coordinates of feature points within the bounding box correspond to a corresponding change criterion, are satisfied; a detection image generation unit that generates a first peripheral area that expands by a first ratio set to be inversely proportional to the proportion of the image frame occupied by the bounding box of the event object determined to be valid, a second peripheral area that shrinks inward by applying a second ratio, set according to the first ratio, to the first peripheral area, and a third peripheral area that expands outward, and merges each of these with the bounding box to generate a plurality of detection images of different sizes; and an event detection unit that inputs the detection images into an artificial neural network-based VLM (Vision-Language Model) and detects whether an event has occurred based on the result output from the VLM.
10. A computer-readable recording medium for implementing a high-reliability event detection method using a hybrid artificial neural network model in a computing system comprising one or more processors and one or more memories, the computer-readable recording medium comprising computer-executable instructions that cause the computing system to perform steps comprising: an object detection step of detecting an event object in an acquired image frame through a trained artificial neural network-based object detection model; an object tracking step of generating a bounding box for the detected event object and tracking it; an object validity determination step of verifying the spatiotemporal consistency of the event object, wherein the event object is determined to be valid when both a first condition, in which the change in bounding box size between N consecutive time points (N is a natural number) corresponds to a preset change criterion, and a second condition, in which the position coordinates of feature points within the bounding box correspond to a corresponding change criterion, are satisfied; a detection image generation step of generating a first peripheral area that expands by a first ratio set to be inversely proportional to the proportion of the image frame occupied by the bounding box of the event object determined to be valid, a second peripheral area that shrinks inward by applying a second ratio, set according to the first ratio, to the first peripheral area, and a third peripheral area that expands outward, and merging each of these with the bounding box to generate a plurality of detection images of different sizes; and an event detection step of inputting the detection images into an artificial neural network-based VLM (Vision-Language Model) and detecting whether an event has occurred based on the result output from the VLM.
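
The detection image generation and the all-images decision rule recited in the claims above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. It assumes that the first ratio is `k / occupancy` capped at `max_ratio`, that the second ratio is a fixed fraction of the first, that the third area applies the same second ratio outward, and that the VLM is represented by a caller-supplied function returning `True` when it reports an event. `Region`, `expand`, `detection_regions`, `event_in_all`, and every default value are hypothetical.

```python
from typing import Callable, Iterable, NamedTuple


class Region(NamedTuple):
    """Integer pixel rectangle (hypothetical helper)."""
    x1: int
    y1: int
    x2: int
    y2: int


def expand(box: Region, margin_ratio: float, frame_w: int, frame_h: int) -> Region:
    """Grow the box on every side by margin_ratio of its width and height,
    clipped to the frame boundaries."""
    mx = round((box.x2 - box.x1) * margin_ratio)
    my = round((box.y2 - box.y1) * margin_ratio)
    return Region(max(0, box.x1 - mx), max(0, box.y1 - my),
                  min(frame_w, box.x2 + mx), min(frame_h, box.y2 + my))


def detection_regions(box: Region, frame_w: int, frame_h: int, k: float = 0.01,
                      max_ratio: float = 2.0, second_fraction: float = 0.5) -> list[Region]:
    """Return the first, second (inner), and third (outer) peripheral areas."""
    box_area = (box.x2 - box.x1) * (box.y2 - box.y1)
    if box_area <= 0:
        raise ValueError("bounding box must have positive area")
    occupancy = box_area / (frame_w * frame_h)
    # Assumed form of "inversely proportional": smaller boxes get more context.
    first_ratio = min(max_ratio, k / occupancy)
    # Assumed form of "second ratio according to the first ratio".
    second_ratio = second_fraction * first_ratio
    return [
        expand(box, first_ratio, frame_w, frame_h),                 # first area
        expand(box, first_ratio - second_ratio, frame_w, frame_h),  # second: inward
        expand(box, first_ratio + second_ratio, frame_w, frame_h),  # third: outward
    ]


def event_in_all(images: Iterable, vlm_reports_event: Callable[[object], bool]) -> bool:
    """Final decision: an event is confirmed only if the VLM stand-in
    reports an event for every detection image."""
    images = list(images)
    return bool(images) and all(vlm_reports_event(image) for image in images)


small = detection_regions(Region(400, 400, 500, 500), 1000, 1000)
large = detection_regions(Region(200, 200, 800, 800), 1000, 1000)
print(small)
print(large[0])
```

Each returned region contains the bounding box, so cropping the frame to a region yields the merged bounding box and surrounding area. With a NumPy image array, for example, a crop would be `frame[r.y1:r.y2, r.x1:r.x2]`. Requiring agreement across all crop sizes trades some sensitivity for fewer false positives.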

Description

The Method and System That Detect Highly Reliable Events Using A Hybrid Artificial Neural Network Model

The present invention relates to a high-reliability event detection method and system using a hybrid artificial neural network model, wherein the method detects an event object in an acquired image frame, generates a bounding box for the detected event object and tracks it, determines the validity of the event object based on spatial parameters derived from the bounding box, generates a detection image by merging the bounding box area and surrounding area of the valid event object, and detects whether an event has occurred based on the result output by inputting the detection image into an artificial neural network-based VLM.

Object detection technology has recently come into wide use in fields such as video-based surveillance systems, autonomous driving, medical image analysis, and industrial safety. Object detection algorithms are computer vision techniques that automatically identify the location and type of specific objects within an image or video, and they play a core role in enabling computers to understand and interpret visual information. In particular, when real-time event recognition is required, events in images are generally detected using a one-stage detector-based object detection algorithm, which predicts the location and type of an object simultaneously and is therefore suited to real-time processing.

However, because of noise in dynamic object recognition, this approach frequently produces false positives even when no actual event has occurred, leading to operational inefficiencies such as unnecessary resource deployment and wasted manpower.

Conventional event detection technologies include a method and apparatus for improving the event detection performance of a deep learning model, as disclosed in Korean Patent Publication No. 10-2024-0061705.
The invention disclosed in that publication improves event detection performance by relying on model performance: it performs event detection using a deep learning model trained in a different environment, labels the frames in which events are detected by applying them to a model previously trained for false positive classification, and retrains the model based on the labeled results. However, such conventional event detection technologies can still produce false positives by misidentifying non-event objects with similar appearances, such as flames, lighting, and reflections, as events. Therefore, it is necessary to develop technology capable of precisely determining whether an event situation exists.

FIG. 1 schematically illustrates the components of a computing system that performs a high-reliability event detection method using a hybrid artificial neural network model according to one embodiment of the present invention.
FIG. 2 schematically illustrates the execution steps of a high-reliability event detection method using a hybrid artificial neural network model according to one embodiment of the present invention.
FIG. 3 schematically illustrates the process of tracking event objects detected in a plurality of image frames according to one embodiment of the present invention.
FIG. 4 schematically illustrates spatial parameters including changes in the size of a bounding box according to one embodiment of the present invention.
FIG. 5 schematically illustrates spatial parameters related to feature points of a bounding box according to one embodiment of the present invention.
FIG. 6 schematically illustrates a process for determining the validity of a plurality of spatial parameters according to one embodiment of the present invention.
FIG. 7 illustrates, by way of example, a process for determining the validity of a plurality of spatial parameters according to one embodiment of the present invention.
FIG. 8 schematically illustrates the process of performing a detection image generation step according to one embodiment of the present invention.
FIG. 9 schematically illustrates the process of performing the correction position derivation step according to one embodiment of the present invention.
FIG. 10 schematically illustrates the process of performing the final position derivation step according to one embodiment of the present invention.
FIG. 11 schematically illustrates the process of generating a detection image according to another embodiment of the present invention.
FIG. 12 illustrates, by way of example, the internal configuration of a computing device according to one embodiment of the present invention.

Hereinafter, various embodiments and/or aspects are disclosed with reference to the drawings. For illustrative purposes, numerous specific details are set forth in the following description to aid in a general understanding of one or more aspects. However, those skilled in the art will recognize that these aspects may be practiced without such specific details. The following descripti