US-12626485-B2 - Apparatus and method for recognizing object

Abstract

Provided is an apparatus for recognizing an object that includes an object inference module configured to process an original image captured by a camera module and generate an image of a size to be input to a machine learning inference model, wherein the object inference module includes the machine learning inference model, and outputs a result of recognition and classification of an object inferred through the machine learning inference model, and the machine learning inference model processes an input image to infer an object included in the input image.

Inventors

Geon Min Yeo
Young Il Kim
Seong Hee PARK
Wun Cheol Jeong
Tae Wook Heo

Assignees

ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Dates

Publication Date: 20260512
Application Date: 20221213
Priority Date: 20211214

Claims (20)

1. An apparatus for recognizing an object, the apparatus comprising: at least one processor; and a memory storing instructions, which, when executed by the at least one processor, cause the at least one processor to implement: an object inference module configured to process an original image captured by a camera module and generate an image of a size to be input to a machine learning inference model, wherein the object inference module includes the machine learning inference model, and outputs a result of recognition and classification of an object inferred through the machine learning inference model, and wherein the object inference module is configured to, prior to processing by the machine learning inference model: generate a reduced binary image from the original image; calculate a background line representing a boundary between a ground region and a sky space based on pixel values of the reduced binary image; and extract at least one object-containing region from the sky space based on the background line, wherein the machine learning inference model processes the extracted object-containing region to infer the object included in the extracted object-containing region.
2. The apparatus of claim 1, wherein the object inference module is configured to extract, as the image to be input to the machine learning inference model, an image including a plurality of object-containing regions clustered from the original image according to the size to be input to the machine learning inference model.
3. The apparatus of claim 2, wherein the object inference module is configured to: in order to extract the object-containing region, calculate the background line from the reduced binary image obtained by reducing the original image; add an offset toward the sky space above the background line to calculate a boundary line; and extract the object-containing region only from the sky space, excluding a region below the boundary line.
4. The apparatus of claim 3, wherein the object inference module is configured to: convert a color image, which is the original image, into a gray image in which R, G, and B values are the same; and reduce the gray image to generate a reduced binary image, wherein the binary image is an image in which a pixel value greater than or equal to a designated threshold value is represented as 1 and a pixel value smaller than the designated threshold value is represented as 0.
5. The apparatus of claim 3, wherein the object inference module is configured to use different threshold values to calculate N background lines, and obtain a weighted average of the N background lines assigned different weights (W_k) to calculate a final background line (B_i).
6. The apparatus of claim 3, wherein the object inference module is configured to: calculate an object line for detecting an object in the reduced binary image; and repeatedly perform a process of searching for a first vertical pixel index (OBJ_i,O) at which a vertical pixel value with respect to a horizontal pixel (i) of the binary image is a minimum based on an object line calculation threshold (THRESHOLD_O) designated to calculate the object line so as to detect the object line.
7. The apparatus of claim 6, wherein the object inference module is configured to independently detect objects in a plurality of segmented regions obtained by dividing the reduced binary image in a horizontal direction, wherein a point at which a gradient of the object line maximally increases beyond an object detection threshold (THRESHOLD_OBJECT) is detected as a location of an object.
8. The apparatus of claim 7, wherein the object inference module is configured to, upon the locations of the objects being detected in the reduced binary image, cluster object-containing regions according to the size to be input to the machine learning inference model, wherein the clustering is performed with a smallest number of combinations of the object-containing regions in the corresponding image.
9. The apparatus of claim 8, wherein the object inference module is configured to apply a specified ratio to the object-containing region clustered in the reduced binary image so as to map the clustered object-containing region to an object-containing region of the original image.
10. A method of recognizing an object, the method comprising: receiving, by an object inference module, an original image captured by a camera module; processing, by the object inference module, the original image, the processing comprising: generating a reduced binary image from the original image, calculating a background line representing a boundary between a ground region and a sky space based on pixel values of the reduced binary image, and extracting at least one object-containing region from the sky space based on the background line; and processing, by a machine learning inference model, the extracted object-containing region to perform recognition and classification of an object included in the extracted object-containing region, and outputting a result of the recognition and classification of the object.
11. The method of claim 10, wherein the extracting of the at least one object-containing region comprises: detecting, by the object inference module, at least one object based on pixel values of the reduced binary image prior to processing by the machine learning inference model; clustering, by the object inference module, at least one object-containing region based on locations at which the at least one object is detected; and mapping, by the object inference module, the object-containing region clustered in the reduced binary image to corresponding object-containing regions in the original image, to obtain the at least one object-containing region that is to be input to the machine learning inference model.
12. The method of claim 11, wherein the detecting of the at least one object based on pixel values of the reduced binary image comprises: calculating, by the object inference module, the background line from the reduced binary image; adding an offset toward the sky space above the background line to calculate a boundary line; and detecting the at least one object only from the sky space above the boundary line.
13. The method of claim 11, wherein the detecting of the at least one object further comprises: dividing the reduced binary image into a plurality of segmented regions in a horizontal direction; and independently detecting objects from the plurality of segmented regions, thereby reducing an overall inference time.
14. A method of recognizing an object, the method comprising: reducing, by an object inference module, an original image to generate a reduced image; converting, by the object inference module, the reduced image into a binary image to generate a reduced binary image; detecting, by the object inference module, an object in the reduced binary image based on pixel values of the reduced binary image prior to processing by a machine learning inference model, and detecting a location of the object based on the reduced binary image; clustering, by the object inference module, object-containing regions in the reduced binary image based on the detected location of the object; mapping, by the object inference module, the object-containing regions clustered in the reduced binary image to corresponding object-containing regions in the original image; and inputting, by the object inference module, the object-containing regions mapped to the original image to the machine learning inference model for object recognition.
15. The method of claim 14, wherein the detecting of the object comprises: calculating, by the object inference module, a background line from the reduced binary image; adding an offset toward a sky space above the background line to calculate a boundary line; and detecting the object only from the sky space above the boundary line.
16. The method of claim 14, wherein the generating of the reduced binary image includes: converting, by the object inference module, a color image, which is the original image, into a gray image in which R, G, and B values are the same, and reducing the gray image to generate a reduced image; and representing, by the object inference module, 0 when a pixel value is greater than or equal to a designated threshold value and representing 1 when a pixel value is smaller than the designated threshold value, to generate the reduced binary image.
17. The method of claim 15, wherein the calculating of the background line comprises: using, by the object inference module, different threshold values to calculate N background lines; and obtaining a weighted average of the N background lines assigned different weights (W_k) to calculate a final background line (B_i).
18. The method of claim 14, wherein the detecting of the object and the detecting of the location of the object comprise: calculating, by the object inference module, an object line in the reduced binary image; and detecting, by the object inference module, a location of the object as a point at which a gradient of the object line maximally increases beyond an object detection threshold (THRESHOLD_OBJECT).
19. The method of claim 18, wherein the calculating of the object line comprises repeatedly performing, by the object inference module, a process of searching for a first vertical pixel index (OBJ_i,O) at which a vertical pixel value with respect to a horizontal pixel (i) of the reduced binary image is a minimum based on a designated object line calculation threshold (THRESHOLD_O) so as to detect the object line.
20. The method of claim 14, wherein the mapping of the object-containing regions clustered in the reduced binary image to the corresponding object-containing regions in the original image includes applying, by the object inference module, a designated ratio to the object-containing regions clustered in the reduced binary image to map the clustered object-containing regions to the object-containing regions of the original image.
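
Illustrative sketch (not part of the claims): the reduced binary image of claims 4 and 16 could be produced roughly as follows. The function name, the channel-mean gray conversion, and the stride-based reduction are assumptions made for illustration; the claims only require a gray image with equal R, G, and B values, a reduction step, and a designated threshold. The sketch follows the polarity of claim 4 (values at or above the threshold become 1); claim 16 states the opposite assignment.

```python
import numpy as np

def reduced_binary_image(rgb, factor, threshold):
    """Hypothetical helper: rgb is an (H, W, 3) uint8 array."""
    # Gray conversion by channel mean (an assumption; any method that
    # yields R = G = B would satisfy the wording of the claims).
    gray = rgb.astype(np.float32).mean(axis=2)
    # Reduce by keeping every `factor`-th pixel (assumed reduction method).
    reduced = gray[::factor, ::factor]
    # Claim 4 polarity: >= threshold -> 1, otherwise 0.
    return (reduced >= threshold).astype(np.uint8)
```

With a bright upper half and a dark lower half, the upper rows of the result are 1 and the lower rows are 0.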
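
Illustrative sketch (not part of the claims): claims 3, 5, 12, 15, and 17 describe N background lines computed with different thresholds, a weighted average of those lines, and an offset toward the sky space. The claims do not say how a single background line is read off the binary image, so the rule used here (first 0-valued pixel from the top of each column, with sky pixels equal to 1) is an assumption, as are the function names and the normalization by the sum of the weights.

```python
import numpy as np

def background_line(binary):
    """Row index of the first 0 pixel in each column, scanning from the top.
    Assumes sky = 1 and ground = 0; a column with no 0 pixel gets the
    image height, meaning sky all the way down."""
    height = binary.shape[0]
    is_ground = binary == 0
    first = is_ground.argmax(axis=0)  # argmax returns the first True index
    return np.where(is_ground.any(axis=0), first, height)

def final_background_line(gray, thresholds, weights):
    """Weighted average B_i of the N lines, one per threshold (weights W_k)."""
    lines = np.stack([background_line((gray >= t).astype(np.uint8))
                      for t in thresholds])
    w = np.asarray(weights, dtype=np.float64)
    return (w[:, None] * lines).sum(axis=0) / w.sum()

def boundary_line(background, offset):
    """Shift the line toward the sky; rows grow downward, so subtract."""
    return np.clip(background - offset, 0, None)
```

Only rows above the returned boundary line would then be searched for objects.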
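
Illustrative sketch (not part of the claims): claims 7, 13, and 18 locate an object at the point where the gradient of the object line increases most, provided the increase exceeds THRESHOLD_OBJECT, and do so independently in horizontally divided segments. The sketch below assumes the object line is already available as one value per horizontal pixel and uses a first difference as the gradient; the function name and both of those choices are assumptions.

```python
import numpy as np

def detect_objects(object_line, n_segments, threshold_object):
    """Return at most one object column per horizontal segment."""
    line = np.asarray(object_line, dtype=np.float64)
    grad = np.diff(line)  # grad[j] = line[j + 1] - line[j]
    locations = []
    # Each segment is examined on its own, so segments could also be
    # processed in parallel.
    for idx in np.array_split(np.arange(grad.size), n_segments):
        if idx.size == 0:
            continue
        j = idx[grad[idx].argmax()]  # steepest increase in this segment
        if grad[j] > threshold_object:
            locations.append(int(j) + 1)  # column where the line has risen
    return locations
```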
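
Illustrative sketch (not part of the claims): claims 8, 9, and 20 cluster detected object locations into as few regions of the model input size as possible and then map each region back to the original image by a designated ratio. The sketch simplifies clustering to one dimension (horizontal positions only) and uses a greedy left-to-right cover, which yields the minimum number of fixed-width windows in that simplified setting. The function names, the (x, y, w, h) box layout, and the reading of the ratio as original size divided by reduced size are assumptions.

```python
def cluster_columns(xs, window):
    """Start columns of fixed-width windows [start, start + window)
    that together cover every detected x position."""
    starts = []
    for x in sorted(xs):
        # Open a new window only when x falls outside the last one.
        if not starts or x >= starts[-1] + window:
            starts.append(x)
    return starts

def map_to_original(box, ratio):
    """Scale an (x, y, w, h) box from reduced to original coordinates."""
    return tuple(int(round(v * ratio)) for v in box)
```

For example, with a window of 32 reduced pixels, detections at columns 3 and 10 share one window, and a reduction ratio of 4 turns a 32 x 32 reduced region into a 128 x 128 region of the original image.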

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of Korean Patent Application No. 10-2021-0179118, filed on Dec. 14, 2021, and Korean Patent Application No. 10-2022-0172036, filed on Dec. 9, 2022, the disclosures of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Field of the Invention

The present invention relates to an apparatus and method for recognizing an object, and more specifically, to an apparatus and method for recognizing an object that are capable of improving the accuracy and speed of inference in object recognition and classification.

2. Description of Related Art

In general, machine learning refers to algorithms by which a computer performs learning and prediction based on data input by a user. In other words, machine learning is a technology of recognizing the hierarchical structure and certain patterns of related entities to internally judge and determine information that has not been input, and of predicting situations that will occur in the future.

Machine learning comes in several types. One is supervised learning, an algorithm in which a result value is assigned to each piece of training data and a model is trained to classify accordingly. Another is unsupervised learning, an algorithm that searches for commonalities in training data that has not been assigned result values and groups the training data. A third is reinforcement learning, an algorithm that provides a reward according to actions taken in different situations, without separately preparing training data.

Machine learning is used in various fields, such as games, vehicles, robots, and the like.

The background art of the present invention is disclosed in Korean Registered Patent No. 10-2261187 (registered on May 31, 2021, a system and method for machine-learning-based surveillance video analysis).
SUMMARY OF THE INVENTION

The present invention is directed to providing an apparatus and method for recognizing an object that are capable of improving accuracy and inference speed in object recognition and classification.

The technical objectives of the present invention are not limited to the above, and other objectives may become apparent to those of ordinary skill in the art based on the following descriptions.

According to an aspect of the present invention, there is provided an apparatus for recognizing an object that includes an object inference module configured to process an original image captured by a camera module and generate an image of a size to be input to a machine learning inference model, wherein the object inference module includes the machine learning inference model, and outputs a result of recognition and classification of an object inferred through the machine learning inference model, and the machine learning inference model processes an input image to infer an object included in the input image.
The object inference module may extract, as an image to be input to the machine learning inference model, an image of object-containing regions clustered in the original image according to the size to be input to the machine learning inference model.

The object inference module may, in order to extract the object-containing region, calculate a background line from a binary image obtained by reducing the original image, add an offset toward a sky space above the background line to calculate a boundary line, and extract an object-containing region only from the sky space, excluding a region below the boundary line.

The object inference module may convert a color image, which is the original image, into a gray image in which R, G, and B values are the same, and reduce the gray image to generate a reduced binary image, wherein the binary image is an image in which a pixel value greater than or equal to a designated threshold value is represented as 1 and a pixel value smaller than the designated threshold value is represented as 0.

The object inference module may use different threshold values to calculate N background lines, and obtain a weighted average of the N background lines assigned different weights (W_k) to calculate a final background line (B_i).

The object inference module may calculate an object line for detecting an object in the reduced binary image, and repeatedly perform a process of searching for a first vertical pixel index (OBJ_i,O) at which a vertical pixel value with respect to a horizontal pixel (i) of the binary image is a minimum based on an object line calculation threshold (THRESHOLD_O) designated to calculate the object line so as to detect the object line.

The object inference module may independently detect objects in a plurality of segmented regions obtained by dividing the reduced binary image in a horizontal direction, wherein a point at which a gradient of the object line maximally increases beyond an object dete