US-12626396-B2 - Electronic device and controlling method of electronic device

Abstract

An electronic device and a control method of an electronic device are provided. The method includes acquiring a plurality of images through at least one camera, inputting red green blue (RGB) data for each of the plurality of images into a first neural network model to obtain two-dimensional pose information on an object included in the plurality of images, inputting RGB data for at least one image of the plurality of images into a second neural network model to identify whether the object is transparent, if the object is a transparent object, performing stereo matching based on the two-dimensional pose information on each of the plurality of images to obtain three-dimensional pose information on the object, and if the object is an opaque object, acquiring three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image.
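The branching described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two neural network models are assumed to have already produced matched 2D keypoints and a transparency flag, the two images are assumed to come from a rectified stereo pair with a known baseline, and every name here (`Intrinsics`, `backproject`, `triangulate_rectified`, `pose_3d`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Pinhole camera parameters (hypothetical; supplied by the caller)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(kp: Point2D, depth: float, k: Intrinsics) -> Point3D:
    """Opaque branch: lift one 2D keypoint to 3D using a measured depth."""
    u, v = kp
    return ((u - k.cx) * depth / k.fx, (v - k.cy) * depth / k.fy, depth)


def triangulate_rectified(left: Point2D, right: Point2D,
                          k: Intrinsics, baseline: float) -> Point3D:
    """Transparent branch: depth from the disparity of matched keypoints.

    Assumes a rectified pair in which the second camera sits `baseline`
    metres along the +x axis of the first camera.
    """
    disparity = left[0] - right[0]
    if disparity <= 0:
        raise ValueError("non-positive disparity; keypoints cannot be matched")
    z = k.fx * baseline / disparity
    return backproject(left, z, k)


def pose_3d(is_transparent: bool,
            kps_a: Sequence[Point2D],
            kps_b: Sequence[Point2D],
            depths_a: Sequence[float],
            k: Intrinsics,
            baseline: float) -> List[Point3D]:
    """Choose the 3D estimation path from the transparency flag."""
    if is_transparent:
        # Depth sensors are unreliable on transparent surfaces, so rely on
        # the 2D keypoints found in both images instead.
        return [triangulate_rectified(a, b, k, baseline)
                for a, b in zip(kps_a, kps_b)]
    # Opaque object: one image plus its depth values is sufficient.
    return [backproject(a, d, k) for a, d in zip(kps_a, depths_a)]
```

In this sketch, the same keypoint yields the same 3D point on either path when the measured depth agrees with the depth implied by the disparity, which makes the two branches easy to cross-check on opaque test objects.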

Inventors

Jaesik CHANG
MINJU KIM
Heungwoo HAN

Assignees

SAMSUNG ELECTRONICS CO., LTD.

Dates

Publication Date: 20260512
Application Date: 20230331
Priority Date: 20201106

Claims (14)

1. An electronic device comprising: at least one camera; memory storing instructions; and at least one processor communicatively coupled to the at least one camera and the memory, wherein the instructions, when executed by the at least one processor individually or collectively, cause the electronic device to: acquire plurality of images through the at least one camera, input red green blue (RGB) data for each of the plurality of images into a first neural network model to acquire two-dimensional pose information on an object included in the plurality of images, input RGB data for at least one image of the plurality of images into a second neural network model to identify a state of the object as a transparent object or an opaque object, based on the object being the transparent object, perform stereo matching based on the two-dimensional pose information on each of the plurality of images to acquire three-dimensional pose information on the object, based on the object being the opaque object, acquire three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image, acquire information on whether the object is symmetrical through the second neural network model, identify whether the object is symmetrical based on the information on whether the object is symmetrical, based on the object being an object having symmetry, convert first feature points included in the two-dimensional pose information into second feature points unrelated to symmetry, and acquire three-dimensional pose information for the object by performing the stereo matching based on the second feature points.
2. The electronic device of claim 1, wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: acquire information about transparency of the object through the second neural network model, and identify transparency of the object based on the information about the transparency of the object.
3. The electronic device of claim 1, wherein, based on the object being an object having symmetry, the first feature points are identified based on a three-dimensional coordinate system in which x-axis or y-axis is perpendicular to the at least one camera.
4. The electronic device of claim 1, wherein the plurality of images are two images acquired at two different points in time through a first camera among the at least one camera.
5. The electronic device of claim 4, wherein the plurality of images are two images acquired at same points in time through each of the first camera and a second camera among the at least one camera.
6. The electronic device of claim 5, wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: acquire first location information about a positional relationship between the first camera and the second camera, and perform the stereo matching based on the two-dimensional pose information for each of the plurality of images and the first location information.
7. The electronic device of claim 6, further comprising: a driver, wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: control the driver to change a position of at least one of the first camera and the second camera, acquire second position information about a positional relationship between the first camera and the second camera based on the changed position of the at least one camera, and perform the stereo matching based on the two-dimensional pose information for each of the plurality of images and second location information.
8. The electronic device of claim 1, wherein the first neural network model and the second neural network model are included in one integrated neural network model.
9. The electronic device of claim 1, wherein the two-dimensional pose information includes a bounding box corresponding to each object included in an image through a first neural network model.
10. A method performed by an electronic device, the method comprising: acquiring, by the electronic device, plurality of images through at least one camera; inputting, by the electronic device, red green blue (RGB) data for each of the plurality of images into a first neural network model to acquire two-dimensional pose information on an object included in the plurality of images; inputting, by the electronic device, RGB data for at least one image of the plurality of images into a second neural network model to identify a state of the object as a transparent object or an opaque object; based on the object being the transparent object, performing, by the electronic device, stereo matching based on the two-dimensional pose information on each of the plurality of images to acquire three-dimensional pose information on the object; and based on the object being the opaque object, acquiring, by the electronic device, three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image, acquiring information on whether the object is symmetrical through the second neural network model; and identifying whether the object is symmetrical based on the information on whether the object is symmetrical, wherein the method of the electronic device further comprises: based on the object being an object having symmetry, converting first feature points included in the two-dimensional pose information into second feature points unrelated to symmetry, and acquiring three-dimensional pose information for the object by performing the stereo matching based on the second feature points.
11. The method of claim 10, wherein identifying transparency of the object comprises: acquiring information about transparency of the object through the second neural network model; and identifying transparency of the object based on the information about the transparency of the object.
12. The method of claim 10, wherein, based on the object being an object having symmetry, the first feature points are identified based on a three-dimensional coordinate system in which x-axis or y-axis is perpendicular to the at least one camera.
13. The method of claim 10, wherein the plurality of images are two images acquired at two different points in time through a first camera among the at least one camera.
14. The method of claim 13, wherein the plurality of images are two images acquired at same points in time through each of the first camera and a second camera among the at least one camera.
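Claims 6 and 7 recite stereo matching that uses location information about the positional relationship between the two cameras, including after a driver has moved a camera. The Python sketch below illustrates one standard way such information could be used, midpoint triangulation of two viewing rays. The claims do not name a triangulation technique, so this is a swapped-in textbook method, and all function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _rotate(r: Sequence[Sequence[float]], v: Vec) -> Vec:
    """Apply a 3x3 rotation matrix (row-major) to a vector."""
    return (_dot(tuple(r[0]), v), _dot(tuple(r[1]), v), _dot(tuple(r[2]), v))


def pixel_to_ray(kp: Tuple[float, float],
                 fx: float, fy: float, cx: float, cy: float) -> Vec:
    """Viewing ray of a 2D keypoint in its own camera frame (pinhole model)."""
    return ((kp[0] - cx) / fx, (kp[1] - cy) / fy, 1.0)


def triangulate_midpoint(ray1: Vec, ray2: Vec,
                         rot_2_to_1: Sequence[Sequence[float]],
                         cam2_pos_in_1: Vec) -> Vec:
    """Return the 3D point, in camera-1 coordinates, closest to both rays.

    `rot_2_to_1` and `cam2_pos_in_1` play the role of the "location
    information": the orientation and position of camera 2 in camera 1's frame.
    """
    d1 = ray1
    d2 = _rotate(rot_2_to_1, ray2)
    o2 = cam2_pos_in_1
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, o2), _dot(d2, o2)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    # Least-squares parameters along each ray (camera 1 is at the origin).
    s = (d * c - b * e) / denom
    t = (b * d - a * e) / denom
    p1 = (s * d1[0], s * d1[1], s * d1[2])
    p2 = (o2[0] + t * d2[0], o2[1] + t * d2[1], o2[2] + t * d2[2])
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)
```

Because the camera arrangement enters only through `rot_2_to_1` and `cam2_pos_in_1`, moving a camera in this sketch means passing updated values for those two arguments; the 2D keypoint detector itself is unaffected.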

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a continuation application, claiming priority under § 365(c), of International application No. PCT/KR2021/016060, filed on Nov. 5, 2021, which is based on and claims the benefit of Korean patent application number 10-2020-0147389, filed on Nov. 6, 2020, in the Korean Intellectual Property Office, and of Korean patent application number 10-2021-0026660, filed on Feb. 26, 2021, in the Korean Intellectual Property Office, the disclosure of each of which is incorporated by reference herein in its entirety.

BACKGROUND

1. Field

The disclosure relates to an electronic device and a controlling method of the electronic device. More particularly, the disclosure relates to a device capable of acquiring three-dimensional (3D) pose information of an object included in an image.

2. Description of Related Art

The need for technology that acquires three-dimensional (3D) pose information on an object included in an image has been highlighted recently. More particularly, the development of technology for detecting an object included in an image and using 3D pose information for the detected object by means of a neural network model, such as a convolutional neural network (CNN), has accelerated recently.

However, when pose information on an object is acquired based on one image according to the related art, it is difficult to acquire pose information of an object for which a 3D model has not been established, and it is particularly difficult to acquire accurate pose information for a transparent object.
In addition, when pose information on an object is acquired based on a stereo camera according to the related art, there are limitations: the range of distances that can be measured for acquisition of pose information is limited by the narrow field-of-view difference between the two cameras, and, when the positional relationship between the two cameras is changed, a neural network model trained on the premise that the positional relationship between the two cameras is fixed may no longer be usable.

The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.

SUMMARY

Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide an electronic device capable of acquiring 3D pose information for an object in an efficient manner according to the features of an object included in an image, and a method for controlling the electronic device.

Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.

In accordance with an aspect of the disclosure, an electronic device is provided.
The electronic device includes at least one camera, a memory, and a processor configured to acquire a plurality of images through the at least one camera, input red green blue (RGB) data for each of the plurality of images into a first neural network model to acquire two-dimensional pose information on an object included in the plurality of images, input RGB data for at least one image of the plurality of images into a second neural network model to identify whether the object is transparent, based on the object being a transparent object, perform stereo matching based on the two-dimensional pose information on each of the plurality of images to acquire three-dimensional pose information on the object, and based on the object being an opaque object, acquire three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image.

The processor may acquire information about transparency of the object through the second neural network model, and identify transparency of the object based on the information about the transparency of the object.

The processor may acquire information on whether the object is symmetrical through the second neural network model, identify whether the object is symmetrical based on the information on whether the object is symmetrical, based on the object being an object having symmetry, convert first feature points included in the two-dimensional pose information into second feature points unrelated to symmetry, and acquire three-dimensional pose information for the object by performing the stereo matching based on the second feature points.

Based on the object being an object having symmetry, the first feature points are identified based on a three-dimensional coordinate system in which the x-axis or y-axis is perpendicular to the at least one camera.

The plurality of images are two images acquired at two different poi