KR-102964361-B1 - METHOD AND SYSTEM FOR GENERATING 3-DIMENSIONAL BOUNDING BOX
Abstract
The present disclosure relates to a method and system for generating a three-dimensional bounding box. A method for generating a three-dimensional bounding box, performed by at least one processor, comprises: generating a three-dimensional bounding box ground truth data set based on a first set of training images of a first domain style; generating, using an object recognition model, a three-dimensional bounding box prediction data set for a second set of training images of a second domain style; generating an error distribution between the three-dimensional bounding box ground truth data set and the three-dimensional bounding box prediction data set; acquiring a target image of the first domain style; generating three-dimensional bounding box target data for an object within the target image; injecting noise into the three-dimensional bounding box target data using the error distribution; and acquiring camera information associated with each of the second set of training images of the second domain style. Each ground truth three-dimensional bounding box included in the three-dimensional bounding box ground truth data set includes ground truth data for a plurality of items, and each predicted three-dimensional bounding box included in the three-dimensional bounding box prediction data set includes prediction data for the plurality of items. The generating of the error distribution may include generating a conditional error distribution between the ground truth data and the prediction data for each of the plurality of items in consideration of the camera information.
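The patent publishes no source code, but the pipeline the abstract describes can be summarized in a minimal sketch. Everything below is an illustrative assumption: the box items, the function names (`fit_error_distribution`, `inject_noise`), and the use of an empirical resampling error model are not taken from the patent.

```python
import numpy as np

# Hypothetical per-box items; the claims say only "a plurality of items".
ITEMS = ["x", "y", "z", "width", "height", "length", "yaw"]

def fit_error_distribution(gt_boxes, pred_boxes):
    """Collect per-item error samples between matched GT/predicted boxes."""
    return {item: np.array([p[item] - g[item]
                            for g, p in zip(gt_boxes, pred_boxes)])
            for item in ITEMS}

def inject_noise(target_box, error_dist, rng):
    """Perturb a clean (e.g. simulator-derived) box with a sampled error."""
    return {item: target_box[item] + rng.choice(error_dist[item])
            for item in ITEMS}

# Toy data: 100 matched box pairs with small synthetic prediction errors.
rng = np.random.default_rng(0)
gt_boxes = [dict(zip(ITEMS, rng.random(len(ITEMS)))) for _ in range(100)]
pred_boxes = [{k: v + 0.05 * rng.standard_normal() for k, v in box.items()}
              for box in gt_boxes]

error_dist = fit_error_distribution(gt_boxes, pred_boxes)
noisy_box = inject_noise(gt_boxes[0], error_dist, rng)
```

Resampling observed errors, rather than fitting a parametric model, keeps the injected noise faithful to whatever error shape the object recognition model actually exhibits; the patent's claims refine this with camera-conditional histograms, sketched after the claims below.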
Inventors
- 임호준
- 유희철
Assignees
- MORAI Inc. (주식회사 모라이)
Dates
- Publication Date: 2026-05-13
- Application Date: 2024-08-07
Claims (10)
- A method for generating a three-dimensional bounding box, performed by at least one processor, the method comprising: generating a three-dimensional bounding box ground truth data set based on a first set of training images of a first domain style; generating, using an object recognition model, a three-dimensional bounding box prediction data set for a second set of training images of a second domain style; generating an error distribution between the three-dimensional bounding box ground truth data set and the three-dimensional bounding box prediction data set; acquiring a target image of the first domain style; generating three-dimensional bounding box target data for an object within the target image; injecting noise into the three-dimensional bounding box target data using the error distribution; and acquiring camera information associated with each of the second set of training images of the second domain style, wherein the camera information includes at least one of camera extrinsic parameter information or camera intrinsic parameter information, each ground truth three-dimensional bounding box included in the three-dimensional bounding box ground truth data set includes ground truth data for a plurality of items, and each predicted three-dimensional bounding box included in the three-dimensional bounding box prediction data set includes prediction data for the plurality of items, wherein the generating of the error distribution comprises: generating projected height information of an object included in each of the second set of training images based on the camera information; generating a conditional error distribution between the ground truth data and the prediction data for each of the plurality of items in consideration of the camera information; and histogramming the conditional error distribution by dividing the camera information or the projected height information of the object into a preset number of intervals, and wherein the injecting of the noise comprises injecting noise into the three-dimensional bounding box target data using the histogrammed conditional error distribution based on camera information of the target image (a sketch of this conditional binning and noise injection appears after the claims).
- (Deleted)
- (Deleted)
- The method for generating a three-dimensional bounding box according to claim 1, wherein the histogramming comprises: generating a cumulative probability distribution corresponding to the error distribution; and setting the intervals, based on the generated cumulative probability distribution, such that each interval contains the same number of data elements among a plurality of data elements included in the error distribution.
- The method for generating a three-dimensional bounding box according to claim 1, wherein the histogramming comprises setting the intervals based on a distance between a camera and an object within the target image.
- The method for generating a three-dimensional bounding box according to claim 1, wherein the injecting of the noise comprises: calculating an occlusion region in which at least a portion of an object included in the target image is obscured; and injecting noise using the error distribution and a weight corresponding to the calculated occlusion region (see the occlusion sketch after the claims).
- The method for generating a three-dimensional bounding box according to claim 6, wherein the occlusion region is calculated based on a pixel value of the object included in the target image and an area value of a three-dimensional bounding box for the object.
- The method for generating a three-dimensional bounding box according to claim 1, further comprising outputting the noise-injected three-dimensional bounding box target data as three-dimensional bounding box data reflecting uncertainty of the object recognition model.
- A computer program stored on a computer-readable recording medium for executing, on a computer, the method according to any one of claims 1 and 4 to 8.
- An information processing system comprising: a communication module; a memory; and at least one processor connected to the memory and configured to execute at least one computer-readable program contained in the memory, wherein the at least one program includes instructions for: generating a three-dimensional bounding box ground truth data set based on a first set of training images of a first domain style; generating, using an object recognition model, a three-dimensional bounding box prediction data set for a second set of training images of a second domain style; generating an error distribution between the three-dimensional bounding box ground truth data set and the three-dimensional bounding box prediction data set; acquiring a target image of the first domain style; generating three-dimensional bounding box target data for an object within the target image; injecting noise into the three-dimensional bounding box target data using the error distribution; and acquiring camera information associated with each of the second set of training images of the second domain style, wherein the camera information includes at least one of camera extrinsic parameter information or camera intrinsic parameter information, each ground truth three-dimensional bounding box included in the three-dimensional bounding box ground truth data set includes ground truth data for a plurality of items, and each predicted three-dimensional bounding box included in the three-dimensional bounding box prediction data set includes prediction data for the plurality of items, wherein the generating of the error distribution comprises: generating projected height information of an object included in each of the second set of training images based on the camera information; generating a conditional error distribution between the ground truth data and the prediction data for each of the plurality of items in consideration of the camera information; and histogramming the conditional error distribution by dividing the camera information or the projected height information of the object into a preset number of intervals, and wherein the injecting of the noise comprises injecting noise into the three-dimensional bounding box target data using the histogrammed conditional error distribution based on camera information of the target image.
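Claims 1 and 4 describe conditioning the error distribution on camera-derived quantities (such as an object's projected height) and setting histogram intervals from a cumulative probability distribution so that each interval holds the same number of samples. The sketch below is a hedged reading of that scheme, assuming equal-frequency binning by quantiles; all function names and the toy data are hypothetical.

```python
import numpy as np

def equal_frequency_edges(values, num_bins):
    """Interval edges taken from the cumulative distribution so that each
    interval holds (approximately) the same number of samples (claim 4)."""
    quantiles = np.linspace(0.0, 1.0, num_bins + 1)
    return np.quantile(values, quantiles)

def histogram_conditional_errors(errors, condition, num_bins):
    """Group per-item errors by the interval of their conditioning value
    (e.g. projected object height derived from camera information)."""
    edges = equal_frequency_edges(condition, num_bins)
    bin_ids = np.clip(np.digitize(condition, edges[1:-1]), 0, num_bins - 1)
    return edges, [errors[bin_ids == b] for b in range(num_bins)]

def sample_conditional_noise(target_condition, edges, binned_errors, rng):
    """Pick the interval matching the target image's conditioning value
    and draw a noise sample from that interval's error histogram."""
    b = int(np.clip(np.digitize(target_condition, edges[1:-1]), 0,
                    len(binned_errors) - 1))
    return rng.choice(binned_errors[b])

# Toy example: height error grows with projected object height.
rng = np.random.default_rng(1)
proj_height = rng.uniform(20, 200, size=1000)              # pixels
height_error = rng.standard_normal(1000) * proj_height * 0.01

edges, binned = histogram_conditional_errors(height_error, proj_height, 8)
noise = sample_conditional_noise(150.0, edges, binned, rng)
```

Equal-count intervals keep every bin statistically meaningful even where the conditioning value is sparse, which appears to be the point of building the intervals from a cumulative probability distribution rather than using fixed-width bins.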
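Claims 6 and 7 weight the injected noise by an occlusion region computed from the object's visible pixels and the area of its three-dimensional bounding box. The exact formula is not published; the sketch below assumes a simple visible-pixels-to-box-area ratio and a linear weight, purely for illustration.

```python
import numpy as np

def occlusion_ratio(visible_pixel_count, box_area_pixels):
    """Occlusion estimate in the spirit of claim 7: compare the object's
    visible pixel count against the pixel area of its projected 3D
    bounding box. (The patented formula is unpublished; this ratio is an
    assumption.)"""
    visible = min(visible_pixel_count / max(box_area_pixels, 1.0), 1.0)
    return 1.0 - visible

def occlusion_weighted_noise(base_noise, occlusion, alpha=1.0):
    """Scale a sampled error by a weight that grows with occlusion,
    reflecting higher model uncertainty for heavily occluded objects."""
    return base_noise * (1.0 + alpha * occlusion)

rng = np.random.default_rng(2)
base = rng.choice(np.array([-0.2, -0.1, 0.0, 0.1, 0.3]))   # sampled error
occ = occlusion_ratio(visible_pixel_count=1200, box_area_pixels=4000)
noisy_delta = occlusion_weighted_noise(base, occ)          # larger when occluded
```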
Description
Method and System for Generating a 3-Dimensional Bounding Box

The present disclosure relates to a method and system for generating a three-dimensional bounding box, and more specifically to a method and system for generating a three-dimensional bounding box into which noise is injected.

Three-dimensional bounding boxes are generated to detect objects in an image and to estimate their position and size in three-dimensional space. Generally, three-dimensional bounding boxes are created by recognizing objects with an object recognition model and then estimating the objects' three-dimensional information. Three-dimensional bounding boxes play an important role in understanding the environment within an image and identifying the position, size, and movement of objects.

Data associated with three-dimensional bounding boxes is easily obtained from images captured in a virtual environment. However, three-dimensional bounding boxes generated from such virtual images are unsuitable for understanding the real world, because three-dimensional bounding boxes generated from real-world images are relatively noisier and less accurate than those generated from virtual images. Furthermore, decision/control logic developed using three-dimensional bounding boxes generated from virtual images is difficult to apply to real-world images and lacks realism. On the other hand, it is not easy to directly obtain real-world three-dimensional bounding boxes and their associated data in a form applicable to the real world.

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which similar reference numerals indicate similar elements, but are not limited thereto.

- FIG. 1 is a schematic diagram showing an example of a three-dimensional bounding box generation system according to one embodiment of the present disclosure.
- FIG. 2 is a block diagram showing the internal configuration of a computing device according to one embodiment of the present disclosure.
- FIG. 3 is a diagram illustrating an example of generating a second training image according to one embodiment of the present disclosure.
- FIG. 4 is a diagram illustrating an example of performing object recognition on a first training image according to one embodiment of the present disclosure.
- FIG. 5 is a diagram showing an example of determining a three-dimensional bounding box pair according to one embodiment of the present disclosure.
- FIG. 6 is a diagram showing an example of generating an error distribution according to one embodiment of the present disclosure.
- FIG. 7 is a diagram showing an example of a noise injection unit injecting noise according to one embodiment of the present disclosure.
- FIG. 8 is a flowchart illustrating an example of a method for generating a three-dimensional bounding box according to one embodiment of the present disclosure.
- FIG. 9 is a diagram illustrating a method for calculating a projected height in consideration of camera performance according to one embodiment of the present disclosure.
- FIG. 10 is a diagram illustrating a method for generating an error distribution in consideration of camera information according to one embodiment of the present disclosure.
- FIG. 11 is a diagram illustrating a method for segmenting an error distribution according to one embodiment of the present disclosure.
- FIG. 12 is a diagram illustrating an example in which a noise injection unit according to one embodiment of the present disclosure injects noise in consideration of an occlusion region.
- FIGS. 13 and 14 are diagrams illustrating a method for a processor to calculate an occlusion region according to one embodiment of the present disclosure.
- FIG. 15 is a flowchart illustrating an example of a method for generating a three-dimensional bounding box according to another embodiment of the present disclosure.

Hereinafter, specific details for implementing the present disclosure will be described with reference to the attached drawings. In the following description, detailed descriptions of widely known functions or configurations will be omitted where they might unnecessarily obscure the gist of the present disclosure. In the attached drawings, identical or corresponding components are assigned the same reference numerals, and in the description of the following embodiments, descriptions of identical or corresponding components may be omitted. However, even if the description of a component is omitted, this does not mean that the component is excluded from any embodiment.

The advantages and features of the disclosed embodiments, and the methods for achieving them, will become clear by referring to the embodiments described below in conjunction with the accompanying drawings. However, the present disclosure is not limited to the embodiments disclosed below but may be implemented in various different forms; the embodiments are provided merely to make the present disclosure complete and to fully convey the scope of the disclosure to those of ordinary skill in the art.