JP-7854855-B2 - Object detection model learning device and object detection model learning method
Inventors
- 永吉 洋登
- 會下 拓実
Assignees
- 株式会社日立製作所 (Hitachi, Ltd.)
Dates
- Publication Date: 2026-05-07
- Application Date: 2022-05-23
Claims (7)
- An object detection model learning device comprising: an object detection learning unit that trains an object detection model to detect an object from an input image; and a training data generation unit that generates the training data used for the training, wherein the training data generation unit generates the training data by synthesizing images of the object to be detected using computer graphics, generates a first teacher signal relating to the object to be detected for the training data, and generates a second teacher signal corresponding to the state of the object to be detected at the time of imaging for the training data; the object detection model includes a first inference unit that outputs, for the input image, a first inference result regarding the object to be detected, and a second inference unit that outputs, based on the input image, a second inference result regarding the state of the object to be detected at the time of imaging, the first inference unit obtaining the first inference result using the input image and the second inference result; and the object detection learning unit includes a second error calculation unit that calculates, as a second error, the difference between the second teacher signal and the second inference result obtained by providing the training data to the second inference unit, a first error calculation unit that calculates, as a first error, the difference between the first teacher signal and the first inference result obtained by providing the training data and the second inference result to the first inference unit, and an inference parameter update unit that updates the parameters of the second inference unit based on the second error and updates the parameters of the first inference unit based on the first error.
- The object detection model learning device according to claim 1, wherein the training data generation unit accepts constraints on the computer graphics parameters, selection conditions for the first teacher signal, and selection conditions for the second teacher signal; generates, within the scope of the constraints, a plurality of computer graphics synthesis parameters and generates the training data based on the computer graphics synthesis parameters; generates the first teacher signal using the computer graphics synthesis parameters and the selection conditions for the first teacher signal; and generates the second teacher signal using the computer graphics synthesis parameters and the selection conditions for the second teacher signal.
- The object detection model learning device according to claim 1, wherein the first teacher signal includes the type of the object to be detected and the position of the object to be detected within the training data.
- The object detection model learning device according to claim 1, wherein the second teacher signal includes at least one of: the positional relationship between the object to be detected and the camera, the state of illumination of the object to be detected, and the state of deformation of the object to be detected.
- The object detection model learning device according to claim 4, wherein the second teacher signal further includes the type of the object to be detected and the position of the object to be detected within the training data.
- The object detection model learning device according to claim 1, further comprising a shared inference unit that performs inference processing on the input image, wherein the first inference unit and the second inference unit process the inference result of the shared inference unit.
- An object detection model learning method for training an object detection model that detects an object from an input image, the method comprising the steps of: generating training data by synthesizing images of the object to be detected using computer graphics; generating, for the training data, a first teacher signal relating to the object to be detected; generating, for the training data, a second teacher signal corresponding to the state of the object to be detected at the time of imaging; providing the training data to a second inference unit of the object detection model to obtain a second inference result regarding the state of the object to be detected at the time of imaging; providing the training data and the second inference result to a first inference unit of the object detection model to obtain a first inference result regarding the object to be detected; calculating the difference between the first inference result and the first teacher signal as a first error; calculating the difference between the second inference result and the second teacher signal as a second error; and updating the parameters of the second inference unit based on the second error and updating the parameters of the first inference unit based on the first error.
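The learning procedure in the claims can be sketched numerically: a second inference unit predicts the imaging-state of the object, a first inference unit consumes the input together with that prediction, and each unit is updated from its own error. The sketch below is a minimal NumPy illustration under assumed shapes and linear inference units; the dimensions, learning rate, and synthetic data are illustrative assumptions, not taken from the publication.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only).
IMG_DIM, STATE_DIM, DET_DIM = 16, 4, 6

# Second inference unit: predicts the state of the object at the time
# of imaging (e.g. camera pose, illumination) from the input image.
W2 = rng.normal(scale=0.1, size=(IMG_DIM, STATE_DIM))

# First inference unit: predicts the detection result from the input
# image concatenated with the second inference result.
W1 = rng.normal(scale=0.1, size=(IMG_DIM + STATE_DIM, DET_DIM))


def second_inference(x):
    return x @ W2


def first_inference(x, second_result):
    return np.concatenate([x, second_result], axis=-1) @ W1


def train_step(x, t1, t2, lr=0.01):
    """One update in the claimed order: obtain the second inference
    result, then the first, then update each unit from its own error."""
    global W1, W2
    y2 = second_inference(x)           # second inference result
    e2 = y2 - t2                       # second error vs. second teacher signal
    y1 = first_inference(x, y2)        # first inference result
    e1 = y1 - t1                       # first error vs. first teacher signal
    # Gradient descent on squared error (constant factors folded into lr).
    W2 -= lr * x.T @ e2
    W1 -= lr * np.concatenate([x, y2], axis=-1).T @ e1
    return float((e1 ** 2).mean()), float((e2 ** 2).mean())


# Synthetic stand-in for CG-generated training data with both signals.
x = rng.normal(size=(32, IMG_DIM))
t2 = x @ rng.normal(scale=0.1, size=(IMG_DIM, STATE_DIM))  # imaging-state signal
t1 = rng.normal(size=(32, DET_DIM))                        # detection signal

losses = [train_step(x, t1, t2) for _ in range(200)]
print("first error:", losses[0][0], "->", losses[-1][0])
print("second error:", losses[0][1], "->", losses[-1][1])
```

Note that, as claimed, the first error only updates the first inference unit: no gradient flows back through the second inference result into the second inference unit, whose parameters are driven solely by the second error.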
Description
This invention relates to an object detection model learning device, an object detection device, and an object detection model learning method. Conventionally, computer graphics (CG) has been used to generate training data for object detection by image recognition. For example, Patent Document 1 states: "To provide a training data generation system that can acquire, in a short time, the large amount of training data needed during training to obtain a trained model used for object detection processing, pose detection processing, and the like." It also states: "The training data generation system acquires a background image obtained by imaging a three-dimensional space. It also acquires CG object generation data, which is computer graphics processing data including at least one of the object's shape and texture. Based on the acquired CG object generation data, a CG object image is generated. A rendered image, obtained by compositing the CG object image onto the background image so that the CG object is positioned at a predetermined location in the three-dimensional space, is acquired as a training image." (Japanese Patent Publication No. 2020-119127)
- Figure 1 is a diagram showing the configuration of the object detection model learning device.
- Figure 2 is a diagram showing the configuration of the object detection device.
- Figure 3 is an explanatory diagram of the constraints on the CG synthesis parameters.
- Figure 4 shows a specific example of the first teacher signal selection condition.
- Figure 5 shows a specific example of the second teacher signal selection condition.
- Figure 6 is a flowchart showing the learning process.
- Figure 7 is a flowchart showing the object detection processing procedure.
- Figure 8 shows a specific example of a teacher signal.
- Figure 9 shows a specific example of a computer-generated image.
- Figure 10 is an explanatory diagram of the inference results.
- Figure 11 is a diagram showing the configuration of an object detection model learning device according to a modified example.
- Figure 12 is a diagram showing the configuration of an object detection device according to a modified example.

The embodiments of the present invention are described below with reference to the drawings. Note that the embodiments described below do not limit the invention as defined in the claims, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution of the invention. Furthermore, well-known elements that are essential to the structure of the invention may be omitted from the illustration and description. In the following explanation, the term "xxx table" may be used for information obtained from a given input; this information can take any data structure, so an "xxx table" can also be called "xxx information." The structure of each table is merely an example: one table may be divided into two or more tables, and all or part of two or more tables may constitute a single table.
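The compositing step quoted from Patent Document 1 (pasting a rendered CG object onto a background image at a predetermined position) can be illustrated with a minimal alpha-blending sketch. The array shapes, placement convention, and function name below are illustrative assumptions, not details from the publication.

```python
import numpy as np


def composite(background, cg_object, alpha, top, left):
    """Paste a rendered CG object image onto a background image.

    background: (H, W, 3) float array with values in [0, 1]
    cg_object:  (h, w, 3) rendered object image
    alpha:      (h, w) opacity mask produced by the CG renderer
    top, left:  target position of the object within the background
    """
    out = background.copy()
    h, w = alpha.shape
    region = out[top:top + h, left:left + w]
    # Standard "object over background" alpha blend.
    out[top:top + h, left:left + w] = (
        alpha[..., None] * cg_object + (1.0 - alpha[..., None]) * region
    )
    return out


# Tiny example: a fully opaque 2x2 red square on a gray background.
bg = np.full((4, 4, 3), 0.5)
obj = np.zeros((2, 2, 3))
obj[..., 0] = 1.0                 # red object
mask = np.ones((2, 2))            # fully opaque
img = composite(bg, obj, mask, top=1, left=1)
```

Because the synthesis parameters (here, `top`, `left`, and the object size) are known at generation time, annotations such as a bounding box for the first teacher signal come for free, which is the core appeal of CG-based training data generation.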
Furthermore, in the following explanation, a "program" may be used as the subject of a processing description. Since a program, when executed by the processor unit, performs its defined processing while using the memory unit and/or interface unit as appropriate, the subject of the processing may equally be the processor unit (or a device, such as a controller, that includes that processor unit). A program may be installed on a device such as a computer from a program distribution server or from a computer-readable (e.g., non-transitory) recording medium. Two or more programs may be implemented as a single program, and one program may be implemented as two or more programs. The "processor unit" is one or more processors. A processor is typically a microprocessor such as a CPU (Central Processing Unit), but may be another type of processor such as a GPU (Graphics Processing Unit), and may be single-core or multi-core. A processor may also be a broader type of processing device, such as a hardware circuit that performs some or all of the processing (e.g., an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)). Furthermore, when similar elements are described without distinction, a common reference number may be used; when they are described with distinction, each element's identification number (or reference number) may be used. The number of elements shown in each figure is an example and is not limiting.