
US-20260127860-A1 - OBJECT CLASSIFICATION MODEL TRAINING METHOD, COMPUTATION DEVICE FOR TRAINING OBJECT CLASSIFICATION MODEL, OBJECT RECOGNITION DEVICE, AND OBJECT RECOGNITION METHOD

US20260127860A1

Abstract

An object classification model training method, a computation device for training an object classification model, an object recognition device, and an object recognition method are provided. The object classification model training method comprises steps of: receiving multiple model files each respectively having a virtual object 3D model; generating multiple 2D images of each virtual object 3D model by a revolving image-capture schedule, wherein the positions of the virtual object 3D model in the multiple 2D images are different; generating a training data set for each virtual object 3D model based on the multiple 2D images of each virtual object 3D model; and training an untrained model according to the training data sets corresponding to the virtual object 3D models of the multiple model files to obtain an object classification model.

Inventors

  • Ching Han Yang
  • Min Di
  • Li Wei Kuo
  • Wei-Jen Wang
  • Chia-Yu Lin

Assignees

  • INSTITUTE FOR INFORMATION INDUSTRY

Dates

Publication Date
2026-05-07
Application Date
2025-01-03
Priority Date
2024-11-04

Claims (16)

  1. An object classification model training method performed by a computation device and comprising steps as follows: reading multiple model files, wherein each model file comprises a virtual object three-dimensional (3D) model; generating multiple two-dimensional (2D) images of the virtual object 3D model of each model file by a revolving image-capture schedule, wherein the multiple 2D images respectively have different contents corresponding to different positions of the virtual object 3D model of each model file; generating a training dataset according to the multiple 2D images corresponding to the virtual object 3D model of each model file; and training a to-be-trained model by the training dataset corresponding to the virtual object 3D model of each model file to obtain an object classification model.
  2. The training method as claimed in claim 1, wherein image capturing configuration data of the revolving image-capture schedule comprises at least one object fixed axis, at least one axis of revolution, and at least one image capturing frequency.
  3. The training method as claimed in claim 1, wherein the step of generating the training dataset comprises steps as follows: computing a ratio of a size of a virtual object in each 2D image to an image size of the 2D image; sorting the multiple 2D images according to the ratios of the multiple 2D images; and generating the training dataset by the 2D image with the ratio higher than or equal to a threshold, and by the 2D image with the ratio lower than the threshold and complying with a filter condition, to exclude the 2D image without significant features for each model file.
  4. The training method as claimed in claim 1, wherein in the step of reading the multiple model files, each model file comprises size information of the virtual object 3D model; and the step of generating the multiple 2D images of the virtual object 3D model of each model file by the revolving image-capture schedule comprises steps as follows: determining whether the size information of the virtual object 3D model is greater than or equal to a threshold size; if NO, storing a first preliminary image directly obtained by the revolving image-capture schedule as one of the multiple 2D images of the virtual object 3D model; and if YES, recognizing at least one feature of the virtual object 3D model by a deep learning model and storing a second preliminary image of the at least one feature obtained by the revolving image-capture schedule as one of the multiple 2D images of the virtual object 3D model; wherein a content of the first preliminary image is a whole virtual object, and a content of the second preliminary image is a portion that corresponds to the at least one feature of a virtual object.
  5. A computation device for training an object classification model comprising: a storage storing multiple model files, wherein each model file comprises a virtual object three-dimensional (3D) model; and a processor electrically connected to the storage and performing steps as follows: reading the multiple model files from the storage; generating multiple two-dimensional (2D) images of the virtual object 3D model of each model file by a revolving image-capture schedule, wherein the multiple 2D images respectively have different contents corresponding to different positions of the virtual object 3D model of each model file; generating a training dataset according to the multiple 2D images corresponding to the virtual object 3D model of each model file; and training a to-be-trained model by the training dataset corresponding to the virtual object 3D model of each model file to obtain an object classification model.
  6. The computation device as claimed in claim 5, wherein image capturing configuration data of the revolving image-capture schedule comprises at least one object fixed axis, at least one axis of revolution, and at least one image capturing frequency.
  7. The computation device as claimed in claim 5, wherein the step of generating the training dataset by the processor comprises steps as follows: computing a ratio of a size of a virtual object in each 2D image to an image size of the 2D image; sorting the multiple 2D images according to the ratios of the multiple 2D images; and generating the training dataset by the 2D image with the ratio higher than or equal to a threshold, and by the 2D image with the ratio lower than the threshold and complying with a filter condition, to exclude the 2D image without significant features for each model file.
  8. The computation device as claimed in claim 5, wherein in the step of reading the multiple model files by the processor, each model file comprises size information of the virtual object 3D model; and the step of generating the multiple 2D images of the virtual object 3D model of each model file by the revolving image-capture schedule by the processor comprises steps as follows: determining whether the size information of the virtual object 3D model is greater than or equal to a threshold size; if NO, storing a first preliminary image directly obtained by the revolving image-capture schedule as one of the multiple 2D images of the virtual object 3D model; and if YES, recognizing at least one feature of the virtual object 3D model by a deep learning model and storing a second preliminary image of the at least one feature obtained by the revolving image-capture schedule as one of the multiple 2D images of the virtual object 3D model; wherein a content of the first preliminary image is a whole virtual object, and a content of the second preliminary image is a portion that corresponds to the at least one feature of a virtual object.
  9. An object recognition device comprising: an image capturing apparatus photographing a to-be-recognized object to generate an actual image having the to-be-recognized object; a monitor; and a processor signally connected to the image capturing apparatus and the monitor and performing steps as follows: receiving the actual image from the image capturing apparatus; inputting the actual image to the object classification model as claimed in claim 1 for the object classification model to output multiple object candidates according to the to-be-recognized object of the actual image, wherein each object candidate has an accuracy; sorting the multiple object candidates according to the accuracies to generate a sort list of the multiple object candidates; and controlling the monitor to display at least one of the object candidates that has the accuracy higher than a preset ratio in the sort list.
  10. The device as claimed in claim 9, wherein each object candidate has a reference length; the image capturing apparatus comprises: a color camera generating the actual image; and a depth sensor sensing the to-be-recognized object to generate a depth image including the to-be-recognized object; the processor computes an estimated maximum length of the to-be-recognized object according to the actual image and the depth image; and the processor rearranges the multiple object candidates in the sort list according to differences among the reference lengths of the object candidates and the estimated maximum length, and controls the monitor to display top N of the object candidates with the accuracy higher than the preset ratio in the rearranged sort list, wherein N is a positive integer higher than or equal to 1.
  11. The device as claimed in claim 10, wherein the multiple object candidates in the sort list are in a descending order according to the accuracies; the processor rearranging the multiple object candidates in the sort list is to determine whether an absolute difference value between the reference length of each object candidate and the estimated maximum length is lower than or equal to an error upper limit; and if not so, the processor arranges such object candidate to a bottom of the sort list.
  12. The device as claimed in claim 10, wherein the step of computing the estimated maximum length by the processor comprises steps as follows: determining whether the actual image has a human-skin feature; when the processor determines the actual image does not have the human-skin feature, the processor performs steps as follows: recognizing the to-be-recognized object in the depth image to generate a first bounding box for the to-be-recognized object; and defining a longest side length of the first bounding box as the estimated maximum length according to depth information of the depth image; when the processor determines the actual image has the human-skin feature, the processor performs steps as follows: converting the actual image to a first binary image comprising a human-skin area and a non-human-skin area; converting the depth image to a second binary image, comprising the to-be-recognized object, a human-skin area, and a background area, according to the depth information of the depth image; performing an image coordinate transformation for the first binary image and the second binary image to have consistent coordinate systems; comparing image contents of the first binary image and the second binary image to recognize the to-be-recognized object in the second binary image to generate a second bounding box for the to-be-recognized object; and defining a longest side length of the second bounding box as the estimated maximum length according to the depth information of the depth image.
  13. An object recognition method performed by a processor and comprising steps as follows: receiving an actual image from an image capturing apparatus; inputting the actual image to the object classification model as claimed in claim 1 for the object classification model to output multiple object candidates according to the to-be-recognized object of the actual image, wherein each object candidate has an accuracy; sorting the multiple object candidates according to the accuracies to generate a sort list of the multiple object candidates; and controlling a monitor to display at least one of the object candidates that has the accuracy higher than a preset ratio in the sort list.
  14. The method as claimed in claim 13 further comprising steps as follows: receiving a depth image from the image capturing apparatus; computing an estimated maximum length of the to-be-recognized object according to the actual image and the depth image; and rearranging the multiple object candidates in the sort list according to differences among reference lengths of the object candidates and the estimated maximum length, and controlling the monitor to display a preset number of the object candidates with the accuracy higher than the preset ratio in the rearranged sort list.
  15. The method as claimed in claim 14, wherein the multiple object candidates in the sort list are in a descending order according to the accuracies; the processor rearranging the multiple object candidates in the sort list is to determine whether an absolute difference value between the reference length of each object candidate and the estimated maximum length is lower than or equal to an error upper limit; and if not so, the processor arranges such object candidate to a bottom of the sort list.
  16. The method as claimed in claim 14, wherein the step of computing the estimated maximum length by the processor comprises steps as follows: determining whether the actual image has a human-skin feature; when the processor determines the actual image does not have the human-skin feature, the processor performs steps as follows: recognizing the to-be-recognized object in the depth image to generate a first bounding box for the to-be-recognized object; and defining a longest side length of the first bounding box as the estimated maximum length according to depth information of the depth image; when the processor determines the actual image has the human-skin feature, the processor performs steps as follows: converting the actual image to a first binary image comprising a human-skin area and a non-human-skin area; converting the depth image to a second binary image, comprising the to-be-recognized object, a human-skin area, and a background area, according to the depth information of the depth image; performing an image coordinate transformation for the first binary image and the second binary image to have consistent coordinate systems; comparing image contents of the first binary image and the second binary image to recognize the to-be-recognized object in the second binary image to generate a second bounding box for the to-be-recognized object; and defining a longest side length of the second bounding box as the estimated maximum length according to the depth information of the depth image.
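The image-selection logic of claims 3 and 7 can be illustrated with a minimal Python sketch. This is for illustration only and is not part of the claims; the record fields (`object_size`, `image_size`), the example threshold, and the `passes_filter` condition are all hypothetical, since the specification does not fix a concrete data format or filter condition.

```python
def build_training_dataset(images, threshold, passes_filter):
    """Select 2D images per claims 3 and 7: keep images whose
    object-to-image size ratio meets the threshold, plus sub-threshold
    images that comply with a filter condition; exclude the rest as
    lacking significant features."""
    # Compute the ratio of the virtual object's size to the image size.
    rated = [(img, img["object_size"] / img["image_size"]) for img in images]
    # Sort the 2D images according to their ratios (descending).
    rated.sort(key=lambda pair: pair[1], reverse=True)
    dataset = []
    for img, ratio in rated:
        if ratio >= threshold:
            dataset.append(img)        # ratio meets threshold: keep
        elif passes_filter(img):
            dataset.append(img)        # sub-threshold but complies with filter
        # otherwise excluded: no significant features
    return dataset

# Hypothetical example: three rendered views of one virtual object 3D model.
views = [
    {"name": "front", "object_size": 80, "image_size": 100},
    {"name": "top",   "object_size": 30, "image_size": 100},
    {"name": "edge",  "object_size": 5,  "image_size": 100},
]
kept = build_training_dataset(
    views, threshold=0.5,
    passes_filter=lambda img: img["object_size"] >= 20)
print([img["name"] for img in kept])  # ['front', 'top']
```

Here "front" passes the ratio threshold directly, "top" is below it but complies with the filter condition, and "edge" is excluded as having no significant features.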

Description

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority to Taiwan application No. 113142135, filed on Nov. 4, 2024, the content of which is hereby incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present application relates generally to a model training method, a computation device for model training, an object recognition device, and an object recognition method, and more particularly to an object classification model training method, a computation device for training an object classification model, an object recognition device using an object classification model, and an object recognition method thereof.

2. Description of Related Art

A machine is usually assembled from multiple components. For example, a machine with a relatively complicated mechanism may be assembled from hundreds of components. Hard copies of data, such as subassembly drawings of the machine, assembly drawings of the machine, diagrams of the components, specification datasheets, and so on, are provided at the site of the machine assembly. The engineer can check and review the hard copies for reference during the assembly process. However, when the engineer wants to search for the specification datasheet of a certain component or to confirm the mounting position of a certain component, the engineer has to spend a lot of time comparing information among the hard copies. That is quite bothersome for the engineer.

SUMMARY OF THE INVENTION

The objectives of the present invention are to provide an object classification model training method, a computation device for training an object classification model, an object recognition device using an object classification model, and an object recognition method thereof, for resolving the trouble of checking and reviewing data from hard copies as described in the related art.
The object classification model training method of the present invention is performed by a computation device and comprises steps as follows: reading multiple model files, wherein each model file comprises a virtual object three-dimensional (3D) model; generating multiple two-dimensional (2D) images of the virtual object 3D model of each model file by a revolving image-capture schedule, wherein the multiple 2D images respectively have different contents corresponding to different positions of the virtual object 3D model of each model file; generating a training dataset according to the multiple 2D images corresponding to the virtual object 3D model of each model file; and training a to-be-trained model by the training dataset corresponding to the virtual object 3D model of each model file to obtain an object classification model.

The computation device for training an object classification model of the present invention comprises a storage and a processor. The storage stores multiple model files, wherein each model file comprises a virtual object three-dimensional (3D) model. The processor is electrically connected to the storage and performs steps as follows: reading the multiple model files from the storage; generating multiple two-dimensional (2D) images of the virtual object 3D model of each model file by a revolving image-capture schedule, wherein the multiple 2D images respectively have different contents corresponding to different positions of the virtual object 3D model of each model file; generating a training dataset according to the multiple 2D images corresponding to the virtual object 3D model of each model file; and training a to-be-trained model by the training dataset corresponding to the virtual object 3D model of each model file to obtain an object classification model.

The object recognition device of the present invention comprises an image capturing apparatus, a monitor, and a processor.
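The revolving image-capture schedule described above (with its object fixed axis, axis of revolution, and image capturing frequency) can be sketched in Python. This is an illustrative sketch only: the function name `revolving_capture_angles` and the single-axis, fixed-step parameterization are assumptions, as the specification allows multiple axes and does not prescribe an implementation.

```python
def revolving_capture_angles(degrees_per_step, revolutions=1):
    """Enumerate camera angles for a revolving image-capture schedule:
    the virtual object stays fixed on its axis while the virtual camera
    revolves around one axis of revolution, capturing one 2D image
    every `degrees_per_step` degrees."""
    steps = int(360 * revolutions / degrees_per_step)
    return [(i * degrees_per_step) % 360 for i in range(steps)]

# One full revolution, capturing every 45 degrees -> 8 distinct views,
# so the resulting 2D images show the object at 8 different positions.
angles = revolving_capture_angles(degrees_per_step=45)
print(angles)  # [0, 45, 90, 135, 180, 225, 270, 315]
```

In practice the same enumeration would be repeated per axis of revolution, and each angle rendered to a 2D image of the virtual object 3D model.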
The image capturing apparatus photographs a to-be-recognized object to generate an actual image having the to-be-recognized object. The processor is signally connected to the image capturing apparatus and the monitor and performs steps as follows: receiving the actual image from the image capturing apparatus; inputting the actual image to the foregoing object classification model for the object classification model to output multiple object candidates according to the to-be-recognized object of the actual image, wherein each object candidate has an accuracy; sorting the multiple object candidates according to the accuracies to generate a sort list of the multiple object candidates; and controlling the monitor to display at least one of the object candidates that has the accuracy higher than a preset ratio in the sort list.

The object recognition method of the present invention is performed by a processor and comprises steps as follows: receiving an actual image from an image capturing apparatus; inputting the actual image to the foregoing object classification model for the object classification model to output multiple object candidates according to the to-be-recognized object of the actual image, wherein each object candidate has an accuracy; sorting the multiple object candidates according to the accuracies to generate a sort list of the multiple object candidates; and controlling a monitor to display at least one of the object candidates that has the accuracy higher than a preset ratio in the sort list.
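The candidate sorting and rearrangement steps above (and in claims 11 and 15) can be sketched as follows. This is an illustrative Python sketch, not the claimed implementation: the candidate records, field names (`accuracy`, `ref_length`), and example values are hypothetical.

```python
def rank_candidates(candidates, preset_ratio,
                    estimated_max_length=None, error_upper_limit=None):
    """Sort object candidates by accuracy in descending order; when a
    length estimate is available, demote candidates whose reference
    length differs from the estimated maximum length by more than the
    error upper limit (moved to the bottom of the sort list), then keep
    only candidates whose accuracy exceeds the preset ratio."""
    ranked = sorted(candidates, key=lambda c: c["accuracy"], reverse=True)
    if estimated_max_length is not None and error_upper_limit is not None:
        kept, demoted = [], []
        for c in ranked:
            if abs(c["ref_length"] - estimated_max_length) <= error_upper_limit:
                kept.append(c)
            else:
                demoted.append(c)  # length mismatch: bottom of the sort list
        ranked = kept + demoted
    # Display only candidates with accuracy higher than the preset ratio.
    return [c for c in ranked if c["accuracy"] > preset_ratio]

# Hypothetical candidates for one recognized component.
candidates = [
    {"name": "bolt",   "accuracy": 0.92, "ref_length": 30.0},
    {"name": "screw",  "accuracy": 0.88, "ref_length": 12.0},
    {"name": "washer", "accuracy": 0.40, "ref_length": 11.5},
]
shown = rank_candidates(candidates, preset_ratio=0.5,
                        estimated_max_length=12.0, error_upper_limit=2.0)
print([c["name"] for c in shown])  # ['screw', 'bolt']
```

Here "bolt" has the highest accuracy but its reference length disagrees with the depth-based length estimate, so it is demoted below "screw"; "washer" falls below the preset accuracy ratio and is not displayed.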