JP-2026074737-A - A method for selecting training data for training a deep learning model and a training data selection device using the same.


Abstract

[Problem] To provide a method for selecting training data for training a deep learning model. [Solution] The method includes the steps of: a training data selection device generating at least one individual type corresponding to each of at least one object contained in each of a large number of training images stored in a data pool, and generating a binary graph that matches each of the training images with the individual types; and the training data selection device, referring to the binary graph and using an optimization algorithm, selecting, from among subsets each consisting of a predetermined number of training images that contain all of the individual types, a specific subset having the fewest training images, determining the remaining training images by removing the specific number of training images contained in the specific subset, and repeating the process of selecting, from the remaining training images and using the optimization algorithm, at least one other specific subset consisting of a predetermined number of training images that contain all of the individual types, until n training images have been selected. [Selection Diagram] Figure 9

Inventors

金桂賢
李鉉東

Assignees

スパーブエーアイカンパニーリミテッド

Dates

Publication Date: 20260507
Application Date: 20241021

Claims (20)

A method for selecting training data for training a deep learning model, comprising: (a) generating, by a training data selection device, at least one individual type corresponding to each of at least one object contained in each of a large number of training images stored in a data pool, and generating a binary graph that matches each of the training images with the individual types; and (b) by the training data selection device, (i) referring to the binary graph and selecting, through an optimization algorithm, from among subsets each consisting of a predetermined number of training images that contain all of the individual types, a specific subset having the fewest training images, and determining the remaining training images by removing the specific number of training images contained in the specific subset, and (ii) selecting from the remaining training images, through the optimization algorithm, at least one other specific subset consisting of a predetermined number of training images that contain all of the individual types, and repeating this process until n training images for training the deep learning model have been selected, where n is the target number of training images for training the deep learning model and is an integer of two or more.
The method according to claim 1, wherein, in step (b), the training data selection device uses linear programming to compute the product of a PxQ binary matrix, corresponding to the P individual types and the Q training images in the binary graph, and a Q-dimensional vector of selection goodness-of-fit variables, one for each of the Q training images, thereby generating a P-dimensional vector whose elements each represent the sum of the selection goodness-of-fit variables of the training images belonging to the corresponding one of the P individual types; under the constraints that each sum in the P-dimensional vector is 1 or more and each selection goodness-of-fit variable in the Q-dimensional vector is 0 or more and 1 or less, selects the specific subset including the specific training images that correspond to the specific selection goodness-of-fit variables for which the sum of the selection goodness-of-fit variables takes its minimum value; determines the remaining training images by removing the specific training images included in the specific subset from the Q training images; and repeats the process of selecting at least one other specific subset from the remaining training images using the linear programming until the number of selected training images is n or more.
The method according to claim 2, wherein, in step (b), the training data selection device selects the specific subset using dual linear programming, in which at least one of merging, separating, and sign-changing of the constraints in the linear programming is applied.
The method according to claim 1, wherein, in step (b), the training data selection device uses integer programming to compute the product of a PxQ binary matrix, corresponding to the P individual types and the Q training images in the binary graph, and a Q-dimensional vector of selection variables, one for each of the Q training images, thereby generating a P-dimensional vector whose elements each represent the selection quantity of training images belonging to the corresponding one of the P individual types; under the constraints that each selection quantity in the P-dimensional vector is 1 or more and each selection variable in the Q-dimensional vector is 0 or 1, selects the specific subset including the specific training images that correspond to the specific selection variables for which the sum of the selection variables takes its minimum value; determines the remaining training images by removing the specific training images included in the specific subset from the Q training images; and repeats the process of selecting at least one other specific subset from the remaining training images using the integer programming until the number of selected training images is n or more.
The method according to claim 4, wherein, in step (b), the training data selection device selects the specific subset using dual integer programming, in which at least one of merging, separating, and sign-changing of the constraints in the integer programming is applied.
The method according to claim 1, wherein, in step (a), the training data selection device transmits the training images to a labeler terminal so that a labeler corresponding to the labeler terminal generates at least one individual type corresponding to each of the objects contained in each of the training images.
The method according to claim 1, wherein, in step (a), the training data selection device performs object detection on each of the training images to detect at least one object in each training image, generates cropped images by cropping the region corresponding to the bounding box of each detected object in each training image, performs an embedding operation on each of the cropped images to generate an object vector corresponding to each cropped image, clusters the object vectors to generate object clusters, and generates the individual types corresponding to the training images by referring to the object clusters.
The method according to claim 1, wherein, in step (a), the training data selection device refers to the ground truth information contained in each of the training images to generate cropped images by cropping the region corresponding to the bounding box of each object in each training image, performs an embedding operation on each of the cropped images to generate an object vector corresponding to each cropped image, clusters the object vectors to generate object clusters, and generates the individual types corresponding to the training images by referring to the object clusters.
The method according to claim 1, wherein, in step (a), the training data selection device checks the metadata contained in each of the training images and further refers to the shooting time contained in the metadata to generate the individual type corresponding to each of the training images.
The method according to claim 9, wherein the specific individual types include a first_1 specific individual type to a first_x specific individual type corresponding to the objects (where x is an integer of 1 or more) and a second_1 specific individual type to a second_y specific individual type corresponding to the shooting time (where y is an integer of 1 or more), and wherein, in step (b), the training data selection device selects the n training images such that, among the individual types matched with the n training images, the number of the first_1 to first_x specific individual types corresponding to the objects and the number of the second_1 to second_y specific individual types corresponding to the shooting time are within a threshold deviation, the numbers of the first_1 specific individual type to the first_x specific individual type are within a first threshold deviation, and the numbers of the second_1 specific individual type to the second_y specific individual type are within a second threshold deviation.
A training data selection device for selecting training data for training a deep learning model, comprising: a memory storing instructions for selecting training data for training the deep learning model; and a processor that performs operations for selecting the training data for training the deep learning model in accordance with the instructions stored in the memory, wherein the processor performs: (I) a process of generating at least one individual type corresponding to each of at least one object contained in each of a large number of training images stored in a data pool, and generating a binary graph that matches each of the training images with the individual types; and (II) a process of (i) referring to the binary graph and selecting, through an optimization algorithm, from among subsets each consisting of a predetermined number of training images that contain all of the individual types, a specific subset having the fewest training images, and determining the remaining training images by removing the specific number of training images contained in the specific subset, and (ii) selecting from the remaining training images, through the optimization algorithm, at least one other specific subset consisting of a predetermined number of training images that contain all of the individual types, and repeating this process until n training images for training the deep learning model have been selected, where n is the target number of training images for training the deep learning model and is an integer of two or more.
The training data selection device according to claim 11, wherein, in the process (II), the processor uses linear programming to compute the product of a PxQ binary matrix, corresponding to the P individual types and the Q training images in the binary graph, and a Q-dimensional vector of selection goodness-of-fit variables, one for each of the Q training images, thereby generating a P-dimensional vector whose elements each represent the sum of the selection goodness-of-fit variables of the training images belonging to the corresponding one of the P individual types; under the constraints that each sum in the P-dimensional vector is 1 or more and each selection goodness-of-fit variable in the Q-dimensional vector is 0 or more and 1 or less, selects the specific subset including the specific training images that correspond to the specific selection goodness-of-fit variables for which the sum of the selection goodness-of-fit variables takes its minimum value; determines the remaining training images by removing the specific training images included in the specific subset from the Q training images; and repeats the process of selecting at least one other specific subset from the remaining training images using the linear programming until the number of selected training images is n or more.
The training data selection device according to claim 12, wherein, in the process (II), the processor selects the specific subset using dual linear programming, in which at least one of merging, separating, and sign-changing of the constraints in the linear programming is applied.
The training data selection device according to claim 11, wherein, in the process (II), the processor uses integer programming to compute the product of a PxQ binary matrix, corresponding to the P individual types and the Q training images in the binary graph, and a Q-dimensional vector of selection variables, one for each of the Q training images, thereby generating a P-dimensional vector whose elements each represent the selection quantity of training images belonging to the corresponding one of the P individual types; under the constraints that each selection quantity in the P-dimensional vector is 1 or more and each selection variable in the Q-dimensional vector is 0 or 1, selects the specific subset including the specific training images that correspond to the specific selection variables for which the sum of the selection variables takes its minimum value; determines the remaining training images by removing the specific training images included in the specific subset from the Q training images; and repeats the process of selecting at least one other specific subset from the remaining training images using the integer programming until the number of selected training images is n or more.
The training data selection device according to claim 14, wherein, in the process (II), the processor selects the specific subset using dual integer programming, in which at least one of merging, separating, and sign-changing of the constraints in the integer programming is applied.
The training data selection device according to claim 11, wherein, in the process (I), the processor transmits the training images to a labeler terminal so that a labeler corresponding to the labeler terminal generates at least one individual type corresponding to each of the objects contained in each of the training images.
The training data selection device according to claim 11, wherein, in the process (I), the processor performs object detection on each of the training images to detect at least one object in each training image, generates cropped images by cropping the region corresponding to the bounding box of each detected object in each training image, performs an embedding operation on each of the cropped images to generate an object vector corresponding to each cropped image, clusters the object vectors to generate object clusters, and generates the individual types corresponding to the training images by referring to the object clusters.
The training data selection device according to claim 11, wherein, in the process (I), the processor refers to the ground truth information contained in each of the training images to generate cropped images by cropping the region corresponding to the bounding box of each object in each training image, performs an embedding operation on each of the cropped images to generate an object vector corresponding to each cropped image, clusters the object vectors to generate object clusters, and generates the individual types corresponding to the training images by referring to the object clusters.
The training data selection device according to claim 11, wherein, in the process (I), the processor checks the metadata contained in each of the training images and further refers to the shooting time contained in the metadata to generate the individual type corresponding to each of the training images.
The training data selection device according to claim 19, wherein the specific individual types include a first_1 specific individual type to a first_x specific individual type corresponding to the objects (where x is an integer of 1 or more) and a second_1 specific individual type to a second_y specific individual type corresponding to the shooting time (where y is an integer of 1 or more), and wherein, in the process (II), the processor selects the n training images such that, among the individual types matched with the n training images, the number of the first_1 to first_x specific individual types corresponding to the objects and the number of the second_1 to second_y specific individual types corresponding to the shooting time are within a threshold deviation, the numbers of the first_1 specific individual type to the first_x specific individual type are within a first threshold deviation, and the numbers of the second_1 specific individual type to the second_y specific individual type are within a second threshold deviation.
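As a rough, non-authoritative illustration of the selection loop recited in claims 1 and 4, the following Python sketch repeatedly picks a smallest set of images that covers every individual type, removes those images from the pool, and continues until at least n images have been selected. The names `min_cover` and `select_training_images`, the image identifiers, and the type names are hypothetical. Exhaustive search stands in for an integer programming solver (minimize the sum of 0/1 selection variables subject to every type being selected at least once) and is practical only for very small pools. Skipping types that have no images left in the remaining pool is an assumption; the claims do not say how that case is handled.

```python
from itertools import combinations


def min_cover(type_to_images, candidates):
    """Return a smallest set of candidate images that covers every type.

    Exhaustive stand-in for the 0/1 program: minimize sum(x) subject to
    A x >= 1, where A is the types-by-images binary matrix.
    """
    # Restrict each type's image set to the remaining candidates.
    # Assumption: types with no remaining image are skipped.
    rows = [images & candidates for images in type_to_images.values()]
    rows = [row for row in rows if row]
    ordered = sorted(candidates)
    # Try subset sizes in increasing order, so the first hit is minimal.
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            chosen = set(combo)
            if all(row & chosen for row in rows):
                return chosen
    return set()


def select_training_images(type_to_images, n):
    """Select covers round by round until at least n images are chosen."""
    remaining = set().union(*type_to_images.values())
    selected = []
    while len(selected) < n and remaining:
        cover = min_cover(type_to_images, remaining)
        if not cover:
            break
        selected.extend(sorted(cover))
        remaining -= cover  # remove the chosen images from the pool
    return selected


# Hypothetical pool: each individual type maps to the images containing it.
example_types = {
    "car": {"img1", "img2", "img5"},
    "person": {"img1", "img3"},
    "night": {"img2", "img3", "img4"},
}
chosen_images = select_training_images(example_types, 4)
print(chosen_images)
```

Because whole covers are added in each round, the final round may overshoot n, which is consistent with the "n or more" wording of claims 2 and 4. For pools of realistic size, a linear or integer programming solver would replace the exhaustive search.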

Description

The present invention relates to a method for uniformly selecting training data for training a deep learning model from all the training data stored in a data pool, without bias or variability in the data, and to a training data selection device using this method.
Generally, deep learning models recognize complex patterns in images, text, sound, and other data to generate accurate insights and predictions, and are applied in various fields such as computer vision, speech recognition, autonomous vehicles, robotics, natural language processing, and medical image analysis. For such deep learning models to perform their intended tasks accurately, they must be trained on a large amount of training data.
Conventional methods for selecting training data for deep learning models from a collected data pool include random sampling, which randomly selects a target number of training data from all the training data stored in the data pool, and vector quantization, which clusters the vectors obtained for each item of training data by embedding extraction into groups and then selects a representative value for each group of vectors.
For example, Patent Document 1 discloses a method and apparatus for generating the training data required to train an animated character based on deep learning, and Patent Document 2 discloses a similarity-based clustering apparatus and method using deep learning techniques. Furthermore, Patent Document 3 discloses a training apparatus and method for a deep learning classification model, and Patent Document 4 discloses a system and method for training a machine learning model using active learning.
However, these conventional methods for selecting training data have the problem of producing bias and variability across data types.
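As a rough illustration of why random sampling carries the imbalance of the data pool into the selected data, the following Python sketch computes the expected number of selected items per type when n items are drawn uniformly at random without replacement. The function name `expected_sample_counts` and the pool contents are hypothetical and are not taken from the cited documents.

```python
def expected_sample_counts(pool_counts, n):
    """Expected items per type when n items are drawn uniformly at random.

    For sampling without replacement, the expected count of a type is
    n * (items of that type) / (total items), i.e. the hypergeometric mean.
    """
    total = sum(pool_counts.values())
    return {name: n * count / total for name, count in pool_counts.items()}


# Hypothetical pool in which one type is nine times more common than the other.
hypothetical_pool = {"common": 9000, "rare": 1000}
expected = expected_sample_counts(hypothetical_pool, 100)
print(expected)
```

In expectation the sample reproduces the pool's proportions, so a type that is rare in the pool stays rare in the selected training data.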
For example, if a data pool contains 1 million training images, of which 70% relate to sunny weather, 20% to cloudy weather, 5% to foggy weather, and 5% to snowy and/or rainy weather, then randomly sampling 10,000 training images would select only about 500 of the 50,000 images related to snowy and/or rainy weather. The selected training images would therefore be biased and uneven across weather types.
Furthermore, although selecting training images by vector quantization, that is, by embedding extraction and clustering, can somewhat mitigate the bias and variability in the types of training images selected, it cannot fundamentally prevent problems of data bias and variability.
Therefore, the applicant proposes a method for uniformly selecting training data for training a deep learning model, by type and without bias or variability, from all the training data stored in a data pool.
Patent Documents: U.S. Patent No. 11,106,942; Korean Published Patent No. 10-2023-0068941; Patent No. 7225614; U.S. Patent No. 11,663,409
The following drawings, attached for use in describing embodiments of the present invention, represent only a portion of the embodiments, and a person with ordinary skill in the art to which the present invention pertains (hereinafter referred to as "a person skilled in the art") can obtain other drawings based on these drawings without performing any inventive work.
Figure 1 is a schematic diagram showing a training data selection device for selecting training data for training a deep learning model according to one embodiment of the present invention.
Figure 2 is a schematic diagram illustrating a method for selecting training data for training a deep learning model according to the first embodiment of the present invention.
Figure 3 is a schematic diagram illustrating an example of generating individual types of training data in the first embodiment of the present invention.
Figure 4 is a schematic diagram illustrating another example of generating individual types of training data in the first embodiment of the present invention.
Figure 5 is a diagram illustrating a binary graph obtained by matching each of the training data with an individual type in the first embodiment of the present invention.
Figure 6a is a schematic diagram showing the process of selecting training data by referring to a binary graph in the first embodiment of the present invention.
Figure 6b is a schematic diagram illustrating the process of selecting training data by referring to a binary graph in the first embodiment of the present invention.
Figure 6c is a schematic diagram illustrating the process of selecting training data by referring to a binary graph in the first embodiment of the present invention.
Figure 6d is a schematic diagram illustrating the process of selecting training data by referring to a binary graph in the first embodiment of the present invention.
Figure 7 is a schematic diagram illustrating a method for selecting training data for training a deep learning model according to a second embodiment of the present invention.
Figure 8 is a schematic diagram illustrating a method for selecting training data for training