
JP-2026076140-A - Label confidence evaluation method and apparatus


Abstract

[Problem] To provide an objective and reliable method and apparatus for evaluating label confidence with respect to a model's prediction accuracy. [Solution] The method includes the steps of: acquiring an image set consisting of a plurality of images, each containing a label corresponding to at least one class; dividing the image set into K unit sets; selecting one of the unit sets as a validation set and training a predetermined network function on the remaining K-1 unit sets as training data, repeating this process K times to generate K evaluation models; inputting each validation set into its corresponding evaluation model and outputting a predicted value for each of the plurality of images; and evaluating the reliability of the labels by comparing the predicted values with the labels of the corresponding images. [Selection Diagram] Figure 2

Inventors

  • KIL, Gwang Yong

Assignees

  • Neurocle Incorporated

Dates

Publication Date
2026-05-11
Application Date
2025-10-23
Priority Date
2024-10-23

Claims (11)

  1. A method for evaluating label confidence, comprising the steps of: acquiring an image set consisting of a plurality of images, each containing a label corresponding to at least one class; dividing the image set into K unit sets; selecting one of the unit sets as a validation set and training a predetermined network function on the remaining K-1 unit sets as training data, repeating this process K times to generate K evaluation models; inputting each validation set into its corresponding evaluation model and outputting a predicted value for each of the plurality of images; and evaluating the reliability of the labels by comparing the predicted values with the labels of the corresponding images.
  2. The method according to claim 1, wherein the labels are for image classification and the predicted value includes a probability value corresponding to each of the classes, and wherein the step of evaluating the reliability of the labels comprises: determining a self-confidence based on the probability value of the correct label extracted from the predicted value for each image; and determining a per-image label confidence using the difference between the self-confidence and the highest probability value among the classes other than the correct label.
  3. The method according to claim 2, wherein the step of evaluating the reliability of the labels further comprises normalizing the difference between the self-confidence and the highest probability value among the classes other than the correct label to a predetermined range to calculate a normalized margin, and wherein the label confidence is determined based on the normalized margin.
  4. The method according to claim 1, wherein the labels are for object detection and the predicted value includes the coordinates of at least one bounding box, the class predicted to exist in each bounding box, and its probability value, and wherein the step of evaluating the reliability of the labels is performed using a confidence score calculated based on at least one of: a bad-location score based on the degree of mismatch between a bounding box of the predicted value and a bounding box included in the corresponding label for each image; an overlooked score applied when a label bounding box corresponding to the predicted value is missing; and a swapped score applied when the class of the predicted value differs from the class of the corresponding label.
  5. The method according to claim 4, wherein the step of evaluating the reliability of the labels comprises: calculating a first, per-bounding-box confidence score for each bounding box included in the predicted value and the at least one label, based on at least one of the bad-location score, the overlooked score, and the swapped score for each image; calculating a second, image-level confidence score for each image through a predetermined operation that enhances the influence of bounding boxes having low first confidence scores; and determining the label confidence for each image based on the second confidence score.
  6. The method according to claim 4, wherein the bad-location score is calculated based on at least one of the IoU (Intersection-over-Union) value and the difference in center coordinates between the bounding box of the predicted value and the bounding box included in the corresponding label.
  7. The method according to claim 1, wherein the labels are for object segmentation and the predicted value includes a probability value corresponding to each of the classes for each pixel of the image, and wherein the step of evaluating the reliability of the labels comprises: preprocessing the predicted values by extracting and storing, for each pixel of each image, the probability value of the correct label; calculating a first, per-pixel confidence score from the preprocessed predicted values; calculating a second, image-level confidence score for each image through a predetermined operation that enhances the influence of pixels having low first confidence scores; and determining the label confidence for each image based on the second confidence score.
  8. The method according to claim 1, further comprising: acquiring a second image set consisting of a plurality of second images, each containing a second label corresponding to at least one of the classes; dividing the second image set into K second unit sets; inputting each second unit set into the corresponding evaluation model and outputting a second predicted value for each of the plurality of second images; and evaluating the reliability of the second labels by comparing the second predicted values with the second labels of the corresponding second images.
  9. The method according to claim 1, further comprising: acquiring a second image set consisting of a plurality of second images, each containing a second label corresponding to at least one of the classes; inputting each image of the second image set into all K evaluation models and ensembling the output values to produce a second predicted value for each of the plurality of second images; and evaluating the reliability of the second labels by comparing the second predicted values with the second labels of the corresponding second images.
  10. A computer program stored on a recording medium for performing the method according to any one of claims 1 to 9.
  11. A label confidence evaluation apparatus comprising: at least one processor; and a memory storing a program executable by the processor, wherein the processor, by executing the program: acquires an image set consisting of a plurality of images, each containing a label corresponding to at least one class; divides the image set into K unit sets; selects one of the unit sets as a validation set and trains a predetermined network function on the remaining K-1 unit sets as training data, repeating this process K times to generate K evaluation models; inputs each validation set into its corresponding evaluation model to output a predicted value for each of the plurality of images; and evaluates the reliability of the labels by comparing the predicted values with the labels of the corresponding images.
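The K-fold scheme of claims 1 and 11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `train_fn` and `predict_fn` are hypothetical placeholders standing in for training the "predetermined network function" and running inference with an evaluation model.

```python
import numpy as np

def kfold_label_predictions(images, labels, K, train_fn, predict_fn):
    """Claim 1 sketch: split the image set into K unit sets, train K
    evaluation models (each on the other K-1 unit sets), and collect an
    out-of-fold predicted value for every image.

    train_fn(train_images, train_labels) -> model   (placeholder)
    predict_fn(model, val_images) -> predictions    (placeholder)
    """
    n = len(images)
    folds = np.array_split(np.arange(n), K)  # the K "unit sets"
    preds = [None] * n
    for k in range(K):
        val_idx = folds[k]  # this unit set is the validation set
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        model = train_fn(images[train_idx], labels[train_idx])
        # each image is predicted by the one model that never saw it
        for i, p in zip(val_idx, predict_fn(model, images[val_idx])):
            preds[i] = p
    return preds
```

Because every image is scored by a model that excluded it from training, the comparison of prediction against label in the final step is not contaminated by memorization of the label being evaluated.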
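For classification labels, claims 2 and 3 describe a margin-based confidence. A minimal sketch, assuming the "predetermined range" of claim 3 is [0, 1] (the claim does not fix the range):

```python
import numpy as np

def normalized_margin(probs, correct_idx):
    """Claims 2-3 sketch: margin-based label confidence for one image.

    probs:       per-class probability vector from the evaluation model
    correct_idx: index of the annotated (correct) label
    """
    self_conf = probs[correct_idx]            # claim 2: self-confidence
    others = np.delete(probs, correct_idx)
    margin = self_conf - others.max()         # raw margin, in [-1, 1]
    return (margin + 1.0) / 2.0               # claim 3: normalize to [0, 1] (assumed range)
```

A confidently correct label (high probability on the annotated class) yields a score near 1; a label the model strongly contradicts yields a score near 0, flagging it for review.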
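For object detection, claims 4 to 6 build a confidence score from per-box error terms and an image-level aggregation that amplifies low-scoring boxes. The sketch below uses 1 - IoU as the bad-location score (one of the options claim 6 allows) and a negative-exponent power mean as the low-score-emphasizing aggregation; the exact aggregation is an assumption, since the patent only requires that low first confidence scores have enhanced influence.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes (claim 6)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def bad_location_score(pred_box, label_box):
    """Claim 4's bad-location score, realized here as 1 - IoU
    (center-coordinate distance is the claim's other option)."""
    return 1.0 - iou(pred_box, label_box)

def image_confidence(box_scores, p=4):
    """Claim 5 sketch: aggregate per-box confidence scores (assumed to
    lie in (0, 1]) into one image-level score via a power mean with
    negative exponent, so the lowest-scoring box dominates the result."""
    return (sum(s ** -p for s in box_scores) / len(box_scores)) ** (-1.0 / p)
```

With this aggregation, an image containing one badly mislocated box scores close to that box's low value rather than being averaged away by its well-matched neighbors, which is the behavior claim 5 asks for.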

Description

This application relates to a method and apparatus for evaluating label reliability. The performance of deep learning models depends heavily on data quality, particularly the accuracy of labels. Most deep learning models are trained on large datasets, and these datasets must contain correct labels for each image or sample for the model to make accurate predictions. In practice, however, building large datasets requires extensive labeling work, and label errors frequently occur during this process. Labeling is primarily performed by crowdworkers, who often lack a deep understanding of the data and of deep learning models, which can lead to oversimplified or incorrect labeling. For example, an image or sample may be assigned the label of the wrong class, objects of the same class may receive different labels, and objects important for detection or segmentation may be missed entirely during labeling. Such errors directly harm the model's learning process: because deep learning models build predictions from the given data, training on data containing incorrect labels degrades performance and reduces the reliability of the model's predictions. In particular, since deep learning models learn patterns from large datasets, even a small number of label errors can seriously affect overall model performance. Therefore, to resolve the problems caused by labeling errors, new techniques are needed to identify low-confidence labels.
To better understand the drawings cited in this application, a brief description of each is provided: a block diagram of the label reliability evaluation apparatus according to an embodiment of the present application; a flowchart of the label reliability evaluation method according to an embodiment; diagrams illustrating the process of generating the evaluation models and the predicted values in the label reliability evaluation method; examples of step S250 (referred to as Figure 3); and diagrams illustrating the process of determining the reliability of object detection labels using the label reliability evaluation method. Because the technical concept of this application can be modified in various ways and has various embodiments, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technical concept of this application to specific embodiments, but rather to include all modifications, equivalents, or substitutes that fall within the scope of the technical concept of this application.
In explaining the technical concept of this application, if it is determined that a detailed explanation of related prior art would unnecessarily obscure the gist of this application, such explanation is omitted. The terminology used herein is for illustrative purposes only and is not intended to limit or restrict this application. Singular expressions include plural expressions unless the context clearly indicates otherwise. Furthermore, numbers used herein (e.g., first, second, etc.) are merely identifiers to distinguish one component from another. In this specification, when a part is described as being connected to another part, this includes not only direct connections but also indirect connections through other components in between. Furthermore, when a part is described as containing a component, this means, unless otherwise stated, that it may contain other components rather than excluding them. Furthermore, in this application, the term "or" is intended to mean an inclusive "or," not an exclusive "or." That is, unless otherwise specified or clear from context, "X utilizes A or B" means any of the natural inclusive substitutions: if X utilizes A, X utilizes B, or X utilizes both A and B, then "X utilizes A or B" applies. Also, the term "and/or" as used herein refers to and includes all possible combinations of one or more of the enumerated related items. Furthermore, terms such as "unit," "device," and "module" as used in this application refer to a unit that processes at least one function or operation.