
EP-4740136-A1 - AUTOMATED IDENTIFICATION OF QUALITY DEVIATIONS IN MACHINE LEARNING MODELS


Abstract

A method for identifying quality deviations in a machine learning application, implementable by corresponding systems and computer-readable media, may include: training (205), based on a world-truth paradigm (202), a first machine learning application (MLA) (212) to generate a world-truth prediction model (216); labeling (209), via the first MLA using the world-truth prediction model, intended-use training data (224) and field data (228); training (211) a second machine learning application (MLA) (240) using the labeled intended-use training data to generate an intended-use model (246); generating (219), via the trained second MLA, using the labeled field data and the intended-use model, at least one prediction (256) having an associated confidence level; and determining at least one quality deviation (258) of the intended-use model from the world-truth paradigm, or of the at least one prediction from the world-truth paradigm.
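The two-stage pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the models are toy 1-D threshold classifiers, and every function and variable name here is hypothetical.

```python
# Hypothetical sketch of the abstract's pipeline: a first "world-truth"
# model labels data, and a second "intended-use" model is trained on
# those machine-generated labels.

def train_threshold_model(samples, labels):
    """Learn a decision threshold separating class 0 from class 1."""
    ones = [x for x, y in zip(samples, labels) if y == 1]
    zeros = [x for x, y in zip(samples, labels) if y == 0]
    return (max(zeros) + min(ones)) / 2.0

def predict(model, x):
    """Return (label, confidence); confidence grows with the distance
    of the sample from the learned threshold."""
    label = 1 if x >= model else 0
    confidence = min(1.0, abs(x - model))
    return label, confidence

# Step 1: train the world-truth model on hand-labeled world-truth data.
wt_samples, wt_labels = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
world_truth_model = train_threshold_model(wt_samples, wt_labels)

# Step 2: the world-truth model labels the intended-use training data.
intended_use_data = [0.15, 0.3, 0.7, 0.85]
iu_labels = [predict(world_truth_model, x)[0] for x in intended_use_data]

# Step 3: train the intended-use model on the machine-labeled data.
intended_use_model = train_threshold_model(intended_use_data, iu_labels)

# Step 4: predict on labeled field data, each prediction carrying an
# associated confidence level.
field_data = [0.05, 0.6]
predictions = [predict(intended_use_model, x) for x in field_data]
```

In a real system the threshold classifiers would be replaced by the actual MLAs, but the data flow (truth model labels, intended-use model learns from those labels, field predictions carry confidence) is the one the abstract describes.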

Inventors

  • DEGEN, Heinrich Helmut
  • MARKOV, Georgi
  • NAGARAJA, Parinitha
  • BUDNIK, Christof J.

Assignees

  • Siemens Aktiengesellschaft

Dates

Publication Date
2026-05-13
Application Date
2023-08-04

Claims (15)

  1. A method for identifying quality deviations in a machine learning model, the method performed by at least one data processing system, comprising: training (205), based on a world-truth paradigm (202), a first machine learning application (MLA) (212) to generate a world-truth prediction model (216); labeling (209, 217), via the first MLA using the world-truth prediction model, intended-use training data (224) and field data (228); training (211) a second machine learning application (MLA) (240) using the labeled intended-use training data to generate an intended-use model (246); generating (219), via the trained second MLA, using the labeled field data and the intended-use model, at least one prediction having an associated confidence level (256); and determining (219) at least one quality deviation (258) of the intended-use model from the world-truth paradigm, or at least one quality deviation of the at least one prediction from the world-truth paradigm.
  2. The method of claim 1, wherein training the first MLA includes receiving at least one attribute (204) of the world-truth paradigm, the at least one attribute defining a distribution of a plurality of attribute values.
  3. The method of claim 1, wherein training the first MLA further includes: receiving labeled world-truth training data (208) and labeled world-truth test data (210), the labeled world-truth training and test data labeled based on the world-truth paradigm; generating the world-truth prediction model based on the world-truth training data; and generating a first plurality of key performance indicators (KPIs) (214) associated with generating the world-truth prediction model.
  4. The method of claim 3, wherein training the first MLA further includes: testing (207) the world-truth prediction model via the first MLA using the world-truth test data; generating a first test result (220); and generating a second plurality of key performance indicators (KPIs) (222) associated with testing the world-truth prediction model.
  5. The method of claim 4, wherein training the first MLA further includes: comparing (215) the first plurality of KPIs and the second plurality of KPIs to determine whether there is a statistical difference between the first test result and the world-truth paradigm; and retraining, when there is a statistical difference between the first test result and the world-truth paradigm, the world-truth prediction model based on second world-truth training data (152d).
  6. The method of claim 1, wherein training the second MLA further includes: labeling, via the first MLA, intended-use test data (226) based on the world-truth paradigm; generating a third plurality of KPIs (232) associated with labeling the intended-use test data and the intended-use training data; generating a fourth plurality of KPIs (242) associated with training the second MLA and the intended-use model; testing (213) the intended-use model via the second MLA using the labeled intended-use test data; and generating a second test result (252) and a fifth plurality of KPIs (250) associated with testing the intended-use model.
  7. The method of claim 6, wherein determining at least one quality deviation includes comparing the at least one prediction against the third, fourth, and fifth pluralities of KPIs to identify a difference between the at least one prediction and an expected result represented by at least one of the third, fourth, or fifth pluralities of KPIs.
  8. The method of claim 7, wherein determining at least one quality deviation includes determining whether at least one of the first, second, third, fourth, and fifth pluralities of KPIs violates a world-truth threshold.
  9. The method of claim 8, further comprising: generating a bias notification (260) based on the determination of whether the world-truth threshold was violated, the bias notification configured to be displayed on a display (111); and repeating, when the world-truth threshold was violated, the steps of training or testing the first or second MLAs using new training data or testing data.
  10. The method of claim 1, further including: determining whether the confidence level of the at least one prediction exceeds a confidence level threshold; and generating a notification (266) on a display device (111), the notification including the at least one prediction and whether the confidence level threshold was exceeded.
  11. The method of claim 1, wherein determining at least one quality deviation includes determining whether a world-truth threshold (202a) was violated by the training data.
  12. The method of claim 11, further including: determining whether the confidence level of the at least one prediction exceeds a confidence level threshold (262); and generating a notification (266) on a display device, the notification including: the at least one prediction; the at least one quality deviation; whether the confidence level threshold was exceeded; and whether the world-truth threshold was exceeded.
  13. The method of claim 12, further including: causing an automated intended-use system to perform or cease to perform an intended-use operation.
  14. A data processing system (100) comprising: a processor (102); and an accessible memory (108), the data processing system particularly configured to perform a method as in any of claims 1-13.
  15. A non-transitory computer-readable medium (126) encoded with executable instructions that, when executed, cause one or more data processing systems to perform a method as in any of claims 1-13.
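The deviation and notification logic of claims 8-12 can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: KPIs are assumed to be accuracy-like scores in [0, 1], and all names and the notification shape are hypothetical.

```python
# Hypothetical sketch of claims 8-12: flag KPI sets that violate a
# world-truth threshold, then assemble a notification that reports the
# prediction, the quality deviations, and both threshold checks.

def check_quality_deviation(kpi_sets, world_truth_threshold):
    """Return, per KPI set, the KPIs that fall below the threshold."""
    violations = {}
    for name, kpis in kpi_sets.items():
        bad = [k for k in kpis if k < world_truth_threshold]
        if bad:
            violations[name] = bad
    return violations

def build_notification(prediction, confidence, confidence_threshold,
                       violations):
    """Assemble the claim-12 notification: prediction, deviations, and
    whether the confidence and world-truth thresholds were respected."""
    return {
        "prediction": prediction,
        "quality_deviations": violations,
        "confidence_ok": confidence >= confidence_threshold,
        "world_truth_ok": not violations,
    }

kpi_sets = {
    "labeling": [0.97, 0.95],  # third plurality of KPIs (claim 6)
    "training": [0.99, 0.88],  # fourth plurality of KPIs
    "testing": [0.96],         # fifth plurality of KPIs
}
violations = check_quality_deviation(kpi_sets, world_truth_threshold=0.9)
note = build_notification("defect", 0.93, 0.85, violations)
```

Here the 0.88 training KPI violates the 0.9 world-truth threshold, so the notification reports a quality deviation even though the prediction's confidence clears its own threshold; per claim 9, such a violation would trigger retraining with new data.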

Description

AUTOMATED IDENTIFICATION OF QUALITY DEVIATIONS IN MACHINE LEARNING MODELS

TECHNICAL FIELD

[0001] The present disclosure is directed, in general, to methods and systems for training machine learning applications and identifying quality deviations in machine learning models.

BACKGROUND OF THE DISCLOSURE

[0002] Machine learning models are often developed, trained, and used on biased data that does not represent a proper distribution of the characteristics the data is meant to capture. The biased data may inadvertently include noise, or may overrepresent or underrepresent the characteristics the machine learning model is being trained to predict. Further, a data set can be characterized as having too many unknown unknowns; that is, it is difficult to know which data is missing. Training a machine learning model on such data causes the model to develop unintended biases or quality deviations. Machine learning models require manual human validation to determine any quality deviations between what a given model predicts, what is expected, and the data it was trained on. Because validation of the data and the machine learning model requires significant human effort and time, improvement of the model is delayed, resulting in lower-accuracy predictions. Improved methods for validating data and machine learning models are desired.

SUMMARY OF THE DISCLOSURE

[0003] Various disclosed embodiments include a method for automatically identifying quality deviations in a machine learning model. A first machine learning application (MLA) is trained based on a world-truth paradigm to generate a world-truth prediction model. Intended-use training data and field data are labeled via the first MLA using the world-truth prediction model. A second machine learning application is trained using the labeled intended-use training data to generate an intended-use model.
The method includes generating, using the labeled field data and the intended-use model in the second MLA, at least one prediction having an associated confidence level. The method includes determining at least one quality deviation of the intended-use model from the world-truth paradigm, or at least one quality deviation of the at least one prediction from the world-truth paradigm.

[0004] In various embodiments, the method includes defining at least one attribute of the world-truth paradigm, the at least one attribute defining a distribution of a plurality of attribute values.

[0005] In various embodiments, the method includes labeling world-truth training data and world-truth test data based on the world-truth paradigm.

[0006] In various embodiments, the method includes generating the world-truth prediction model based on the world-truth training data.

[0007] In various embodiments, the method includes generating a first plurality of key performance indicators (KPIs) associated with generating the world-truth prediction model.

[0008] In various embodiments, the method includes testing the world-truth prediction model via the first MLA using the world-truth test data.

[0009] In various embodiments, the method includes generating a first test result and generating a second plurality of key performance indicators (KPIs) associated with testing the world-truth prediction model.

[0010] In various embodiments, the method includes comparing the first plurality of KPIs and the second plurality of KPIs to determine whether there is a statistical difference between the first test result and the world-truth paradigm.

[0011] In various embodiments, the method includes retraining, when there is a statistical difference between the first test result and the world-truth paradigm, the world-truth prediction model based on second world-truth training data.
[0012] In various embodiments, the method includes labeling, via the first MLA, intended-use test data based on the world-truth paradigm.

[0013] In various embodiments, the method includes generating a third plurality of KPIs associated with labeling the intended-use test data and the intended-use training data.

[0014] In various embodiments, the method includes generating a fourth plurality of KPIs associated with training the second MLA and the intended-use model.

[0015] In various embodiments, the method includes testing the intended-use model via the second MLA using the labeled intended-use test data.

[0016] In various embodiments, the method includes generating a second test result and a fifth plurality of KPIs associated with testing the intended-use model.

[0017] In various embodiments, determining at least one quality deviation includes determining whether a world-truth threshold was violated by the training data.

[0018] In various embodiments, determining at least one quality deviation includes determining whether at least one of the first, second, third, fourth, and fifth pluralities of KPIs violates a world-truth threshold.

[0019] In various embodiments, the method in