
JP-7855772-B1 - Information processing device, information processing method, and program


Abstract

[Problem] To provide an information processing device that makes it easy to evaluate both the confidentiality of synthetic data and its usefulness as machine learning data. [Solution] The device comprises: a data generation unit that generates multiple synthetic datasets based on an original dataset; a machine learning execution unit that automatically performs machine learning on an AI model using each synthetic dataset and the original dataset as training data; a learning accuracy evaluation unit that evaluates and outputs the accuracy of the machine learning; a confidentiality evaluation unit that evaluates and outputs the degree of confidentiality of each synthetic dataset; and an evaluation result presentation unit that presents the confidentiality evaluation results of each synthetic dataset and the machine learning accuracy evaluation results so that they can be compared. [Selection Diagram] Figure 5

Inventors

  • 大沼 顕介
  • ベルナルド ぺレス オロスコ
  • オーウェン パーソンズ
  • 近藤 結
  • フアニアン ズー
  • ネイサン コルダ
  • 立花 悠貴
  • ドラン カミス
  • アンドレアス ペンタリオティス
  • パトリック ターニー
  • ダビデ ジッリ

Assignees

  • Aioi Nissay Dowa Insurance Co., Ltd. (あいおいニッセイ同和損害保険株式会社)

Dates

Publication Date
2026-05-08
Application Date
2025-07-30

Claims (11)

  1. An information processing device comprising: a data generation unit that generates multiple synthetic datasets based on an original dataset; a machine learning execution unit that automatically performs machine learning on an AI model using each of the synthetic datasets and the original dataset as training data; a learning accuracy evaluation unit that evaluates and outputs the accuracy of the machine learning; a confidentiality evaluation unit that, for each synthetic dataset, evaluates and outputs the degree of confidentiality based on the proportion of synthetic data in that dataset that matches data in the original dataset; and an evaluation result presentation unit that displays the evaluation result of the degree of confidentiality of each synthetic dataset and the evaluation result of the machine learning accuracy on the same screen.
  2. The information processing device according to claim 1, wherein the machine learning execution unit performs machine learning for each set of training data using a plurality of different AI models, and the learning accuracy evaluation unit adopts, for each set of training data, the learning accuracy of the AI model with the highest machine learning accuracy as the evaluation result.
  3. The information processing device according to claim 1, wherein the machine learning execution unit performs machine learning for each set of training data using an AI model with a plurality of different hyperparameter settings, and the learning accuracy evaluation unit selects, for each set of training data, the hyperparameters that yield the highest machine learning accuracy and adopts the learning accuracy of the AI model configured with those hyperparameters as the evaluation result.
  4. The information processing device according to claim 1, wherein the learning accuracy evaluation unit evaluates at least one of accuracy, detection rate, precision, and F-score.
  5. The information processing device according to claim 1, wherein the evaluation result presentation unit displays points representing each synthetic dataset on a graph having a first axis showing the value of the confidentiality evaluation result and a second axis showing the value of the machine learning accuracy evaluation result.
  6. The information processing device according to claim 1, wherein the data generation unit uses a data generation model to learn the statistical features of the original dataset and generates a synthetic dataset that preserves the learned features.
  7. The information processing device according to claim 6, wherein the data generation model generates synthetic data using a method that applies differential privacy.
  8. The information processing device according to claim 1, wherein the evaluation result presentation unit further presents the results of a fidelity evaluation and/or a fairness evaluation of each synthetic dataset so that they can be compared.
  9. The information processing device according to claim 1, wherein the evaluation result presentation unit presents the evaluation results of each synthetic dataset so that they can be compared using any two indicators selected from three or more independent evaluation indicators as the evaluation axes.
  10. An information processing method comprising the steps of: a computer generating multiple synthetic datasets based on an original dataset; the computer automatically performing machine learning on an AI model using each of the synthetic datasets and the original dataset as training data; the computer evaluating the accuracy of the machine learning and outputting the result; the computer evaluating, for each synthetic dataset, the degree of confidentiality based on the proportion of synthetic data in that dataset that matches data in the original dataset, and outputting the result; and the computer displaying the confidentiality evaluation result of each synthetic dataset and the machine learning accuracy evaluation result on the same screen.
  11. A program that causes a computer to function as: a data generation unit that generates multiple synthetic datasets based on an original dataset; a machine learning execution unit that automatically performs machine learning on an AI model using each of the synthetic datasets and the original dataset as training data; a learning accuracy evaluation unit that evaluates and outputs the accuracy of the machine learning; a confidentiality evaluation unit that, for each synthetic dataset, evaluates and outputs the degree of confidentiality based on the proportion of synthetic data in that dataset that matches data in the original dataset; and an evaluation result presentation unit that displays the confidentiality evaluation result of each synthetic dataset and the machine learning accuracy evaluation result on the same screen.
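Claims 2 to 4 describe scoring each training dataset with several AI models or hyperparameter settings, adopting the best result, and measuring accuracy, detection rate, precision, and F-score. The minimal standard-library Python sketch below illustrates those two steps under stated assumptions: labels are binary (1 = positive), "detection rate" is read as recall (the true-positive rate), and each candidate model or hyperparameter setting is an opaque name scored by a caller-supplied `evaluate` function. `binary_scores`, `best_candidate`, and `evaluate` are hypothetical names; a real system would plug an actual training library in behind `evaluate`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Scores:
    accuracy: float
    detection_rate: float  # assumption: interpreted as recall (TP / (TP + FN))
    precision: float
    f_score: float


def binary_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Compute the four claim-4 metrics for binary labels (1 = positive)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard every ratio so a degenerate split yields 0.0 instead of an error.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(accuracy, recall, precision, f1)


def best_candidate(candidates: Iterable[str],
                   evaluate: Callable[[str], float]) -> Tuple[str, float]:
    """Score every candidate on one training dataset and keep the highest score."""
    scored = [(name, evaluate(name)) for name in candidates]
    if not scored:
        raise ValueError("no candidates to evaluate")
    return max(scored, key=lambda item: item[1])
```

For example, `best_candidate(["logreg", "tree"], scores.get)` with a dictionary of validation scores returns the winning name and its score; running it once per synthetic dataset and once for the original dataset yields one adopted accuracy per training set.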
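Claims 5 and 9 describe displaying each synthetic dataset as a point whose axes are two selected evaluation indicators, such as confidentiality and machine learning accuracy. The minimal standard-library Python sketch below covers the data side of that display: it projects every dataset onto two chosen indicators and, as a stand-in for the graph, renders an aligned text table. The dictionary layout, the indicator names, and the functions `axis_points` and `text_table` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: synthetic dataset name -> {indicator name -> score}.
Evaluations = Dict[str, Dict[str, float]]
Point = Tuple[str, float, float]


def axis_points(evals: Evaluations, x_axis: str, y_axis: str) -> List[Point]:
    """Project every dataset onto two chosen indicators as (name, x, y)."""
    if x_axis == y_axis:
        raise ValueError("choose two different indicators")
    points = []
    for name, scores in evals.items():
        missing = [axis for axis in (x_axis, y_axis) if axis not in scores]
        if missing:
            raise KeyError(f"{name} has no score for {missing}")
        points.append((name, scores[x_axis], scores[y_axis]))
    return points


def text_table(points: List[Point], x_axis: str, y_axis: str) -> str:
    """Render the points as an aligned table sorted by the x-axis indicator."""
    lines = [f"{'dataset':<10}{x_axis:>18}{y_axis:>18}"]
    for name, x, y in sorted(points, key=lambda p: p[1]):
        lines.append(f"{name:<10}{x:>18.3f}{y:>18.3f}")
    return "\n".join(lines)
```

Because any two keys can be passed as `x_axis` and `y_axis`, the same data supports the "any two of three or more indicators" selection in claim 9; the resulting `(name, x, y)` tuples could also be passed to a plotting call such as matplotlib's `pyplot.scatter` to draw the graph described in claim 5.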
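Claims 6 and 7 do not specify the data generation model or which differential-privacy method it applies. As a plainly labeled stand-in, the sketch below uses a textbook technique, the Laplace mechanism applied to a histogram: it adds noise to the count of every value in a caller-supplied public domain and samples synthetic values from the noisy counts. This is not the patented generator; it preserves only one (possibly joint) distribution, and `laplace_noise` and `dp_histogram_synthesize` are hypothetical names.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    # expovariate takes a rate; rate 1/scale gives an exponential with mean `scale`.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_histogram_synthesize(records: Sequence[Hashable],
                            domain: Sequence[Hashable],
                            epsilon: float,
                            n_samples: int,
                            seed: Optional[int] = None) -> List[Hashable]:
    """Sample synthetic values from a Laplace-noised histogram.

    Adding or removing one record changes one count by 1, so the L1
    sensitivity is 1 and noise with scale 1/epsilon makes the released
    histogram epsilon-differentially private. Clipping and sampling are
    post-processing and do not weaken that guarantee.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Pass a seed only for reproducible tests: a known seed lets anyone recompute the noise.
    rng = random.Random(seed)
    counts = Counter(records)
    # Noise every value of the public domain, including unseen ones, so the
    # output does not reveal which values actually occur in the data.
    noisy = [max(0.0, counts[value] + laplace_noise(1 / epsilon, rng))
             for value in domain]
    if sum(noisy) == 0:
        noisy = [1.0] * len(domain)  # every count clipped to zero: sample uniformly
    return rng.choices(list(domain), weights=noisy, k=n_samples)
```

Multi-column records can be passed as tuples with a domain of all tuple combinations, which quickly becomes impractical as the number of columns grows. Production differential-privacy libraries also guard against random-number and floating-point weaknesses that this sketch ignores.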

Description

This invention relates to an information processing device, an information processing method, and a program.

Developing AI (artificial intelligence) requires machine learning on large amounts of high-quality data. However, when AI is developed using data owned and managed by other companies, the presence of personal or confidential information in that data can make it difficult to obtain, hindering AI development. As a countermeasure for data containing personal information, Patent Document 1, for example, discloses a device that generates secondary data from primary data containing personal information by modifying or deleting information so that a predetermined anonymity evaluation value is not exceeded.

Patent Document 1: Japanese Patent No. 6083101

A diagram showing the configuration of the information processing system 1 according to this embodiment.
A block diagram showing the configuration of the information processing device 10 according to this embodiment.
A block diagram showing the configuration of the user terminal 20 according to this embodiment.
A block diagram showing the functional modules of a program executed by the processor 11 of the information processing device 10 according to this embodiment.
A flowchart of the process of generating and evaluating synthetic data performed by the information processing system 1 according to this embodiment.
A diagram illustrating the original dataset according to this embodiment.
A diagram illustrating the process from generating synthetic data to evaluating machine learning accuracy according to this embodiment.
A diagram illustrating an example of automated machine learning according to this embodiment.
A diagram illustrating the detailed procedure of automated machine learning according to this embodiment.
A diagram showing an example of the learning accuracy evaluated by the learning accuracy evaluation unit 103 according to this embodiment.
A flowchart illustrating the procedure by which the confidentiality evaluation unit 104 evaluates the degree of confidentiality according to this embodiment.
A diagram showing an example of how the evaluation results of the synthetic datasets are displayed by the evaluation result presentation unit 105 according to this embodiment.
A diagram showing an example of how the evaluation results of the synthetic datasets are displayed by the evaluation result presentation unit 105 when four evaluation axes are used, according to this embodiment.
A diagram showing an example of how the evaluation results of the synthetic datasets are displayed by the evaluation result presentation unit 105 when four evaluation axes are used, according to this embodiment.

Next, embodiments for carrying out the present invention will be described in detail with reference to the drawings.

Figure 1 is a diagram illustrating the configuration of an information processing system 1 including an information processing device 10 according to an embodiment of the present invention. As shown in Figure 1, the information processing system 1 comprises the information processing device 10 and a user terminal 20. The information processing device 10 is connected to the user terminal 20 via a communication network N such as the Internet.

The information processing system 1 has the function of generating synthetic data from original data containing personal and confidential information. Synthetic data here means data that does not contain the same data as the original data but preserves its characteristics, such as statistical information. The information processing system 1 also has the function of evaluating the confidentiality and usefulness of the generated synthetic data. Usefulness here means usefulness as machine learning data, that is, the learning accuracy of an optimized model obtained by performing machine learning with the data.
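The confidentiality criterion in the claims, the proportion of synthetic records that match records in the original dataset, can be illustrated with a minimal standard-library Python sketch. It assumes that "match" means an exact match on every column and that a higher score should mean higher confidentiality, so it reports 1 minus the match rate; the function names and this scoring convention are hypothetical, not taken from the patent.

```python
from typing import Hashable, Iterable, Sequence


def match_rate(original: Iterable[Sequence[Hashable]],
               synthetic: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of synthetic records that exactly equal some original record."""
    if not synthetic:
        raise ValueError("synthetic dataset is empty")
    # Index original records as tuples so each membership check is O(1).
    original_rows = {tuple(row) for row in original}
    matches = sum(1 for row in synthetic if tuple(row) in original_rows)
    return matches / len(synthetic)


def confidentiality_score(original: Iterable[Sequence[Hashable]],
                          synthetic: Sequence[Sequence[Hashable]]) -> float:
    """Hypothetical score in [0, 1]: 1.0 means no synthetic record copies an original one."""
    return 1.0 - match_rate(original, synthetic)
```

Matching on full tuples means that a synthetic record differing from an original record in any single column is not counted as a match; this section does not state whether exact or approximate matching is intended.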
The information processing device 10 may be a general-purpose computer and may consist of a single computer or of multiple computers distributed over the communication network N. The information processing device 10 may be installed at the company that provides the information processing system 1, or it may be built on the cloud. The functions of the information processing device 10 in this embodiment may also be implemented on the user terminal 20.

Figure 2 is a block diagram showing the configuration of the information processing device 10. As shown in Figure 2, the information processing device 10 comprises a processor 11, a main memory 12, an input/output interface 13, a communication interface 14, and a storage device 15. The storage device 15 is a computer-readable recording medium such as semiconductor memory (e.g., volatile or non-volatile memory) or disk media (e.g., a magnetic or magneto-optical recording medium). The storage device 15 stores programs to be executed by the processor 11, as well as various data. The programs are read from the storage device 15 into the main memory 12, interpreted and executed by the proces