JP-7854956-B2 - Data management system and method
Inventors
- 礒田 有哉
- 垂水 信二
Assignees
- 株式会社日立製作所
Dates
- Publication Date
- 20260507
- Application Date
- 20230323
Claims (11)
- A data management system comprising a storage device and a processor, wherein: the storage device stores an access policy for each entity, the access policy for each entity including, for each of one or more operational attributes, an access permission for each nth-order data used in an application or model for that entity (where n is a non-negative integer); the storage device stores, for each operation, an operation log, which is the log of that operation, together with a data log that corresponds to the source data (the data input to the operation) and is associated with the operation log, and/or a data log that corresponds to the target data (the data resulting from the operation) and is associated with the operation log; the storage device stores, for the operation log and/or the data log, an entity list based on the access policy, each entity list including an inclusion list, which lists the entities whose data is permitted to be used for the operation or data corresponding to the log associated with that entity list, and/or an exclusion list, which lists the entities whose data is prohibited from being used for the operation or data corresponding to the log associated with that entity list; and the processor, when, in response to a request, one or more entity lists in which an entity identified based on the request is recorded are found among a plurality of entity lists, identifies a usage status based on one or more operation logs and one or more data logs identified using those one or more entity lists, and returns data representing the identified usage status to the requester of the request.
- The data management system according to claim 1, wherein the usage status is a lineage identified based on the identified one or more operation logs and one or more data logs, and the lineage is a directed acyclic graph (DAG) in which data or operations are nodes and models correspond to intermediate nodes or leaf nodes.
- The data management system according to claim 1, wherein the usage status includes a contribution of the identified entity, the contribution including a value calculated based on a total number of data items and a number of available data items; the total number of data items is the number of items of (m-k)th-order data that can be used to generate a model as mth-order data (where m and k are integers less than or equal to the maximum value of n, and m is greater than k); and the number of available data items is, out of the total number of data items, the number of data items corresponding to entities whose access policy permits access up to the mth-order data.
- The data management system according to claim 3, wherein the access policy includes an access permission for each of a plurality of types of models as mth-order data, and the value is calculated based on the total number of data items, the number of available data items, and the number of models of a predetermined type for which access is permitted.
- The data management system according to claim 1, wherein the access policy includes an access permission for each of a plurality of types of models as mth-order data, and the usage status includes, for each type of model, the number of models generated using the data of the identified entity.
- The data management system according to claim 1, wherein the usage status includes the usage status before and after a change to the access permissions in the access policy of the identified entity.
- The data management system according to claim 1, wherein at least one of the operation log and the data log associated with the operation log includes reproducibility information indicating whether the data can be reproduced, and the usage status includes, for the reproducibility information contained in the identified one or more operation logs and one or more data logs, the reproducibility information after a change in access permissions.
- The data management system according to claim 1, wherein, if the usage status after a change in access permissions satisfies a predetermined condition, the processor prompts the requester to relax or cancel the change in access permissions.
- The data management system according to claim 1, wherein an entity list exists for each operation log and for each data log.
- The data management system according to claim 1, comprising: a plurality of client computers, including a client computer having the processor and the storage device; and a server computer that communicates with the plurality of client computers, wherein the server computer generates a model through federated learning using machine learning models from the plurality of client computers and transmits the generated model to the plurality of client computers; the server computer stores operation logs and data logs for the operations performed by the server computer; in the server computer, the operation logs and data logs are associated with operation lists instead of entity lists; and each operation list includes an inclusion list, which lists the operations or data corresponding to entities whose data is permitted to be used for the operation or data corresponding to the log associated with that operation list, and/or an exclusion list, which lists the operations or data corresponding to entities whose data is prohibited from being used for the operation or data corresponding to the log associated with that operation list.
- A data management method, wherein: for each entity, there is an access policy that includes, for each of one or more operational attributes, an access permission for each nth-order data used in an application or model for that entity (where n is a non-negative integer); for each operation, there is an operation log, which is the log of that operation, together with a data log that corresponds to the source data (the data input to the operation) and is associated with the operation log, and/or a data log that corresponds to the target data (the data resulting from the operation) and is associated with the operation log; for the operation log and/or the data log, there is an entity list based on the access policy, each entity list including an inclusion list, which lists the entities whose data is permitted to be used for the operation or data corresponding to the log associated with that entity list, and/or an exclusion list, which lists the entities whose data is prohibited from being used for the operation or data corresponding to the log associated with that entity list; and the method comprises a computer, when, in response to a request, one or more entity lists in which an entity identified based on the request is recorded are found, identifying a usage status based on one or more operation logs and one or more data logs identified using those one or more entity lists, and the computer returning data representing the identified usage status to the requester of the request.
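The lookup described in claims 1 to 3 can be illustrated with a minimal sketch. It assumes that each log carries one entity list (as in claim 9), that an operation log references its source and target data logs by ID, and that an entity's access policy can be reduced to the highest data order for which use is permitted. Every class, field, and function name below is hypothetical; the claims do not fix data formats, and they only say the contribution value is "calculated based on" the two counts, so the simple ratio used here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class EntityList:
    """Hypothetical entity list attached to one operation log or data log."""
    inclusion: set[str] = field(default_factory=set)  # entities whose data use is permitted
    exclusion: set[str] = field(default_factory=set)  # entities whose data use is prohibited

    def records(self, entity: str) -> bool:
        # An entity counts as "recorded" if it appears in either list.
        return entity in self.inclusion or entity in self.exclusion


@dataclass
class OperationLog:
    op_id: str
    sources: list[str]  # IDs of data logs for the source data (assumed linkage)
    targets: list[str]  # IDs of data logs for the target data (assumed linkage)
    entities: EntityList


@dataclass
class DataLog:
    data_id: str
    entities: EntityList


def usage_status(entity: str, op_logs: list[OperationLog], data_logs: list[DataLog]) -> dict:
    """Collect the logs whose entity lists record `entity` and derive lineage edges."""
    ops = [o for o in op_logs if o.entities.records(entity)]
    data = [d for d in data_logs if d.entities.records(entity)]
    # Lineage as DAG edges: source data -> operation -> target data.
    edges = []
    for o in ops:
        edges.extend((s, o.op_id) for s in o.sources)
        edges.extend((o.op_id, t) for t in o.targets)
    return {
        "operations": [o.op_id for o in ops],
        "data": [d.data_id for d in data],
        "lineage": edges,
        # Logs for which this entity's data is explicitly prohibited.
        "prohibited": [o.op_id for o in ops if entity in o.entities.exclusion]
        + [d.data_id for d in data if entity in d.entities.exclusion],
    }


def contribution(owners: list[str], policy: dict[str, int], m: int) -> float:
    """Assumed contribution: available items / total items.

    owners: the entity that owns each (m-k)th-order data item, so len(owners)
            is the total number of data items.
    policy: entity -> highest data order for which use is permitted
            (a simplification of the per-attribute access policy).
    """
    total = len(owners)
    # An item is available if its owner permits use up to mth-order data.
    available = sum(1 for e in owners if policy.get(e, -1) >= m)
    return available / total if total else 0.0
```

For instance, `contribution(["A", "A", "B", "C"], {"A": 2, "B": 1, "C": 3}, m=2)` returns 0.75, because three of the four items belong to entities that permit use up to second-order data. Building the lineage only from the operation logs' source and target references yields the data-operation-model graph of claim 2; the sketch does not itself check that the graph is acyclic.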
Description
This invention generally relates to data management. One example of data to be managed is private data, such as patients' medical data. Providing personalized medicine requires segmenting patient characteristics at a fine granularity, which in turn requires a large amount of private data. Federated learning is a known method for training models on private data. Patent Document 1 discloses technology related to privacy enhancement. Patent Document 2 discloses technology related to federated learning. Patent Document 3 discloses technology related to machine learning.
- Patent Document 1: US10,796,782
- Patent Document 2: US2021/0406782
- Patent Document 3: JP6782802
- This shows the overall system configuration for one embodiment of the present invention.
- This shows the configuration of a client-server system as a data management system.
- This diagram schematically illustrates the processing flow up to feature storage on the client computer.
- This diagram schematically illustrates the processing flow for learning on the client computer and federated learning on the server computer.
- This shows multiple operation logs.
- This shows multiple data logs.
- This shows multiple user lists.
- This shows multiple operation lists.
- This shows multiple access policies.
- This shows the access policy before the access permissions were changed.
- This shows the access policy after the access permissions were changed.
- This shows the usage data before the access permissions were changed.
- This shows the usage data after the access permissions were changed.
- This shows the lineage represented by the lineage data before the access permissions were changed.
- This shows the lineage represented by the lineage data after the access permissions were changed.
- This shows an example of contribution data before the access permissions were changed.
- This shows an example of contribution data after the access permissions were changed.
- This describes the functions of the client computer and the server computer.
- This shows the flow of the usage management process.
- This shows the flow of the lineage management process.
- This shows the flow of the data process.
- This shows the flow of the learning process.
- This shows the flow of the inference process.
- This shows the flow of the contribution management process.
- This shows the flow of the first access management process.
- This shows the flow of the second access management process.
In the following explanation, "interface device" may refer to one or more interface devices. The one or more interface devices may be at least one of the following:
- One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device to at least one of an I/O device and a remote display computer. The I/O interface device to the display computer may be a communication interface device. The at least one I/O device may be a user interface device, either an input device such as a keyboard or a pointing device, or an output device such as a display device.
- One or more communication interface devices. The one or more communication interface devices may be one or more communication interface devices of the same type (e.g., one or more NICs (Network Interface Cards)) or two or more communication interface devices of different types (e.g., a NIC and an HBA (Host Bus Adapter)).
In the following explanation, "memory" refers to one or more memory devices, which are an example of one or more storage devices and are typically main storage devices. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.
In the following explanation, "persistent storage device" may refer to one or more persistent storage devices, which are an example of one or more storage devices. A persistent storage device is typically a non-volatile storage device (e.g., an auxiliary storage device), specifically, for example, an HDD (Hard Disk Drive), SSD (Solid State Drive), NVMe (Non-Volatile Memory Express) drive, or SCM (Storage Class Memory).
In the following explanation, "storage device" may refer to at least the memory, out of the memory and the persistent storage device. In the following explanation, "processor" may refer to one or more processor devices. At least one processor device is typically a microprocessor device such as a CPU (Central Processing Unit), but it may be another type of processor device, such as a GPU (Graphics Processing Unit). At least one processor device may be single-core or multi-core. At least one processor device may be a processor core. At least one processor device may be a processor device in the broad sense, such as a circuit that is an aggregate of gate arrays described in a hardware description language and that performs some or all of the processing (e.g., an FPGA (Field-Programmable Gate Array), CPLD (Complex Programmable Logic Device), or ASIC (Application Specific Integrated Circuit)). Furthermore, in the following explanation, functions may be described using the expression "yyy section," but a function may be implemented by the execution o