
US-12619890-B2 - Learning pattern dictionary from noisy numerical data in distributed networks

US 12619890 B2

Abstract

A collaborative learning framework is presented. The framework is implemented by multiple network nodes interconnected by a network. The network nodes belong to multiple client systems of the framework. A network node belonging to a first client system constructs a predictive model for the first client system by using a pattern dictionary that is built based on a consensus among the multiple client systems. The network node calculates a set of local statistics for the first client system based on raw data of the first client system. The network node computes a consensus set of local statistics by aggregating sets of local statistics from the multiple client systems. The network node updates the pattern dictionary based on current values of the pattern dictionary and the consensus set of local statistics.

Inventors

  • Tsuyoshi Ide
  • Rudy Raymond Harry Putra
  • Dung Phan

Assignees

  • INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2019-10-16

Claims (19)

  1. A computer-implemented method for collaborative learning in a decentralized multitasking environment by a plurality of network nodes interconnected by a network, the plurality of network nodes belonging to a plurality of client systems, the computer-implemented method comprising:
     constructing, at a first network node of the plurality of network nodes belonging to a first client system of the plurality of client systems of the decentralized multitasking environment, a predictive model for the first client system by using a pattern dictionary that is built based on a consensus building among the plurality of client systems;
     calculating a set of local statistics of the first client system based on raw data of the first client system;
     enhancing security of the first client system by: splitting the set of local statistics of the first client system into subsets; and stochastically distributing the subsets to network nodes of the plurality of network nodes during the consensus building, wherein the network nodes belong to a set of client systems of the plurality of client systems, wherein the set of client systems is exclusive of the first client system, and wherein the network nodes are exclusive of the first network node;
     computing a consensus set of local statistics by aggregating sets of local statistics from the plurality of client systems, wherein the sets of local statistics include the set of local statistics, and wherein each client system of the plurality of client systems corresponds to a respective set of local statistics of the sets of local statistics;
     identifying an irregularity in the first client system using the consensus set of the local statistics;
     iteratively updating the pattern dictionary based on current values of the pattern dictionary and the consensus set of local statistics, wherein the iterative updating terminates in a case where the sets of local statistics of client systems of the plurality of client systems have a same value or are within a threshold value, wherein each of the construction, the calculation, the computation, and the iterative updating is without a centralized control and operative to conserve computational resources, and wherein the pattern dictionary of the first client system is a local blockchain structure that stores words or a mixture of components; and
     storing an outcome of the collaborative learning at each local blockchain structure of each client system of the plurality of client systems.
  2. The computer-implemented method of claim 1, wherein the predictive model of the first client system is a probability distribution determined based on the raw data of the first client system, a set of mixing weights of the first client system, and the pattern dictionary.
  3. The computer-implemented method of claim 2, wherein each of the pattern dictionary, the set of local statistics, and the set of mixing weights comprises components that correspond to the mixture of components, respectively.
  4. The computer-implemented method of claim 1, wherein the pattern dictionary is stored in a hash-chain data structure.
  5. The computer-implemented method of claim 1, wherein the set of local statistics comprises a set of sufficient statistics of the raw data of the first client system and a set of sample sizes of the raw data of the first client system.
  6. The computer-implemented method of claim 1, wherein the raw data comprises real number values generated by one or more Internet of Things (IoT) devices.
  7. The computer-implemented method of claim 1, wherein the aggregating of the sets of local statistics from the plurality of client systems comprises exchanging the set of local statistics over the network among the plurality of network nodes without encryption.
  8. The computer-implemented method of claim 1, wherein the first network node is a consensus node of the first client system, and wherein the raw data of the first client system is received from a client node of the first client system with encryption.
  9. The computer-implemented method of claim 5, further comprising: updating a set of sample weights of the first client system based on the raw data of the first client system and the pattern dictionary; and determining the set of sample sizes based on a sample weight of the set of sample weights.
  10. The computer-implemented method of claim 1, wherein the subsets of the set of local statistics of the first client system are stochastically distributed to the network nodes that are identified based on a randomly generated incident matrix.
  11. A computing device implementing a network node in a plurality of network nodes interconnected by a network, the plurality of network nodes belonging to a plurality of client systems in a collaborative decentralized multitasking learning framework, the computing device comprising: a processor; and a storage device storing a set of instructions, wherein an execution of the set of instructions by the processor configures the computing device to perform operations comprising:
     constructing, at a first network node of the plurality of network nodes belonging to a first client system of the plurality of client systems of the decentralized multitasking learning framework, a predictive model for the first client system by using a pattern dictionary that is built based on a consensus building among the plurality of client systems;
     calculating a set of local statistics of the first client system based on raw data of the first client system;
     enhancing security of the first client system by: splitting the set of local statistics of the first client system into subsets; and stochastically distributing the subsets to network nodes of the plurality of network nodes during the consensus building, wherein the network nodes belong to a set of client systems of the plurality of client systems, wherein the set of client systems is exclusive of the first client system, and wherein the network nodes are exclusive of the first network node;
     computing a consensus set of local statistics by aggregating sets of local statistics from the plurality of client systems, wherein the sets of local statistics include the set of local statistics, and wherein each client system of the plurality of client systems corresponds to a respective set of local statistics of the sets of local statistics;
     identifying an irregularity in the first client system using the consensus set of the local statistics;
     iteratively updating the pattern dictionary based on current values of the pattern dictionary and the consensus set of local statistics, wherein the iterative updating terminates in a case where the sets of local statistics of client systems of the plurality of client systems have a same value or are within a threshold value, wherein each of the construction, the calculation, the computation, and the iterative updating is without a centralized control, and wherein the pattern dictionary of the first client system is a local blockchain structure that stores words or a mixture of components; and
     storing an outcome of the collaborative learning at each local blockchain structure of each client system of the plurality of client systems.
  12. The computing device of claim 11, wherein the predictive model of the first client system is a probability distribution determined based on the raw data of the first client system, a set of mixing weights of the first client system, and the pattern dictionary.
  13. The computing device of claim 11, wherein the pattern dictionary is stored in a hash-chain data structure.
  14. The computing device of claim 11, wherein the set of local statistics comprises a set of sufficient statistics of the raw data of the first client system and a set of sample sizes of the raw data of the first client system.
  15. The computing device of claim 11, wherein the aggregating of the sets of local statistics from the plurality of client systems comprises exchanging the set of local statistics over the network among the plurality of network nodes without encryption.
  16. The computing device of claim 11, wherein the subsets of the set of local statistics of the first client system are stochastically distributed to the network nodes that are identified based on a randomly generated incident matrix.
  17. A computer program product implementing a network node in a plurality of network nodes interconnected by a network, the plurality of network nodes belonging to a plurality of client systems in a decentralized multitasking collaborative learning framework, the computer program product comprising: one or more non-transitory computer-readable storage devices and program instructions stored on at least one of the one or more non-transitory computer-readable storage devices, the program instructions executable by a processor, the program instructions comprising sets of instructions for:
     constructing, at a first network node of the plurality of network nodes belonging to a first client system of the plurality of client systems of the decentralized multitasking collaborative learning framework, a predictive model for the first client system by using a pattern dictionary that is built based on a consensus building among the plurality of client systems;
     calculating a set of local statistics of the first client system based on raw data of the first client system;
     enhancing security of the first client system by: splitting the set of local statistics of the first client system into subsets; and stochastically distributing the subsets to network nodes of the plurality of network nodes during the consensus building, wherein the network nodes belong to a set of client systems of the plurality of client systems, wherein the set of client systems is exclusive of the first client system, and wherein the network nodes are exclusive of the first network node;
     computing a consensus set of local statistics by aggregating sets of local statistics from the plurality of client systems, wherein the sets of local statistics include the set of local statistics, and wherein each client system of the plurality of client systems corresponds to a respective set of local statistics of the sets of local statistics;
     identifying an irregularity in the first client system using the consensus set of the local statistics;
     iteratively updating the pattern dictionary based on current values of the pattern dictionary and the consensus set of local statistics, wherein the iterative updating terminates in a case where the sets of local statistics of client systems of the plurality of client systems have a same value or are within a threshold value, wherein each of the construction, the calculation, the computation, and the iterative updating is without a centralized control, and wherein the pattern dictionary of the first client system is a local blockchain structure that stores words or a mixture of components; and
     storing an outcome of the collaborative learning at each local blockchain structure of each client system of the plurality of client systems.
  18. The computer program product of claim 17, wherein the aggregating of the sets of local statistics from the plurality of client systems comprises exchanging the set of local statistics over the network among the plurality of network nodes without encryption.
  19. The computer program product of claim 17, wherein the subsets of the set of local statistics of the first client system are stochastically distributed to the network nodes that are identified based on a randomly generated incident matrix.
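
Claims 10, 16, and 19 describe routing subsets of a client's local statistics to peer nodes selected by a randomly generated incident matrix, so that no single peer receives the full statistic set. The following is a minimal sketch of that routing step; the statistic values, subset count, and one-peer-per-subset rule are hypothetical illustrations, and the patent's own incident matrices derived from cyclic graphs (FIGS. 4a-h) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

n_peers = 4  # peer consensus nodes, excluding the originating node
# Hypothetical local statistics of the originating client system.
local_stats = np.array([3.2, -1.7, 0.4, 2.9, 5.1, -0.8])

# Split the statistic vector into subsets (here: one candidate subset per peer).
subsets = np.array_split(local_stats, n_peers)

# Randomly generated incident matrix: entry [i, j] == 1 routes subset i to peer j.
incidence = np.zeros((len(subsets), n_peers), dtype=int)
for i in range(len(subsets)):
    incidence[i, rng.integers(n_peers)] = 1  # each subset goes to one random peer

# Distribute: each peer receives only the subsets routed to it, never the full set.
received = {j: [subsets[i] for i in range(len(subsets)) if incidence[i, j]]
            for j in range(n_peers)}
```

Because every subset is routed to exactly one peer, the union of all received subsets reconstructs the full statistic set during aggregation, while any individual peer sees only a random fraction of it.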

Description

BACKGROUND

Technical Field

The present disclosure generally relates to computation by decentralized networks.

Description of the Related Art

The design principle of decentralized, secure, and transparent transaction management is casting a new light on machine learning algorithms, especially in the field of federated learning. Furthermore, the traditional notion of security is partly replaced with a stochastic approach to validating data consistency.

SUMMARY

Some embodiments of the disclosure provide a collaborative learning framework that is implemented by multiple network nodes interconnected by a network. The network nodes belong to multiple client systems of the framework. A network node belonging to a first client system constructs a predictive model for the first client system by using a pattern dictionary that is built based on a consensus among the multiple client systems. The network node calculates a set of local statistics for the first client system based on raw data of the first client system. The network node computes a consensus set of local statistics by aggregating sets of local statistics from the multiple client systems. The network node updates the pattern dictionary based on current values of the pattern dictionary and the consensus set of local statistics.

In some embodiments, the network node computes the consensus set of local statistics by distributing different subsets of the set of local statistics of the first client system to different network nodes belonging to different client systems. The different subsets of the set of local statistics of the first client system are distributed to network nodes that are identified based on a randomly generated incident matrix.

The preceding Summary is intended to serve as a brief introduction to some embodiments of the disclosure. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document.
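
The Summary's loop — compute local statistics from raw data, aggregate them into a consensus, update the dictionary from its current values and the consensus, and repeat until convergence — can be sketched as follows. This is a minimal single-process illustration under assumed data: a single scalar dictionary entry stands in for a mixture component, and a simple damped averaging update stands in for the patent's actual update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three client systems, each holding noisy real-valued
# samples (e.g., IoT sensor readings) that are never shared directly.
clients = [rng.normal(loc=5.0, scale=1.0, size=100) for _ in range(3)]

# Pattern dictionary: a single scalar "pattern" (component mean) for simplicity.
dictionary = 0.0

for _ in range(20):
    # 1. Each client computes local statistics from its own raw data only:
    #    a sufficient statistic (the sum) and a sample size, as in claim 5.
    local_stats = [(x.sum(), x.size) for x in clients]

    # 2. Consensus: aggregate the local statistics across clients.
    total_sum = sum(s for s, _ in local_stats)
    total_n = sum(n for _, n in local_stats)
    consensus_mean = total_sum / total_n

    # 3. Update the dictionary from its current value and the consensus
    #    statistics (damped update chosen arbitrarily for illustration).
    new_dictionary = 0.5 * dictionary + 0.5 * consensus_mean

    # 4. Terminate once successive values agree within a threshold.
    if abs(new_dictionary - dictionary) < 1e-6:
        break
    dictionary = new_dictionary
```

Only the aggregate pair (sum, count) crosses the network in this sketch; the raw samples stay with their clients, which mirrors the privacy intent of exchanging local statistics rather than data.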
The Detailed Description that follows and the Drawings that are referred to in the Detailed Description further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, the Summary, the Detailed Description, and the Drawings are provided. Moreover, the claimed subject matter is not to be limited by the illustrative details in the Summary, Detailed Description, and Drawings, but rather is to be defined by the appended claims, because the claimed subject matter can be embodied in other specific forms without departing from the spirit of the subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings are of illustrative embodiments. They do not illustrate all embodiments. Other embodiments may be used in addition or instead. Details that may be apparent or unnecessary may be omitted to save space or for more effective illustration. Some embodiments may be practiced with additional components or steps and/or without all of the components or steps that are illustrated. When the same numeral appears in different drawings, it refers to the same or like components or steps.

FIG. 1 illustrates a decentralized system 100 that implements a collaborative learning framework, consistent with an exemplary embodiment.

FIG. 2 illustrates consensus nodes of different client systems performing collaborative dictionary learning, consistent with an exemplary embodiment.

FIGS. 3a-c illustrate a consensus node of a client system performing collaborative dictionary learning.

FIGS. 4a-h illustrate examples of cyclic graphs that are used as an incident matrix for a consensus node.

FIG. 5 conceptually illustrates a process 500 for performing collaborative dictionary learning, consistent with an exemplary embodiment.

FIG. 6 shows a block diagram of the components of a data processing system in accordance with an illustrative embodiment.
DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent that the present teachings may be practiced without such details. In other instances, well-known methods, procedures, components, and/or circuitry have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.

Collaborative learning with IoT (Internet of Things) devices is one of the applications of distributed learning. The statistical nature of IoT data is different from that of transaction data among financial entities, being mostly noisy multivariate real-valued data. Real-valued data may be too low-level to be protected with existing cryptographic technologies. In addition, the security requirements of IoT data are different from those of money transfers. For many IoT applications, high-level statistics such as a production yield are often of more interest than the exact values of individual data samples. Some