US-12625884-B2 - Natural language processing for blockchain-based management of multi-source, multi-format binary objects

Abstract

A system and method of blockchain-based data management of multi-source, multi-format data objects. The method including obtaining a plurality of data objects from a plurality of sources in a plurality of formats. The method including storing the plurality of data objects on one or more blockchains in a blockchain format. The method including generating, by a processing device using natural language processing (NLP), one or more keywords for each data object of the plurality of data objects. The method including respectively linking each of a plurality of metadata tags to a single data object of the plurality of data objects on the one or more blockchains, each metadata tag comprising the one or more keywords for the single data object.
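The tagging flow summarized in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: `ToyChain` is a hash-linked list standing in for a blockchain, `extract_keywords` is a word-frequency stand-in for the NLP step, and `store_with_tag` is a hypothetical helper that stores a data object and then links a metadata tag to it.

```python
import hashlib
import json
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the patent).
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "on", "to", "is", "with"}


def extract_keywords(text, k=3):
    """Toy stand-in for NLP keyword generation: the k most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(k)]


class ToyChain:
    """Hash-linked, append-only list of blocks; a stand-in for a real blockchain."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        # Each block commits to the previous block's hash and to its own payload.
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
        block_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.blocks.append({"prev": prev_hash, "payload": payload, "hash": block_hash})
        return block_hash


def store_with_tag(chain, source, fmt, text):
    """Store one data object, then a metadata tag that points at exactly that object."""
    object_hash = chain.append(
        {"type": "object", "source": source, "format": fmt, "data": text}
    )
    tag_hash = chain.append(
        {"type": "tag", "object": object_hash, "keywords": extract_keywords(text)}
    )
    return object_hash, tag_hash
```

In this sketch the tag references the object by its block hash, so each tag is linked to a single data object, and the keywords it carries are derived from that object alone.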

Inventors

  • Irene Wong Woerner
  • Ronald Chi King Kong
  • Vivian Chow

Assignees

  • EMTRUTH, INC.

Dates

Publication Date
2026-05-12
Application Date
2023-06-09

Claims (16)

  1. A method of blockchain-based data management of multi-source, multi-format data objects that enables secure combining, sharing and analyzing data of any size or format from many sources on demand, the method comprising:
     obtaining a plurality of data objects from a plurality of sources in a plurality of formats;
     storing the plurality of data objects on one or more blockchains in a blockchain format;
     generating, by a processing device using natural language processing (NLP), a respective set of one or more keywords for each data object of the plurality of data objects based on each data object;
     respectively linking each of a plurality of metadata tags to a single data object of the plurality of data objects on the one or more blockchains, each metadata tag comprising the generated respective set of the one or more keywords for the single data object;
     determining, by the processing device, a capability of a client device separate from the processing device to read data in a target data format;
     generating, based on a data dictionary, a connector that defines a mapping relationship between the blockchain format and the target data format responsive to determining the capability, wherein the data dictionary comprises predefined schema information that defines field mappings and transformation rules for the target data format;
     transforming, using the connector, a first data object stored on the one or more blockchains from the blockchain format to the target data format and a second data object stored on the one or more blockchains from the blockchain format to the target data format;
     storing, on the one or more blockchains, a reference to the first data object in the target data format and the second data object in the target data format, wherein the reference is configured to aggregate, responsive to an interaction with the reference, the first data object in the target data format and the second data object in the target data format into an aggregated data object for the client device; and
     storing, on the one or more blockchains, an audit trail indicating which users have shared at least one of the plurality of data objects stored on the one or more blockchains.
  2. The method of claim 1, further comprising: receiving a request to provide access to one or more data objects from the plurality of data objects associated with a keyword; determining, using the plurality of metadata tags, that the first data object stored on the one or more blockchains is associated with the keyword; and providing, responsive to receiving the request, access to the first data object to the client device.
  3. The method of claim 2, wherein the first data object and the second data object are obtained from different sources of the plurality of sources.
  4. The method of claim 2, wherein the providing, responsive to receiving the request, the access to the first data object to the client device comprises: generating access rights indicating that the client device has permission to access the first data object; and linking, on the one or more blockchains, access rights to the first data object.
  5. The method of claim 4, further comprising: receiving a second request to prevent the client device from accessing the first data object; and updating, on the one or more blockchains, the access rights that are linked to the first data object to indicate that the access rights have been revoked.
  6. The method of claim 1, further comprising: transmitting the first data object in the target data format to the client device.
  7. The method of claim 1, further comprising: instantiating a data mart to store at least one of the plurality of data objects or the single data object in a subject-oriented format, the data mart comprising a plurality of processing resources and at least one storage device.
  8. A system that enables secure combining, sharing and analyzing data of any size or format from many sources on demand comprising:
     a memory; and
     a processing device of a first service provider, the processing device is operatively coupled to the memory, to:
     obtain a plurality of data objects from a plurality of sources in a plurality of formats;
     store the plurality of data objects on one or more blockchains in a blockchain format;
     generate, using natural language processing (NLP), a respective set of one or more keywords for each data object of the plurality of data objects based on each data object;
     respectively link each of a plurality of metadata tags to a single data object of the plurality of data objects on the one or more blockchains, each metadata tag comprising the generated respective set of the one or more keywords for the single data object;
     determine a capability of a client device separate from the processing device to read data in a target data format;
     generate, based on a data dictionary, a connector that defines a mapping relationship between the blockchain format and the target data format responsive to determining the capability, wherein the data dictionary comprises predefined schema information that defines field mappings and transformation rules for the target data format;
     transform, using the connector, a first data object stored on the one or more blockchains from the blockchain format to the target data format and a second data object stored on the one or more blockchains from the blockchain format to the target data format;
     store, on the one or more blockchains, a reference to the first data object in the target data format and the second data object in the target data format, wherein the reference is configured to aggregate, responsive to an interaction with the reference, the first data object in the target data format and the second data object in the target data format into an aggregated data object for the client device; and
     store, on the one or more blockchains, an audit trail indicating which users have shared at least one of the plurality of data objects stored on the one or more blockchains.
  9. The system of claim 8, wherein the processing device is further to: receive a request to provide access to one or more data objects from the plurality of data objects associated with a keyword; determine, using the plurality of metadata tags, that the first data object stored on the one or more blockchains are associated with the keyword; and provide, responsive to receiving the request, access to the first data object to the client device.
  10. The system of claim 9, wherein the first data object and the second data object are obtained from different sources of the plurality of sources.
  11. The system of claim 9, wherein to provide, responsive to receiving the request, the access to the first data object to the client device, the processing device is further to: generate access rights indicating that the client device has permission to access the first data object; and link, on the one or more blockchains, the access rights to the first data object.
  12. The system of claim 11, wherein the processing device is further to: receive a second request to prevent the client device from accessing the first data object.
  13. The system of claim 12, wherein the processing device is further to: update, on the one or more blockchains, the access rights that are linked to the first data object to indicate that the access rights have been revoked.
  14. The system of claim 8, wherein the processing device is further to: transmit the first data object in the target data format to the client device.
  15. The system of claim 8, wherein the processing device is further to: instantiate a data mart to store at least one of the plurality of data objects or the single data object in a subject-oriented format, the data mart comprising a plurality of processing resources and at least one storage device.
  16. A non-transitory computer-readable medium storing instructions that enables secure combining, sharing and analyzing data of any size or format from many sources on demand, when executed by a processing device of a first service provider, cause the processing device to:
     obtain a plurality of data objects from a plurality of sources in a plurality of formats;
     store the plurality of data objects on one or more blockchains in a blockchain format;
     generate, by the processing device using natural language processing (NLP), a respective set of one or more keywords for each data object of the plurality of data objects based on each data object;
     respectively link each of a plurality of metadata tags to a single data object of the plurality of data objects on the one or more blockchains, each metadata tag comprising the generated respective set of the one or more keywords for the single data object;
     determine, by the processing device, a capability of a client device separate from the processing device to read data in a target data format;
     generate, based on a data dictionary, a connector that defines a mapping relationship between the blockchain format and the target data format responsive to determining the capability, wherein the data dictionary comprises predefined schema information that defines field mappings and transformation rules for the target data format;
     transform, using the connector, a first data object stored on the one or more blockchains from the blockchain format to the target data format and a second data object stored on the one or more blockchains from the blockchain format to the target data format;
     store, on the one or more blockchains, a reference to the first data object in the target data format and the second data object in the target data format, wherein the reference is configured to aggregate, responsive to an interaction with the reference, the first data object in the target data format and the second data object in the target data format into an aggregated data object for the client device; and
     store, on the one or more blockchains, an audit trail indicating which users have shared at least one of the plurality of data objects stored on the one or more blockchains.
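
The connector, reference, and audit-trail steps recited in the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `DATA_DICTIONARY`, the `report_row` format, the field names, and the helpers `make_connector`, `share`, and `resolve` are all hypothetical, and `Ledger` is a plain append-only list standing in for the one or more blockchains. The sketch also assumes the converted objects are themselves appended to the ledger, which the claims do not require.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical data dictionary: for each target format, map each target field
# to a (source field, transformation rule) pair.
DATA_DICTIONARY: Dict[str, Dict[str, Tuple[str, Callable[[Any], Any]]]] = {
    "report_row": {
        "patient": ("patient_id", str),
        "weight_lb": ("weight_kg", lambda kg: round(kg * 2.2, 1)),
    },
}


def make_connector(target_format: str) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a connector (a mapping function) from the dictionary entry for one format."""
    mapping = DATA_DICTIONARY[target_format]

    def connector(obj: Dict[str, Any]) -> Dict[str, Any]:
        return {dst: rule(obj[src]) for dst, (src, rule) in mapping.items()}

    return connector


class Ledger:
    """Append-only list standing in for 'one or more blockchains'."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, entry: Dict[str, Any]) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1  # index serves as the entry id


def share(ledger: Ledger, user: str, object_ids: List[int], client_formats: List[str]) -> int:
    """Convert the objects for the client, store a reference to them, and log the share."""
    # Capability check: pick the first format the client reads that the dictionary defines.
    target = next((f for f in client_formats if f in DATA_DICTIONARY), None)
    if target is None:
        raise ValueError("client reads no format known to the data dictionary")
    connector = make_connector(target)
    converted = [
        ledger.append(
            {"kind": "converted", "format": target,
             "data": connector(ledger.entries[i]["data"])}
        )
        for i in object_ids
    ]
    ref_id = ledger.append({"kind": "reference", "members": converted})
    ledger.append({"kind": "audit", "user": user, "shared": list(object_ids)})
    return ref_id


def resolve(ledger: Ledger, ref_id: int) -> List[Dict[str, Any]]:
    """Interacting with the reference aggregates its members into one result."""
    return [ledger.entries[i]["data"] for i in ledger.entries[ref_id]["members"]]
```

A caller would append two source objects, call `share` with the formats the client reports it can read, and later call `resolve` on the returned reference to obtain the aggregated data. The audit entry records which user shared which objects.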

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 63/353,774 entitled “NATURAL LANGUAGE PROCESSING FOR BLOCKCHAIN-BASED MANAGEMENT OF MULTI-SOURCE, MULTI-FORMAT BINARY OBJECTS,” filed Jun. 20, 2022, the disclosure of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates generally to software technology, and more particularly, to a blockchain-based data management (BDM) system that uses natural language processing (NLP) for blockchain-based management of multi-source, multi-format binary objects.

BRIEF DESCRIPTION OF THE DRAWINGS

The described embodiments and the advantages thereof may best be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.

FIG. 1 is a block diagram that illustrates an example environment of distributed data virtualization using blockchain technology and natural language processing (NLP), in accordance with some embodiments of the present disclosure;

FIG. 2 is a block diagram of an example graphical user interface of an application that provides traceability and distributed change tracking, in accordance with embodiments of the disclosure;

FIG. 3 is a block diagram that illustrates an example transformation of data streams into more granular blockchain data blocks, in accordance with some embodiments of the present disclosure;

FIG. 4 is a block diagram that illustrates an example high-level approach with the result being a demonstrable prototype that non-IT users can utilize to securely aggregate and share multi-source data on-demand, in accordance with some embodiments of the present disclosure;

FIG. 5 is a block diagram that illustrates an example flow diagram depicting a method of blockchain-based management of multi-source, multi-format binary objects, in accordance with some embodiments of the present disclosure;

FIG. 6 is a block diagram that illustrates an example environment for using a BDM system for blockchain-based management of multi-source, multi-format binary objects, in accordance with some embodiments of the present disclosure;

FIG. 7 is a flow diagram depicting a method of using natural language processing (NLP) for blockchain-based management of multi-source, multi-format binary objects, in accordance with some embodiments; and

FIG. 8 is a block diagram of an example computing device that may perform one or more of the operations described herein, in accordance with some embodiments.

DETAILED DESCRIPTION

The embodiments of the present disclosure were made with government support under 2125909 awarded by the National Science Foundation. The government has certain rights in the embodiments of the present disclosure.

The present disclosure will now be described more fully hereinafter with reference to example embodiments thereof with reference to the drawings in which like reference numerals designate identical or corresponding elements in each of the several views. These example embodiments are described so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Features from one embodiment or aspect can be combined with features from any other embodiment or aspect in any appropriate combination. For example, any individual or collective features of method aspects or embodiments can be applied to apparatus, product, or component aspects or embodiments and vice versa.
The disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.

It is time-consuming, labor-intensive, and costly to combine data that is in different formats and from different systems (e.g., data sources, servers, clouds, etc.), and nearly impossible for non-technical users. Healthcare organizations, for example, are hindered by internal and external data silos. Healthcare today is multi-system, even for just one patient. For example, there might be a system for their medical record, another system for their imaging data, and so on. Outside a hospital's four walls, securely aggregating and sharing data with other hospitals, doctors or labs becomes exponentially harder and costlier. Current approaches require expensive, centralized data warehouses and specialized information technology (IT) professionals when new analytics are needed. Furthermore, value-based healthcare requires nimble integration of data in a myriad of data formats from many systems across multiple organizations. For example, current blockchain solutions are mainly in transactional finance with data block size limits that ar