US-20260128895-A1 - ENHANCED DISTRIBUTED ZERO-KNOWLEDGE PROOF GENERATION

US20260128895A1US 20260128895 A1US20260128895 A1US 20260128895A1US-20260128895-A1

Abstract

Zero-knowledge proof (ZKP) generation can be enhancedly managed and performed. Proof manager can decompose directed acyclic graph (DAG) representative of data analytics workflow into respective sub-DAGs representative of respective workflow portions. Based on respective sub-DAGs, proof manager can determine respective sub-ZKPs relating to respective workflow portions and sub-DAGs. Respective sub-ZKPs can be concurrently determined using respective servers. Proof manager can hash respective input and output values of respective sub-DAGs and include respective hashed values in respective sub-ZKPs. Based on linking respective sub-ZKPs having respective hashed values that satisfy a defined match criterion, proof manager can determine and generate ZKP relating to the workflow, comprising computation results. Proof manager can communicate ZKP to a verifier to facilitate verification of correctness of ZKP, including the computation results. Proof manager can store sub-ZKPs and reuse a stored sub-ZKP that relates to a computation of a sub-DAG of another workflow.

Inventors

Pingchuan MA
Shuai Wang
Zhaoyu Wang

Assignees

THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY

Dates

Publication Date: 20260507
Application Date: 20250408

Claims (20)

1 . A method, comprising: decomposing, by a system comprising at least one processor, a graph representative of a workflow into respective subgraphs representative of respective portions of the workflow; based on the respective subgraphs, determining, by the system, respective subproofs relating to the respective portions of the workflow and the respective subgraphs; and based on the respective subproofs, generating, by the system, a proof relating to the workflow.
2 . The method of claim 1 , wherein the proof is a zero-knowledge proof and the respective subproofs are respective zero-knowledge subproofs.
3 . The method of claim 1 , wherein the workflow is a data analytics workflow, wherein the graph is a directed acyclic graph, and wherein the respective subgraphs are respective directed acyclic subgraphs.
4 . The method of claim 1 , further comprising: determining, by the system, respective sub-result data items relating to the workflow based on respective computational operations performed within the respective subgraphs, wherein the respective subproofs comprise the respective sub-result data items; and determining, by the system, the proof, comprising result data, relating to the workflow based on the respective subproofs, wherein the result data is determined based on the respective sub-result data items.
5 . The method of claim 4 , further comprising: communicating, by the system, the proof to a verifier device, wherein the proof facilitates verification that the proof, comprising the result data, is correct, without revealing, to the verifier device, underlying data of the workflow that was utilized to determine and generate the proof.
6 . The method of claim 4 , wherein respective edges between the respective subgraphs are representative of respective data dependencies, wherein the respective subgraphs comprise respective input values and respective output values, and wherein the method further comprises: hashing, by the system, the respective input values and the respective output values of the respective subgraphs to generate respective hashed input values and respective hashed output values of the respective subgraphs that are included in the respective subproofs, wherein the respective hashed input values and the respective hashed output values are committed in the respective subproofs as respective hash commitments to ensure integrity of the workflow, including integrity of a data flow between the respective subgraphs; and linking, by the system, the respective subproofs, based on the respective hashed input values and the respective hashed output values of the respective subproofs, to generate the proof.
7 . The method of claim 6 , wherein the respective subproofs comprise a first subproof and a second subproof, wherein the first subproof comprises a first hashed input value and a first hashed output value, wherein the second subproof comprises a second hashed input value and a second hashed output value, and wherein the method further comprises: determining, by the system, that an output of the first subproof is to be linked to an input of the second subproof based on determining that the first hashed output value of the first subproof satisfies a defined match criterion with respect to the second hashed input value of the second subproof, wherein the linking comprises linking the output of the first subproof to the input of the second subproof based on determining that the output of the first subproof is to be linked to the input of the second subproof.
8 . The method of claim 1 , wherein the workflow is a first workflow, wherein the proof is a first proof, and wherein the method further comprises: storing, by the system, the respective subproofs in a data store, wherein the respective subproofs are associated with respective computational tasks, comprising a first subproof associated with a first computational task that is performed within a first subgraph of the respective subgraphs; in connection with a second workflow, determining, by the system, that a second computational task associated with the second workflow satisfies a defined similarity criterion with respect to the first computational task associated with the first subproof based on an analysis of the second computational task and the first computational task, wherein the second computational task is associated with a second subgraph representative of a portion of the second workflow; based on the determining that the second computational task associated with the second workflow satisfies the defined similarity criterion with respect to the first computational task associated with the first subproof, retrieving, by the system, the first subproof from the data store; and utilizing, by the system, the first subproof as a second subproof with respect to the second subgraph representative of the portion of the second workflow to facilitate generating a second proof relating to the second workflow.
9 . The method of claim 1 , wherein the respective subgraphs comprise a first subgraph representative of a first portion of the workflow and a second subgraph representative of a second portion of the workflow, wherein the respective subproofs comprise a first subproof relating to the first subgraph and a second subproof relating to the second subgraph, and wherein the determining of the respective subproofs comprises: determining the first subproof based on the first subgraph; and in parallel with the determining of the first subproof, determining the second subproof based on the second subgraph.
10 . A system, comprising: at least one memory that stores computer executable components; and at least one processor that executes computer executable components stored in the at least one memory, wherein the computer executable components comprise: a decomposer that decomposes a graph representative of a workflow into respective subgraphs relating to respective portions of the workflow; and a proof generator that, based on the respective subgraphs, determines respective subproofs relating to the respective portions of the workflow and the respective subgraphs, to facilitate generation of a proof relating to the workflow.
11 . The system of claim 10 , wherein the workflow is a data analytics workflow, wherein the proof is a zero-knowledge proof and the respective subproofs comprise respective zero-knowledge subproofs, and wherein the graph is a directed acyclic graph and the respective subgraphs comprise respective directed acyclic subgraphs.
12 . The system of claim 10 , wherein the proof generator determines respective sub-result information items relating to the workflow based on respective computational tasks performed within the respective subgraphs, and determines and generates the proof, comprising result information, relating to the workflow based on the respective subproofs, and wherein the respective subproofs comprise the respective sub-result information data items.
13 . The system of claim 12 , wherein the proof generator transmits or facilitates transmission of the proof to a verifier device, and wherein the proof facilitates validation that the proof, comprising the result information, is accurate, without divulging, to the verifier device, underlying information of the workflow that was utilized to determine and generate the proof.
14 . The system of claim 13 , wherein respective edges between the respective subgraphs are representative of respective data dependencies, wherein the respective subgraphs comprise respective input values and respective output values, and wherein the computer executable components further comprise: a hasher that hashes the respective input values and the respective output values of the respective subgraphs to generate respective hashed input values and respective hashed output values of the respective subgraphs that are part of the respective subproofs, wherein the respective hashed input values and the respective hashed output values are committed in the respective subproofs as respective hash commitments to ensure integrity of the workflow and integrity of a data flow between the respective subgraphs; and a linker that links the respective subproofs, based on the respective hashed input values and the respective hashed output values of the respective subproofs, to facilitate generation of the proof.
15 . The system of claim 14 , wherein the respective subproofs comprise a first subproof and a second subproof, wherein the first subproof comprises a first hashed input value and a first hashed output value, wherein the second subproof comprises a second hashed input value and a second hashed output value, wherein the linker determines that an output of the first subproof is to be linked to an input of the second subproof based on a determination that the first hashed output value of the first subproof satisfies a defined match criterion with respect to the second hashed input value of the second subproof, and wherein, in response to determining that the output of the first subproof is to be linked to the input of the second subproof, the linker links the output of the first subproof to the input of the second subproof.
16 . The system of claim 10 , wherein the workflow is a first workflow, wherein the respective subproofs are associated with respective computational tasks, comprising a first subproof associated with a first computational task that is performed within a first subgraph of the respective subgraphs, wherein the proof is a first proof, wherein the proof generator stores or facilitates storage of the respective subproofs in a data store, wherein, in connection with a second workflow, the proof generator determines that a second computational task associated with the second workflow satisfies a defined similarity criterion with respect to the first computational task associated with the first subproof based on a result of an analysis of the second computational task and the first computational task, wherein the second computational task is associated with a second subgraph representative of a portion of the second workflow, wherein, based on determining that the second computational task satisfies the defined similarity criterion with respect to the first computational task, the proof generator obtains the first subproof from the data store, and wherein the proof generator utilizes the first subproof as a second subproof with respect to the second subgraph representative of the portion of the second workflow to facilitate generating a second proof relating to the second workflow, or modifies the first subproof to generate a modified subproof and utilizes the modified subproof as the second subproof with respect to the second subgraph to facilitate generating the second proof.
17 . The system of claim 10 , wherein the respective subgraphs comprise a first subgraph representative of a first portion of the workflow and a second subgraph representative of a second portion of the workflow, wherein the respective subproofs comprise a first subproof relating to the first subgraph and a second subproof relating to the second subgraph, and wherein a first server is utilized to facilitate determination of the first subproof, based on the first subgraph, concurrently with a second server being utilized to facilitate determination of the second subproof, based on the second subgraph.
18 . The system of claim 10 , wherein the decomposer or the proof generator comprises or utilizes an accelerator unit, a graphics processing unit, a field-programmable gate array, or an application specific integrated circuit.
19 . A non-transitory machine-readable medium, comprising executable instructions that, when executed by at least one processor, facilitate performance of operations, comprising: segmenting a directed acyclic graph representative of a workflow into respective directed acyclic subgraphs relating to respective portions of the workflow; based on the respective directed acyclic subgraphs, generating respective subproofs relating to the respective portions of the workflow and the respective directed acyclic subgraphs; and based on the respective subproofs, generating a proof relating to the workflow.
20 . The non-transitory machine-readable medium of claim 19 , wherein the workflow is a data analytics workflow, wherein the proof is a zero-knowledge proof and the respective subproofs comprise respective zero-knowledge subproofs, and wherein the operations further comprise: determining respective sub-result data items relating to the data analytics workflow based on respective computational tasks performed within the respective directed acyclic subgraphs, wherein the respective zero-knowledge subproofs comprise the respective sub-result data items; determining the zero-knowledge proof, comprising result data, relating to the data analytics workflow based on the respective zero-knowledge subproofs, wherein the result data is determined based on the respective sub-result data items; and transmitting the zero-knowledge proof to a verifier device, wherein the zero-knowledge proof facilitates verification that the zero-knowledge proof, comprising the result data, is accurate, without exposing, to the verifier device, underlying data of the data analytics workflow that was utilized to determine and generate the zero-knowledge proof.

Description

CROSS-REFERENCE TO RELATED APPLICATION This patent application claims priority to U.S. Provisional Patent Application No. 63/714,884, filed Nov. 1, 2024, and entitled, “A New Approach for Distributed Zero-Knowledge Proof Generation for Data Analytics Workflow,” the entirety of which application is hereby incorporated by reference herein. BACKGROUND Various cryptographic tools can be utilized to protect data. For instance, zero-knowledge proofs can be a cryptographic tool that can enable a prover to convince a verifier of the correctness of computations without revealing the underlying data utilized in performing the computations and generating the zero-knowledge proof. The above-described description is merely intended to provide a contextual overview regarding cryptographic tools and proofs, and is not intended to be exhaustive. SUMMARY The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview of the disclosed subject matter. It is intended to neither identify key or critical elements of the disclosure nor delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later. In some embodiments, the disclosed subject matter can comprise a method that can comprise decomposing, by a system comprising at least one processor, a graph representative of a workflow into respective subgraphs representative of respective portions of the workflow. The method also can comprise: based on the respective subgraphs, determining, by the system, respective subproofs relating to the respective portions of the workflow and the respective subgraphs. The method further can comprise: based on the respective subproofs, generating, by the system, a proof relating to the workflow. In certain embodiments, the disclosed subject matter can comprise a system that can comprise at least one memory that can store computer executable components, and at least one processor that can execute computer executable components stored in the at least one memory. The computer executable components can comprise a decomposer that can decompose a graph representative of a workflow into respective subgraphs relating to respective portions of the workflow. The computer executable components also can comprise a proof generator that, based on the respective subgraphs, can determine respective subproofs relating to the respective portions of the workflow and the respective subgraphs, to facilitate generation of a proof relating to the workflow. In still other embodiments, the disclosed subject matter can comprise a non-transitory machine-readable medium, comprising executable instructions that, when executed by at least one processor, can facilitate performance of operations. The operations can comprise segmenting a directed acyclic graph representative of a workflow into respective directed acyclic subgraphs relating to respective portions of the workflow. The operations also can comprise: based on the respective directed acyclic subgraphs, generating respective subproofs relating to the respective portions of the workflow and the respective directed acyclic subgraphs. The operations further can comprise: based on the respective subproofs, generating a proof relating to the workflow. The following description and the annexed drawings set forth in detail certain illustrative aspects of the subject disclosure. These aspects are indicative, however, of but a few of the various ways in which the principles of various disclosed aspects can be employed and the disclosure is intended to include all such aspects and their equivalents. Other advantages and features will become apparent from the following detailed description when considered in conjunction with the drawings. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 illustrates a block diagram of a non-limiting example system that can desirably perform and manage proof generation for datasets, in accordance with various aspects and embodiments of the disclosed subject matter. FIG. 2 depicts a block diagram of a non-limiting example proof manager component that can desirably perform and manage proof generation, including proof generation for larger-scale datasets and distributed proof generation, in accordance with various aspects and embodiments of the disclosed subject matter. FIG. 3 illustrates a diagram of a non-limiting example proof generation process that can comprise decomposing a graph relating to a workflow into subgraphs, generating subproofs based at least in part on the subgraphs, and generating a proof relating to the workflow based at least in part on the subproofs, in accordance with various aspects and embodiments of the disclosed subject matter. FIG. 4 depicts a block diagram of a non-limiting example of a linking of two subproofs to each other, in accordance with various aspects and embodiments of the