US-12619764-B2 - Specifying characteristics of an output dataset of a data pipeline
Abstract
Embodiments of the present disclosure are directed to techniques for constructing and configuring a data privacy pipeline to generate collaborative data in a data trustee environment. An interface of the trustee environment can serve as a sandbox for parties to generate, contribute to, or otherwise configure a data privacy pipeline by selecting, composing, and arranging any number of input datasets, computational steps, and contract outputs (e.g., output datasets, permissible named queries on collaborative data). The interface may allow a contributing party to use one or more unspecified “placeholder” elements, such as placeholder datasets or placeholder computations, as building blocks in a pipeline under development. Parameterized access control may authorize designated participants to access, view, and/or contribute to designated portions of a contract or pipeline. Authorized participants may indicate their approval, and the pipeline may be deployed in the data trustee environment pursuant to the agreed-upon parameters.
Inventors
- Tomer TURGEMAN
- Yisroel Gershon TABER
- Lev ROZENBAUM
- Ittay Levy OPHIR
Assignees
- MICROSOFT TECHNOLOGY LICENSING, LLC
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-03-23
Claims (20)
- 1. A computer system comprising: one or more hardware processors and memory configured to provide computer program instructions to the one or more hardware processors; and a graphical user interface configured to generate interface elements associated with a configuration of a data pipeline, the configuration comprising a specification of: an input dataset, one or more computational steps that define the data pipeline, and a collaborative dataset that is derivable by executing the one or more computational steps on the input dataset, wherein the interface elements enable: configuring fine-grained element-by-element parameterized access controls on the one or more computational steps that are operations that are executable on the data pipeline, receiving multi-collaborator approval of the one or more computational steps in the data pipeline, and triggering the data pipeline to derive the collaborative dataset based on approval of the data pipeline from collaborators and receiving input from an authorized one of the collaborators requesting to trigger the data pipeline; the graphical user interface is configured to use the one or more hardware processors to: cause presentation of a representation of an arrangement of the one or more computational steps that form the data pipeline; receive input requesting to identify the collaborative dataset derivable by the data pipeline from input datasets provided by at least one of a plurality of collaborators; prompt for and accept input identifying the collaborative dataset as an output of a selected one of the one or more computational steps; and update a representation of the data pipeline to identify the output, without exposing contents of the input datasets to the collaborators.
- 2. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to cause presentation of a representation of a plurality of collaborative datasets derivable by the data pipeline from the input datasets, without exposing contents of the plurality of collaborative datasets.
- 3. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to cause presentation of a representation of ownership of each of a plurality of collaborative datasets derivable by the data pipeline from the input datasets.
- 4. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to cause presentation of a representation of which of the one or more computational steps is configured to output which of a plurality of collaborative datasets derivable by the data pipeline from the input datasets.
- 5. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to prompt for and accept input identifying a destination for the collaborative dataset.
- 6. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to update, based on input identifying a destination for the collaborative dataset, the representation of the data pipeline to identify the destination.
- 7. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to prompt for and receive, from the collaborators, an indication that the collaborators have approved the data pipeline.
- 8. The computer system of claim 1, the graphical user interface configured to use the one or more hardware processors to trigger, based on approval of the data pipeline from each of the collaborators and receiving input from an authorized one of the collaborators requesting to trigger the data pipeline, the data pipeline to derive one or more outputs without exposing contents of the input datasets to the collaborators.
- 9. The computer system of claim 1, wherein the one or more computational steps represent one or more operations that are configurable by the collaborators and executable on the input datasets.
- 10. One or more computer storage media storing computer-useable instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations comprising: accessing, by a graphical user interface, a representation of an arrangement of one or more computational steps that form a data pipeline, wherein the graphical user interface is configured to generate interface elements associated with a configuration of a data pipeline, the configuration comprising a specification of: an input dataset, the one or more computational steps that define the data pipeline, and a collaborative dataset that is derivable by executing the one or more computational steps on the input dataset provided by at least one of a plurality of collaborators, wherein the interface elements enable: configuring fine-grained element-by-element parameterized access controls on the one or more computational steps that are operations that are executable on the data pipeline, receiving multi-collaborator approval of the one or more computational steps in the data pipeline, and triggering the data pipeline to derive the collaborative dataset based on approval of the data pipeline from collaborators and receiving input from an authorized one of the collaborators requesting to trigger the data pipeline; receiving, by the graphical user interface, input requesting to identify the collaborative dataset derivable by the data pipeline from input datasets provided by at least one of a plurality of collaborators; prompting for and accepting, by the graphical user interface, input identifying the collaborative dataset as an output of a selected one of the one or more computational steps; and updating, by the graphical user interface, a representation of the data pipeline to identify the output, without exposing contents of the input datasets to the collaborators.
- 11. The one or more computer storage media of claim 10, the operations further comprising causing, by the graphical user interface, presentation of a representation of a plurality of collaborative datasets derivable by the data pipeline from the input datasets, without exposing contents of the plurality of collaborative datasets.
- 12. The one or more computer storage media of claim 10, the operations further comprising causing, by the graphical user interface, presentation of a representation of ownership of each of a plurality of collaborative datasets derivable by the data pipeline from the input datasets.
- 13. The one or more computer storage media of claim 10, the operations further comprising causing, by the graphical user interface, presentation of a representation of which of the one or more computational steps is configured to output which of a plurality of collaborative datasets derivable by the data pipeline from the input datasets.
- 14. The one or more computer storage media of claim 10, the operations further comprising prompting for and accepting, by the graphical user interface, input identifying a destination for the collaborative dataset.
- 15. The one or more computer storage media of claim 10, the operations further comprising updating, by the graphical user interface and based on input identifying a destination for the collaborative dataset, the representation of the data pipeline to identify the destination.
- 16. The one or more computer storage media of claim 10, the operations further comprising prompting for and receiving, by the graphical user interface and from the collaborators, an indication that the collaborators have approved the data pipeline.
- 17. A method comprising: accessing, by a graphical user interface, a representation of an arrangement of one or more computational steps, wherein the graphical user interface is configured to generate interface elements associated with a configuration of a data pipeline, the configuration comprising a specification of: an input dataset, the one or more computational steps that define the data pipeline, and a collaborative dataset that is derivable by executing the one or more computational steps on the input dataset provided by at least one of a plurality of collaborators, wherein the interface elements enable: configuring fine-grained element-by-element parameterized access controls on the one or more computational steps that are operations that are executable on the data pipeline, receiving multi-collaborator approval of the one or more computational steps in the data pipeline, and triggering the data pipeline to derive the collaborative dataset based on approval of the data pipeline from collaborators and receiving input from an authorized one of the collaborators requesting to trigger the data pipeline; receiving input requesting to identify the collaborative dataset derivable by the data pipeline from input datasets provided by at least one of a plurality of collaborators; prompting for and accepting input identifying the collaborative dataset as an output of a selected one of the one or more computational steps; and updating a representation of the data pipeline to identify the output, without exposing contents of the input datasets to the collaborators.
- 18. The method of claim 17, further comprising causing, by the graphical user interface, presentation of a representation of a plurality of collaborative datasets derivable by the data pipeline from the input datasets, without exposing contents of the plurality of collaborative datasets.
- 19. The method of claim 17, further comprising causing, by the graphical user interface, presentation of a representation of ownership of each of a plurality of collaborative datasets derivable by the data pipeline from the input datasets.
- 20. The method of claim 17, further comprising causing, by the graphical user interface, presentation of a representation of which of the one or more computational steps is configured to output which of a plurality of collaborative datasets derivable by the data pipeline from the input datasets.
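As a reading aid for the claims above, the following is a minimal Python sketch of the gating they recite: per-step access controls, identification of one step's output as the collaborative dataset, approval by the collaborators, and a trigger that only an authorized collaborator may request. The claims are directed to a graphical user interface and do not prescribe a data model, so every name here (`Step`, `Pipeline`, `acl`, `identify_output`, `approve`, `trigger`, the `"view"`/`"edit"` actions, and the returned reference string) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Step:
    """One computational step; `acl` maps a collaborator to permitted actions."""
    name: str
    acl: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class Pipeline:
    collaborators: Set[str]
    steps: Dict[str, Step]
    trigger_authorized: Set[str]            # who may request execution
    output_step: Optional[str] = None       # step whose output is the collaborative dataset
    approvals: Set[str] = field(default_factory=set)

    def can(self, user: str, action: str, step_name: str) -> bool:
        """Element-by-element access check for a single step."""
        return action in self.steps[step_name].acl.get(user, set())

    def identify_output(self, user: str, step_name: str) -> None:
        """Mark the output of a selected step as the collaborative dataset."""
        if not self.can(user, "edit", step_name):
            raise PermissionError(f"{user} may not edit {step_name}")
        self.output_step = step_name

    def approve(self, user: str) -> None:
        if user not in self.collaborators:
            raise PermissionError(f"{user} is not a collaborator")
        self.approvals.add(user)

    def trigger(self, user: str) -> str:
        """Run only if every collaborator approved and `user` may trigger."""
        if self.approvals != self.collaborators:
            raise RuntimeError("pipeline not approved by all collaborators")
        if user not in self.trigger_authorized:
            raise PermissionError(f"{user} may not trigger the pipeline")
        if self.output_step is None:
            raise RuntimeError("no collaborative dataset identified")
        # Hypothetical: return only a reference to the derived dataset,
        # never the contents of any input dataset.
        return f"collaborative-dataset:{self.output_step}"


# Usage: two collaborators, one step, one authorized trigger holder.
pipeline = Pipeline(
    collaborators={"alice", "bob"},
    steps={"join": Step("join", acl={"alice": {"view", "edit"}, "bob": {"view"}})},
    trigger_authorized={"alice"},
)
pipeline.identify_output("alice", "join")
pipeline.approve("alice")
pipeline.approve("bob")
result = pipeline.trigger("alice")
```

In this sketch the trigger is refused until the approval set equals the collaborator set, which corresponds to the stricter "each of the collaborators" wording of claim 8; a looser approval policy would change only the first check in `trigger`.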
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 16/665,916, filed on Oct. 28, 2019, entitled “User Interface for Building a Data Privacy Pipeline and Contractual Agreement to Share Data,” which itself is a continuation-in-part of U.S. patent application Ser. No. 16/388,696, filed on Apr. 18, 2019, entitled “Data Privacy Pipeline Providing Collaborative Intelligence And Constraint Computing,” the contents of each of which are herein incorporated by reference in their entirety.
BACKGROUND
Businesses and technologies increasingly rely on data. Many types of data can be observed, collected, derived, and analyzed for insights that inspire progress in science and technology. In many cases, valuable intelligence can be derived from datasets, and useful products and services can be developed based on that intelligence. This type of intelligence can help advance industries such as banking, education, government, health care, manufacturing, retail, and practically any other industry. However, in many cases, the datasets owned by or available to a particular data owner are incomplete or limited in some fundamental way. Information sharing is one way to bridge gaps in datasets, and sharing data has become an increasingly common practice. There are many benefits from sharing data. However, there are also many concerns and obstacles.
SUMMARY
Embodiments of the present disclosure are directed to techniques for constructing and configuring a data privacy pipeline to generate collaborative data in a data trustee environment from shared input data. At a high level, an interface of the data trustee environment, such as a graphical user interface, can enable tenants (e.g., customers, businesses, users) or other contributing parties to specify parameters for a contractual agreement to share and access data.
Generally, the interface can serve as a sandbox for parties to generate, contribute to, or otherwise configure a data privacy pipeline by selecting, composing, and arranging any number of input datasets, computational steps, and contract outputs. Example contract outputs include output datasets generated from a data privacy pipeline, permissible named queries on collaborative data, and the like.
To facilitate multi-party development of collaborative intelligence contracts, in some embodiments, the interface allows a contributing party to use placeholder elements in a pipeline under development. For example, a contributing party may want to build up components of a pipeline before an input dataset or computation has been provided or identified. As such, the interface may support building a pipeline or portion thereof with one or more unspecified “placeholder” elements, such as placeholder datasets or placeholder computations. For example, a placeholder dataset may include a specified input schema without specifying the actual input data. In another example, a placeholder computation may include a specified input and output schema without specifying the actual computation to be performed. This way, another party can subsequently fill in the placeholder element. Thus, the interface can facilitate multi-party contributions made to any desired portion of a pipeline in any order.
Parameterized access control may authorize designated participants to access, view, and/or contribute to designated portions of a contract or pipeline. Once the parties are done building, authorized participants may indicate their approval, and the contract and corresponding pipeline may be deployed in a data trustee environment pursuant to the agreed-upon parameters.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description.
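The placeholder elements described in this summary can be illustrated with a short Python sketch: a dataset element that carries only a schema until data is supplied, and a computation element that carries only input and output schemas until the computation is supplied. This is an assumption-laden illustration rather than the disclosed implementation; the names `DatasetSlot`, `ComputationSlot`, `fill`, and `is_deployable`, and the representation of a schema as a column-name-to-type-name dictionary, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Schema = Dict[str, str]      # hypothetical: column name -> type name
Row = Dict[str, object]


@dataclass
class DatasetSlot:
    """Input dataset element; `rows is None` means it is still a placeholder."""
    schema: Schema
    rows: Optional[List[Row]] = None

    def fill(self, rows: List[Row]) -> None:
        # Accept data only if every row has exactly the declared columns.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError("rows do not match the placeholder schema")
        self.rows = rows


@dataclass
class ComputationSlot:
    """Computation element; `fn is None` means it is still a placeholder."""
    input_schema: Schema
    output_schema: Schema
    fn: Optional[Callable[[List[Row]], List[Row]]] = None

    def fill(self, fn: Callable[[List[Row]], List[Row]]) -> None:
        self.fn = fn


def is_deployable(dataset: DatasetSlot, computation: ComputationSlot) -> bool:
    """Ready only when no placeholder remains and the schemas line up."""
    return (
        dataset.rows is not None
        and computation.fn is not None
        and dataset.schema == computation.input_schema
    )


# One party lays out the pipeline shape before any data or code exists.
sales = DatasetSlot(schema={"region": "str", "amount": "float"})
total = ComputationSlot(
    input_schema={"region": "str", "amount": "float"},
    output_schema={"total": "float"},
)
ready_before = is_deployable(sales, total)

# Other parties later fill in the placeholders, in either order.
total.fill(lambda rows: [{"total": sum(r["amount"] for r in rows)}])
sales.fill([{"region": "EU", "amount": 10.0}, {"region": "US", "amount": 5.5}])
ready_after = is_deployable(sales, total)
```

Because each slot fixes its schema up front, the parties can agree on how elements connect before anyone contributes data or code, which is the property the summary attributes to placeholder elements.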
This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in isolation as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is described in detail below with reference to the attached drawing figures, wherein:
FIG. 1 is a block diagram of an example collaborative intelligence environment, in accordance with embodiments described herein;
FIG. 2 is a block diagram of an example constraint manager, in accordance with embodiments described herein;
FIG. 3 is an illustration of an example user interface for browsing collaborative intelligence contracts, in accordance with embodiments described herein;
FIG. 4 is an illustration of an example user interface for naming a new collaborative intelligence contract, in accordance with embodiments described herein;
FIG. 5 is an illustration of an example user interface for specifying inputs to a data privacy pipeline associated with a collaborative intelligence contract, in accordance with embodiments described herein;
FIG. 6 is an illustration of another example user interface for specifying inputs to a data priv