EP-4742025-A2 - CODE EXECUTION AND DATA PROCESSING PIPELINE

Abstract

A method performed by one or more processors comprises displaying code, receiving user selection of a portion of code, determining one or more settable data items, generating a template, displaying the template, receiving a user input value for the settable data items by the template, and executing the code with each of the settable data items set to the received user input value. A data processing pipeline is configured to pass a data item to a first transformer to provide first transformed data, store the first transformed data in a temporary memory, write the first transformed data to the data storage system, and pass the transformed data from the temporary memory to a second transformer.
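
The template method summarized above can be illustrated with a minimal sketch. It assumes, purely for illustration, that settable data items are simple `name = literal` assignments found with a regular expression, and that values are set by changing text characters in the code before execution. The names `InputBox`, `generate_template`, `apply_values` and `execute` are hypothetical and do not come from the specification.

```python
import re
from dataclasses import dataclass

# Assumption: a settable data item is a plain `name = value` assignment.
ASSIGNMENT = re.compile(r"^\s*(?P<name>[A-Za-z_]\w*)\s*=(?!=)\s*(?P<value>.+?)\s*$")


@dataclass
class InputBox:
    name: str     # the settable data item
    line_no: int  # zero-based line of the displayed code
    default: str  # current value text, usable as the box's initial content


def generate_template(code: str, first_line: int, last_line: int) -> list[InputBox]:
    """Determine settable items in the selected lines and build the template."""
    lines = code.splitlines()
    template = []
    for line_no in range(first_line, last_line + 1):
        match = ASSIGNMENT.match(lines[line_no])
        if match:
            template.append(InputBox(match["name"], line_no, match["value"]))
    return template


def apply_values(code: str, template: list[InputBox], values: dict[str, object]) -> str:
    """Set each item for which a value was received by rewriting its line."""
    lines = code.splitlines()
    for box in template:
        if box.name in values:
            old = lines[box.line_no]
            indent = old[: len(old) - len(old.lstrip())]
            lines[box.line_no] = f"{indent}{box.name} = {values[box.name]!r}"
    return "\n".join(lines)


def execute(code: str, template: list[InputBox], values: dict[str, object]) -> dict:
    """Execute the code with the received values in place of the settable items."""
    namespace: dict = {}
    exec(apply_values(code, template, values), namespace)
    return namespace


EXAMPLE_CODE = (
    "threshold = 10\n"
    "label = 'raw'\n"
    "result = [x for x in range(20) if x > threshold]\n"
)

if __name__ == "__main__":
    boxes = generate_template(EXAMPLE_CODE, 0, 1)  # user selects the first two lines
    print([box.name for box in boxes])
    print(execute(EXAMPLE_CODE, boxes, {"threshold": 17})["result"])
```

In this sketch the list of `InputBox` records plays the role of the stored template specification; a user interface would render one input box per record.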

Inventors

BALL, ELIOT
JENNY, MATTHEW
GATES, NICHOLAS
PRICE-WRIGHT, ERIN
KHAN, KAMRAN
MANIS, GREGORY
WU, EMELINE

Assignees

Palantir Technologies Inc.

Dates

Publication Date: 2026-05-13
Application Date: 2019-01-30

Claims (15)

1. A method performed by one or more processors, the method comprising: displaying (210) code having one or more lines; receiving (220) a user selection of a portion of the code; determining (230) one or more settable data items from the selected portion of the code; generating (240) a template, wherein the template is configured to present an interface for the setting of the one or more settable data items, wherein the template includes one or more input boxes for receiving values for the one or more settable data items; displaying (250) the template; receiving (260), by the one or more input boxes of the template, a user input value for each of at least one of the settable data items; and executing (270) the code with the received user input value in place of each of the at least one of the settable data items.
2. A method as claimed in claim 1, wherein the determining of the one or more settable data items is performed automatically by a template generator.
3. A method as claimed in claim 2, wherein the template generator determines the one or more settable data items by analyzing a context within which the selected portion of code is contained using at least one of a regular expression or a metaprogramming library.
4. A method as claimed in claim 2 or claim 3, wherein the template generator maintains a store of the one or more settable data items to be included in the template, wherein the determined one or more settable data items are added to the store when the code portion is selected by the user.
5. A method as claimed in any preceding claim, comprising: storing the generated template as a specification, for instance in a markup language; and dynamically displaying the template interface using the stored generated template.
6. A method as claimed in any preceding claim, comprising receiving multiple user selections and using the user selections to determine the one or more settable data items.
7. A method as claimed in any preceding claim, wherein the at least one of the settable data items is set, during the executing, by changing text characters in the code.
8. A method as claimed in any of claims 1 to 6, wherein the at least one of the settable data items is set, during the executing, by application programming interface calls.
9. Computing apparatus configured to perform the method of any of claims 1 to 8.
10. A data processing pipeline configured to: pass a data item from a data storage system or from a prequel transformer to a first transformer for first transformation to provide first transformed data; store the first transformed data in a temporary memory; write the first transformed data to the data storage system; and pass the first transformed data from the temporary memory to a second transformer for second transformation to provide second transformed data.
11. A data processing pipeline as claimed in claim 10, configured to display the first transformed data to the user before completing writing it to the data storage system.
12. A data processing pipeline as claimed in claim 10 or claim 11, configured to control the temporary memory to function as a write-back cache and/or configured to write created data objects stored in the temporary memory to the data storage system at regular intervals.
13. A data processing pipeline as claimed in any of claims 10 to 12, wherein the temporary memory comprises random access memory and non-volatile solid state memory.
14. A method of operating a data processing pipeline, the method comprising: passing a data item from a data storage system or from a prequel transformer to a first transformer for first transformation to provide first transformed data; storing the first transformed data in a temporary memory; writing the first transformed data to the data storage system; and passing the first transformed data from the temporary memory to a second transformer for second transformation to provide second transformed data, optionally comprising displaying the first transformed data to the user before completing writing it to the data storage system.
15. A computer program comprising machine-readable instructions which, when executed by computing apparatus, cause it to perform the method of any of claims 1 to 8 or 14.
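
By way of illustration only, the pipeline claims above (a temporary memory that feeds the second transformer while also being written back to the data storage system) might be sketched as follows. The classes `DataStorageSystem` and `TemporaryMemory` and the function `run_pipeline` are hypothetical stand-ins, and the flush is performed synchronously here for simplicity; an implementation could instead write back in the background or at regular intervals.

```python
from typing import Any, Callable


class DataStorageSystem:
    """Hypothetical stand-in for the persistent data storage system."""

    def __init__(self) -> None:
        self.objects: dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.objects[key] = value

    def read(self, key: str) -> Any:
        return self.objects[key]


class TemporaryMemory:
    """Write-back cache: keeps transformed data and writes it to storage later."""

    def __init__(self, storage: DataStorageSystem) -> None:
        self.storage = storage
        self.entries: dict[str, Any] = {}
        self.dirty: set[str] = set()  # keys not yet written to storage

    def store(self, key: str, value: Any) -> None:
        self.entries[key] = value
        self.dirty.add(key)

    def get(self, key: str) -> Any:
        return self.entries[key]

    def flush(self) -> None:
        """Write every entry that has not yet reached storage."""
        for key in sorted(self.dirty):
            self.storage.write(key, self.entries[key])
        self.dirty.clear()


def run_pipeline(
    item: Any,
    first: Callable[[Any], Any],
    second: Callable[[Any], Any],
    cache: TemporaryMemory,
) -> Any:
    first_out = first(item)
    cache.store("first", first_out)          # store in temporary memory
    second_out = second(cache.get("first"))  # second transformer reads the cache, not storage
    cache.store("second", second_out)
    cache.flush()                            # write transformed data to storage
    return second_out


if __name__ == "__main__":
    storage = DataStorageSystem()
    storage.write("source", [3, 1, 2])
    cache = TemporaryMemory(storage)
    doubled = run_pipeline(
        storage.read("source"), sorted, lambda xs: [x * 2 for x in xs], cache
    )
    print(doubled, storage.objects)
```

The design point shown is that the second transformation does not need to read the first transformed data back from the data storage system; it takes it directly from the temporary memory.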

Description

TECHNICAL FIELD

The subject innovations relate to executing code and to a data processing pipeline.

BACKGROUND

Computers are very powerful tools for processing data. A computerized data pipeline is a useful mechanism for processing large amounts of data. A typical data pipeline is an ad-hoc collection of computer software scripts and programs for processing data extracted from "data sources" and for providing the processed data to "data sinks".

Between the data sources and the data sinks, a data pipeline system is typically provided as a software platform to automate the movement and transformation of data from data sources to data sinks. In essence, the data pipeline system shields the data sinks from having to interface with the data sources or even being configured to process data in the particular formats provided by the data sources. Typically, data from the data sources received by the data sinks is processed by the data pipeline system in some way. For example, a data sink may receive data from the data pipeline system that is a combination (e.g., a join) of data from multiple data sources, all without the data sink being configured to process the individual constituent data formats.

One purpose of a data pipeline system is to execute data transformation steps on data obtained from data sources to provide the data in formats expected by the data sinks. A data transformation step may be defined as a set of computer commands or instructions (e.g., a database query) which, when executed by the data pipeline system, transforms one or more input datasets to produce one or more output or "target" datasets. Data that passes through the data pipeline system may undergo multiple data transformation steps. Such a step can have dependencies on the step or steps that precede it.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the subject innovations are set forth in the appended claims.
However, for purposes of explanation, several aspects of the disclosed subject matter are set forth in the following figures.

Figure 1 is a block diagram illustrating an example of a computer system configured to develop and run a data processing pipeline, in accordance with example embodiments;
Figure 2 is a flow diagram illustrating an example method by which templates are generated and the associated code executed using the template, in accordance with example embodiments;
Figure 3 is a representative drawing, illustrating an example graphical user interface configured to generate templates, in accordance with example embodiments;
Figure 4 is a representative drawing, illustrating an example graphical user interface of a generated template configured to receive values for the display settable data items, in accordance with example embodiments;
Figure 5 is a representative drawing, illustrating an example graphical user interface of a data pipeline development environment, in accordance with example embodiments;
Figure 6 is a schematic diagram, illustrating the interactions between a data processing pipeline and the storage devices of an example computer system, in accordance with example embodiments; and
Figure 7 is a schematic diagram of a computing device in which software-implemented processes of the subject innovations may be embodied.

DETAILED DESCRIPTION

The detailed description set forth below is intended as a description of various configurations of the subject innovations and is not intended to represent the only configurations in which the subject innovations may be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject innovations. However, the subject innovations are not limited to the specific details set forth herein and may be practiced without these specific details.
In some instances, some structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject innovations.

General Overview

For ease of explanation, the subject innovations are largely described in the context of a data pipeline system. It should, however, be recognized that some aspects of these innovations are applicable in other contexts. Examples of such contexts include, but are not limited to, software development environments.

As noted above, a typical data pipeline system is an ad-hoc collection of computer software scripts and programs for processing data extracted from "data sources" and for providing the processed data to "data sinks". Managing and developing such an ad-hoc collection may, however, be technically difficult, particularly when there are multiple transformation steps with later steps having dependencies on preceding steps. It should be further noted that these difficulties in management and development are likely to lead to unstable systems that do not fulfil their desired purpose. Similarly, they make it very di