US-12619615-B1 - Data stream processing instruction set previews using batch conversion
Abstract
Systems and methods are described for providing previews of deployment of data stream processing instruction sets, sometimes called pipelines, to a stream data processing system. Rather than deploying such an instruction set, which may cause detrimental side effects, previews can be facilitated by conversion of a data stream processing instruction set to a batch query that is applied to an existing data set. An output of the batch query can then be provided to an end user as a preview of output of the data stream processing instruction set, when implemented.
Inventors
- Ankur Dalsukhbhai Bambharoliya
- Ricky Burnett
- Daniel FERSTAY
- Arthur Foelsche
- Alexander D. James
- Ganesh Jothikumar
- Bei Li
- Amy Joanna Sutedja
- Salih Ammar Wajih Zainulabdeen
Assignees
- CISCO TECHNOLOGY, INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-04-03
Claims (20)
- 1. A computer-implemented method comprising: accessing a pipeline, the pipeline comprising an instruction set indicating a data source and one or more data manipulations for processing data from a streaming data input to generate a streaming output; executing the pipeline to generate a preview by at least: converting the pipeline, comprising the instruction set, into a batch query at least by generating the batch query to process data from an existing data set in accordance with the one or more manipulations for processing included in the pipeline, wherein the existing data set is distinct from the streaming data input; and applying the batch query, representing conversion of the pipeline, to the existing data set to generate a query results set; and outputting the query results set generated from applying the batch query, representing the conversion of the pipeline, to the existing data set as the preview of the results of the pipeline when applied to the streaming data input.
- 2. The computer-implemented method of claim 1, further comprising accessing a request to preview results of the pipeline.
- 3. The computer-implemented method of claim 2, wherein the streaming output is designated by a streaming output variable, and wherein accessing the request to preview results of the pipeline comprises accessing a request to deploy the pipeline with the streaming output variable unbound.
- 4. The computer-implemented method of claim 2, wherein the request to preview results of the pipeline specifies the existing data set.
- 5. The computer-implemented method of claim 2 further comprising, when the request to preview results of the pipeline does not specify the existing data set, selecting the existing data set using an association between the existing data set and the streaming data input.
- 6. The computer-implemented method of claim 1 further comprising, selecting the existing data set using an association between the existing data set and the streaming data input.
- 7. The computer-implemented method of claim 1, wherein the existing data set comprises data previously read from the streaming data input.
- 8. The computer-implemented method of claim 1, wherein the pipeline and the batch query are specified in a shared query language.
- 9. The computer-implemented method of claim 1, wherein the pipeline and the batch query are specified in a shared query language, wherein a final command of the pipeline specifies the streaming output, and wherein converting the pipeline into the batch query comprises removing the final command of the pipeline and saving a result as the batch query.
- 10. The computer-implemented method of claim 1, wherein the pipeline and the batch query are specified in a common query language, and wherein converting the pipeline into the batch query comprises replacing a reference to the streaming data input in the pipeline with a reference to the existing data set.
- 11. The computer-implemented method of claim 1, wherein the pipeline further writes data to a second streaming output, wherein converting the pipeline into the batch query comprises converting the pipeline into at least two batch queries comprising a first batch query corresponding to the streaming output and a second batch query corresponding to the second streaming output, and wherein outputting the query results set as preview results of the pipeline comprises outputting query results corresponding to a combination of results from the first batch query and results from the second batch query.
- 12. The computer-implemented method of claim 1 further comprising: obtaining specification of a second pipeline, the second pipeline comprising a specification to write data to the streaming output; and obtaining a request to preview results of the second pipeline; wherein converting the pipeline into the batch query comprises converting both the pipeline and the second data pipeline into a single batch query; and wherein outputting the query results set as preview results of the pipeline comprises outputting the query results as preview results of both the pipeline and the second pipeline.
- 13. The computer-implemented method of claim 1, wherein the method is implemented without reading data from the streaming data input.
- 14. The computer-implemented method of claim 1, wherein the method is implemented without writing data to the streaming data output.
- 15. The computer-implemented method of claim 1, wherein the streaming output is identified in the pipeline with an output identifier, and wherein applying the batch query to the existing data set to generate the query results set comprises associating the query results set with the output identifier.
- 16. The computer-implemented method of claim 15, wherein outputting the query results set as preview results of the pipeline is responsive to a request to read from the output identifier.
- 17. A system comprising: a processor; and a non-transitory computer-readable medium having stored thereon instructions that, when executed by the processor, cause the processor to: access a pipeline, the pipeline comprising an instruction set indicating a data source and one or more data manipulations for processing data from a streaming data input to generate a streaming output; execute the pipeline to generate a preview by at least: converting the pipeline, comprising the instruction set, into a batch query at least by generating the batch query to process data from an existing data set in accordance with the one or more manipulations for processing included in the pipeline, wherein the existing data set is distinct from the streaming data input; and applying the batch query, representing conversion of the pipeline, to the existing data set to generate a query results set; and output the query results set generated from applying the batch query, representing conversion of the pipeline, to the existing data set as the preview of the results of the pipeline when applied to the streaming data input.
- 18. The system of claim 17, wherein the pipeline and the batch query are specified in a common query language, and wherein to convert the pipeline into the batch query, the instructions cause the processor to replace a reference to the streaming data input in the pipeline with a reference to the existing data set.
- 19. One or more non-transitory computer-readable media having stored thereon instructions that, when executed by a computing system including one or more processors, cause the computing system to: access a pipeline, the pipeline comprising an instruction set indicating a data source and one or more data manipulations for processing data from a streaming data input to generate a streaming output; execute the pipeline to generate a preview by at least: converting the pipeline, comprising the instruction set, into a batch query at least by generating the batch query to process data from an existing data set in accordance with the one or more manipulations for processing included in the pipeline, wherein the existing data set is distinct from the streaming data input; and applying the batch query, representing conversion of the pipeline, to the existing data set to generate a query results set; and output the query results set generated from applying the batch query, representing conversion of the pipeline, to the existing data set as the preview of the results of the pipeline when applied to the streaming data input.
- 20. The one or more non-transitory computer-readable media of claim 19, wherein the pipeline and the batch query are specified in a common query language, and wherein to convert the pipeline into the batch query, the instructions cause the computing system to replace a reference to the streaming data input in the pipeline with a reference to the existing data set.
Description
RELATED APPLICATIONS
Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application are incorporated by reference under 37 CFR 1.57 and made a part of this specification.
BACKGROUND
Computing devices can utilize communication networks to exchange data. Companies and organizations operate computer networks that interconnect a number of computing devices to support operations or to provide services to third parties. The computing systems can be located in a single geographic location or located in multiple, distinct geographic locations (e.g., interconnected via private or public communication networks). During operation, computing devices can generate large amounts of data, such as logs, metrics, and the like. It can be difficult to efficiently and effectively process and search such data, particularly as the amount of data grows large.
BRIEF DESCRIPTION OF THE DRAWINGS
Illustrative examples are described in detail below with reference to the following figures:
FIG. 1 depicts an illustrative operating environment in which previews of data stream processing instruction sets are provided by conversion of the instruction sets to batch queries;
FIG. 2 depicts illustrative conversions of data stream processing instruction sets into batch queries to facilitate previews of data stream processing instruction sets;
FIG. 3 depicts example interactions for providing a preview of the output of a data stream processing instruction set during operation based on conversion of the data stream processing instruction set into a batch query and execution of the batch query;
FIG. 4 depicts example interactions for deploying a data stream processing instruction set to a stream data processing system;
FIG. 5 is a flow chart depicting an illustrative routine for providing preview results for a data stream processing instruction set by conversion of the data stream processing instruction set into a batch query;
FIG. 6 is a block diagram illustrating an example computing environment that includes a data intake and query system;
FIG. 7 is a block diagram illustrating in greater detail an example of an indexing system of a data intake and query system;
FIG. 8 is a block diagram illustrating in greater detail an example of the search system of a data intake and query system;
FIG. 9 illustrates an example of a self-managed network that includes a data intake and query system; and
FIG. 10 depicts an implementation of a data intake and query system including a stream data processing system.
DETAILED DESCRIPTION
Generally described, aspects of the present disclosure relate to providing previews for processing occurring via a data stream processing instruction set, sometimes referred to as a pipeline. More specifically, aspects of the present disclosure relate to providing previews for data stream processing instruction sets by converting such instruction sets to batch queries that can be applied to an existing data set. As disclosed herein, an end user may author a data stream processing instruction set to be deployed to a stream data processing system. A data stream processing instruction set can illustratively configure the stream data processing system to obtain data from an input data stream (also referred to as a “source” stream), manipulate the data according to one or more manipulations (e.g., filtering the data, transforming the data, routing the data, or the like), and write the data to an output data stream (also referred to as a “sink” stream).
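The source, manipulation, and sink structure described above, and the conversion of such an instruction set into a batch query over an existing data set, can be pictured with a minimal Python sketch. Everything named here (`Pipeline`, `BatchQuery`, `to_batch_query`, `run_batch`, the stream and data set names, and the sample records) is a hypothetical illustration and not the disclosed implementation or any product API. The sketch assumes manipulations are plain functions over in-memory records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Manipulation = Callable[[List[Record]], List[Record]]


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical stand-in for a data stream processing instruction set."""
    source: str                             # input ("source") stream name
    manipulations: Tuple[Manipulation, ...]  # ordered processing steps
    sink: str                               # output ("sink") stream name


@dataclass(frozen=True)
class BatchQuery:
    """Hypothetical batch form: same manipulations, no stream references."""
    dataset: str
    manipulations: Tuple[Manipulation, ...]


def to_batch_query(pipeline: Pipeline, dataset: str) -> BatchQuery:
    # Swap the source stream for an existing data set and drop the sink,
    # keeping the manipulations unchanged.
    return BatchQuery(dataset=dataset, manipulations=pipeline.manipulations)


def run_batch(query: BatchQuery, datasets: Dict[str, List[Record]]) -> List[Record]:
    # Apply each manipulation in order to the stored records only;
    # no stream is read from or written to.
    records = list(datasets[query.dataset])
    for step in query.manipulations:
        records = step(records)
    return records


def keep_errors(records: List[Record]) -> List[Record]:
    # Example filtering manipulation.
    return [r for r in records if r["level"] == "error"]


def uppercase_host(records: List[Record]) -> List[Record]:
    # Example transforming manipulation.
    return [{**r, "host": r["host"].upper()} for r in records]


# Assumed sample data standing in for an existing data set that is
# representative of the source stream.
datasets = {
    "app_logs_sample": [
        {"host": "web1", "level": "info"},
        {"host": "web2", "level": "error"},
    ]
}

pipeline = Pipeline(
    source="app_logs_stream",
    manipulations=(keep_errors, uppercase_host),
    sink="alerts_stream",
)

query = to_batch_query(pipeline, "app_logs_sample")
preview = run_batch(query, datasets)
print(preview)
```

In this sketch the preview is computed entirely from the stored sample, so neither `app_logs_stream` nor `alerts_stream` is touched, which mirrors the stated goal of previewing without affecting the source or sink streams.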
Prior to deploying the data stream processing instruction set, the user may desire to preview operation of the data stream processing instruction set, such as to ensure the data stream processing instruction set operates as desired. In accordance with embodiments of the present disclosure, such preview can be achieved without requiring reading from a source stream or writing to a sink stream, and thus without negatively impacting such streams. Specifically, such preview can be achieved by converting the data stream processing instruction set into a batch query and applying the batch query to an existing data set representative of data within the source stream, enabling an end user to view results of the data stream processing instruction set when implemented and thus preview operation of the data stream processing instruction set.
One potential mechanism for previewing operation of a data stream processing instruction set is to simply deploy the data stream processing instruction set with respect to a source stream. For example, a stream data processing system may read from a source stream, transform data from the source stream according to the data stream processing instruction set, and provide a user with output of the data stream processing instruction set as a preview of the data stream processing instruction set's operation. This approach can be problematic for a variety of reasons. For examp