US-12619607-B2 - Click-to-script reflection
Abstract
A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job script, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operation. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.
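A token, as described above, can be pictured as a small record attached to each operation. The following Python sketch is illustrative only: the `Token` and `PlanOperation` types, the `on_click` handler, the file name `example_job.script`, and the placeholder statements are hypothetical and are not taken from the disclosure or from SCOPE. It shows how an optimized operation that carries the tokens of the original operations it came from could be resolved back to the corresponding script lines when a user clicks it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token recorded for one job script operation at compile time."""
    file_name: str
    line_number: int  # 1-based line of the operation's statement in the script
    ast_node: int     # assumed: position (id) of the operation's AST node


@dataclass
class PlanOperation:
    """An optimized-plan operation carrying the tokens of its original operations."""
    label: str
    tokens: tuple


# Hypothetical script contents keyed by file name (placeholders, not valid SCOPE).
SCRIPTS = {"example_job.script": [
    "rows = EXTRACT <columns> FROM <input>;",
    "busy = SELECT <columns> FROM rows WHERE <predicate>;",
    "OUTPUT busy TO <output>;",
]}


def on_click(op):
    """Resolve a clicked plan operation to (file name, line number, statement text)."""
    hits = []
    for tok in sorted(op.tokens, key=lambda t: (t.file_name, t.line_number)):
        statement = SCRIPTS[tok.file_name][tok.line_number - 1]
        hits.append((tok.file_name, tok.line_number, statement))
    return hits


# An optimized operation that fused the EXTRACT (line 1) with the filtering SELECT (line 2).
fused = PlanOperation("Extract with pushed-down filter",
                      (Token("example_job.script", 2, 7), Token("example_job.script", 1, 3)))
for hit in on_click(fused):
    print(hit)
```

Because one optimized operation may stand for several original operations, the sketch stores a tuple of tokens rather than a single token, and the click handler returns every matching script line.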
Inventors
- Xiangnan LI
- Marc Todd FRIEDMAN
- Wangchao Le
- Evgueni Zabokritski
Assignees
- MICROSOFT TECHNOLOGY LICENSING, LLC
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2024-11-04
Claims (20)
- 1 . A computerized method comprising: generating a token for a job script operation of a job, the token comprising: a file name of a job script, and a position of an Abstract Syntax Tree (AST) node for an original execution plan of the job; assigning the generated token with a different operation of an optimized execution plan of the job script; and providing a graphical representation of the different operation as a leaf of an AST of the optimized execution plan in a user interface, wherein the graphical representation allows a user to navigate from the leaf representing the different operation to the job script operation.
- 2 . The method of claim 1 , further comprising: wherein interdependencies of the leaf are visually shown in the graphical representation.
- 3 . The method of claim 1 , further comprising: visualizing a job execution graph of the optimized execution plan.
- 4 . The method of claim 1 , further comprising: generating an equivalent operator tree to optimize the original execution plan into the optimized execution plan.
- 5 . The method of claim 4 , wherein generating the equivalent operator tree further comprises: copying an original query of the job script into an internal memo structure; and initiating a task to optimize a class corresponding to a root node of a query tree of the original execution plan.
- 6 . The method of claim 5 , wherein generating the equivalent operator tree further comprises: in response to initiating the task to optimize the class corresponding to the root node, initiating a task to optimize a subtree of the query tree.
- 7 . The method of claim 4 , wherein generating the equivalent operator tree further comprises: performing an optimization task according to a Volcano technique.
- 8 . The method of claim 7 , wherein generating the equivalent operator tree further comprises: checking whether an optimization goal has already been pursued prior to performing the optimization task.
- 9 . The method of claim 4 , wherein generating the equivalent operator tree further comprises: performing an optimization task according to a Cascades technique.
- 10 . The method of claim 9 , wherein generating the equivalent operator tree further comprises: excluding an optimization rule based on taking a transitive closure of a reachability relationship of operators mapped to each other in a single rule application.
- 11 . A computerized method comprising: applying a token to a job script operation of a job script for a job, wherein the token comprises a position of an Abstract Syntax Tree (AST) node for an original execution plan of the job; optimizing the original execution plan to create an optimized execution plan of the job, wherein the optimized execution plan comprises a different operation than the job script operation; associating the token applied to the job script operation with the different operation of the optimized execution plan; and providing a graphical representation of the different operation as a leaf of an AST of the optimized execution plan in a user interface, wherein the graphical representation allows a user to navigate from the leaf representing the different operation to the job script operation.
- 12 . The method of claim 11 , further comprising: optimizing the original execution plan into the optimized execution plan through a Volcano search strategy.
- 13 . The method of claim 12 , further comprising: checking whether an optimization goal has already been pursued prior to optimizing the original execution plan into the optimized execution plan.
- 14 . The method of claim 11 , further comprising: optimizing the original execution plan into the optimized execution plan through a Cascades optimizer.
- 15 . The method of claim 14 , further comprising: excluding an optimization rule based on taking a transitive closure of a reachability relationship of operators mapped to each other in a single rule application.
- 16 . A system comprising: a memory embodied with instructions to add a token to a job script operation of a job and optimize the job script operations with the added token into an optimized execution plan; and a processor configured to: apply a token to a job script operation, wherein the token comprises a position of an Abstract Syntax Tree (AST) node for an original execution plan of the job; optimize the original execution plan into the optimized execution plan of the job, wherein the optimized execution plan comprises a different operation than the job script operation; associate the token applied to the job script operation with the different operation of the optimized execution plan; and provide a graphical representation of the different operation as a leaf of an AST of the optimized execution plan to a user interface of a client computing device, wherein the graphical representation of the different operation is interactive and links from the leaf representing the different operation to the job script operation.
- 17 . The system of claim 16 , wherein the processor is further configured to: optimize the original execution plan into the optimized execution plan through a Volcano search strategy.
- 18 . The system of claim 17 , wherein the processor is further configured to: check whether an optimization goal has already been pursued prior to optimizing the original execution plan into the optimized execution plan.
- 19 . The system of claim 16 , wherein the processor is further configured to: optimize the original execution plan into the optimized execution plan through a Cascades optimizer.
- 20 . The system of claim 19 , wherein the processor is further configured to: exclude an optimization rule based on taking a transitive closure of a reachability relationship of operators mapped to each other in a single rule application.
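Claims 5 through 8 and 12 through 13 describe a memo-based, top-down search in the style of the Volcano and Cascades optimizers: the query is copied into a memo, a task is started for the class of the root node, that task starts tasks for the subtrees, and the optimizer checks whether a goal has already been pursued before repeating work. The Python sketch below is a heavily simplified illustration under stated assumptions, not the claimed optimizer: hand-populated memo groups stand in for copying the query into the memo, no exploration or transformation rules are applied, costs are single hypothetical numbers, and `Memo`, `Expr`, and `optimize_group` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """One alternative in a memo group: an operator over child groups (classes)."""
    op: str
    children: tuple    # ids of child groups in the memo
    local_cost: float  # hypothetical cost of this operator alone


class Memo:
    """Toy memo structure: each group holds logically equivalent alternatives."""

    def __init__(self):
        self.groups = {}          # group id -> list of Expr alternatives
        self.winners = {}         # goal -> (best cost, best Expr)
        self.in_progress = set()  # goals currently being optimized (cycle guard)
        self.tasks_run = 0        # number of optimize tasks actually performed

    def add_group(self, gid, alternatives):
        self.groups[gid] = list(alternatives)

    def optimize_group(self, gid, required="any"):
        goal = (gid, required)
        # Check whether this optimization goal has already been pursued.
        if goal in self.winners:
            return self.winners[goal]
        if goal in self.in_progress:
            return (float("inf"), None)
        self.in_progress.add(goal)
        self.tasks_run += 1
        best = (float("inf"), None)
        for expr in self.groups[gid]:
            # Optimizing a parent class triggers tasks for its subtrees (child classes).
            cost = expr.local_cost + sum(
                self.optimize_group(child, required)[0] for child in expr.children)
            if cost < best[0]:
                best = (cost, expr)
        self.in_progress.discard(goal)
        self.winners[goal] = best
        return best


memo = Memo()
# Stand-in for copying the original query into the memo: root class 0 joins classes 1 and 2.
memo.add_group(0, [Expr("HashJoin", (1, 2), 5.0), Expr("MergeJoin", (1, 2), 8.0)])
memo.add_group(1, [Expr("Extract A", (), 10.0)])
memo.add_group(2, [Expr("Extract B", (), 20.0), Expr("Extract B, pre-sorted", (), 15.0)])

cost, plan = memo.optimize_group(0)  # start with the class of the root node
print(cost, plan.op, memo.tasks_run)  # 30.0 HashJoin 3
```

Both join alternatives share child classes 1 and 2, so the goal check lets each child class be optimized once even though two parent alternatives request it.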
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of and claims priority to U.S. patent application Ser. No. 18/004,447, entitled “CLICK-TO-SCRIPT REFLECTION,” filed on Jan. 5, 2023, which claims priority to and is a § 371 application of International Application No. PCT/CN2021/102553, entitled “CLICK-TO-SCRIPT REFLECTION,” filed on Jun. 25, 2021, the disclosures of which are incorporated herein by reference in their entireties.
BACKGROUND
Structured Computations Optimized for Parallel Execution (SCOPE) is a high-level, declarative query language for big data. Like the common Structured Query Language (SQL), SCOPE allows users to specify big-data jobs that interact with databases. Internally, executing a SCOPE job involves SCOPE's query optimizer generating an execution plan for the job script. For each job script, the optimizer generates a set of operations, known as a job plan, that represents the steps the SCOPE runtime and job scheduler take to produce the required results. During cost-based optimization, a SCOPE optimizer transforms a job script into an equivalent but more efficient query through a sequence of substitutions, explorations, and transformations. Despite their equivalence, the resulting plan may look quite different from the user script, e.g., after filters are replicated and applied earlier, or after redundancy in the script is removed. Not surprisingly, these optimizations make it difficult for users to comprehend, troubleshoot, and hence interactively improve their SCOPE job scripts, leading to confusion and suboptimal user experiences. On the platform side, SCOPE developers spend a substantial amount of time investigating questions such as why SCOPE jobs are slow or why stages of SCOPE jobs run for hours. Most of the time, such issues may be addressed by editing and improving the scripts of the SCOPE job.
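The divergence between script and plan mentioned above, where filters are replicated and applied earlier, can be made concrete with a toy rewrite. This Python sketch is a hypothetical illustration, not SCOPE's optimizer: plans are plain dicts built by an invented `node` helper, the line numbers are made up, and the rewrite assumes the filter predicate references only the join key so that it is valid on both join inputs. After the rewrite, the single filter statement on line 12 appears as two plan operators that run before the join, so the position of a runtime operator no longer indicates which script line it came from.

```python
from copy import deepcopy


def node(op, line, *children):
    """Build a toy plan node that remembers the script line it came from."""
    return {"op": op, "line": line, "children": list(children)}


def replicate_filter_below_join(plan):
    """Rewrite Filter(Join(L, R)) into Join(Filter(L), Filter(R)).

    Assumes the predicate only references the join key, so it holds on both inputs.
    Each replicated filter keeps the script line of the original filter statement.
    """
    if plan["op"] == "Filter" and plan["children"] and plan["children"][0]["op"] == "Join":
        join = plan["children"][0]
        new_inputs = [node("Filter", plan["line"], deepcopy(child)) for child in join["children"]]
        return node("Join", join["line"], *new_inputs)
    return plan


def operators_in_execution_order(plan):
    """Post-order walk: children run before their parent."""
    out = []
    for child in plan["children"]:
        out.extend(operators_in_execution_order(child))
    out.append((plan["op"], plan["line"]))
    return out


script_plan = node("Filter", 12, node("Join", 10, node("Extract", 3), node("Extract", 5)))
optimized = replicate_filter_below_join(script_plan)
print(operators_in_execution_order(script_plan))
# [('Extract', 3), ('Extract', 5), ('Join', 10), ('Filter', 12)]
print(operators_in_execution_order(optimized))
# [('Extract', 3), ('Filter', 12), ('Extract', 5), ('Filter', 12), ('Join', 10)]
```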
Typical fixes include making selected statements more defensive against changes in data distributions, or identifying and reworking a poorly implemented user-defined function. A common troubleshooting step is to find the statements in a particular script that are to blame for a given issue. However, the transformations in the optimizer make it unintuitive to trace a SCOPE runtime stage back to a script line.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Aspects described herein are generally directed to a cloud-based service that enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job script, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operation. An original execution plan for the job is created from the original job script and is then optimized into an optimized execution plan. The tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed to the developer such that clicking an optimized operation brings the developer back to the original operations of the job script. In short, the job script is compiled, tokenized, and optimized, and its optimized version is displayed so that a user may click back to the original script operations.
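Assigning tokens to optimized operations has to survive many successive rewrites. One plausible approach, assumed here rather than taken from the disclosure, is to have each rule application report which input operators every output operator was derived from, then compose these per-step maps so that each optimized operator can be followed transitively back to the original operators and their tokens. In this Python sketch, `trace_to_originals`, `tokens_for`, the operator ids, the rewrite steps, and the file name are all hypothetical.

```python
def trace_to_originals(original_ids, steps):
    """Compose per-step lineage maps into: final operator id -> original operator ids.

    Each step maps every operator id present after one rule application to the set
    of operator ids (from before that application) it was derived from. Operators a
    rule leaves unchanged are expected to appear as identity entries, e.g. {3: {3}}.
    """
    lineage = {op: {op} for op in original_ids}
    for step in steps:
        lineage = {out: set().union(*(lineage[src] for src in srcs))
                   for out, srcs in step.items()}
    return lineage


def tokens_for(op_id, lineage, tokens):
    """Collect the (file name, line) tokens that a clicked optimized operator inherits."""
    return sorted(tokens[orig] for orig in lineage[op_id])


# Hypothetical original operators 1..4 with their (file name, line number) tokens.
TOKENS = {1: ("example_job.script", 2), 2: ("example_job.script", 4),
          3: ("example_job.script", 6), 4: ("example_job.script", 8)}

STEPS = [
    {5: {1, 2}, 3: {3}, 4: {4}},  # rule A fuses operators 1 and 2 into operator 5
    {6: {5, 3}, 4: {4}},          # rule B merges operators 5 and 3 into operator 6
]

lineage = trace_to_originals(TOKENS.keys(), STEPS)
print(tokens_for(6, lineage, TOKENS))
# [('example_job.script', 2), ('example_job.script', 4), ('example_job.script', 6)]
```

Composing the maps step by step keeps each rule responsible only for its local input-to-output mapping, while the final lineage still reaches all the way back to the original script operations.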
BRIEF DESCRIPTION OF THE DRAWINGS
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein: FIG. 1 is a block diagram illustrating an example computing device for implementing various examples of the present disclosure; FIG. 2 illustrates a block diagram of a networking environment suitable for implementing a cloud service that tracks job script operations through compiling and optimization and allows a developer to quickly locate specific script operations using the disclosed implementations and examples; FIG. 3 illustrates a block diagram of one example of the optimization rules implemented by the optimization engine; FIG. 4 illustrates a flow diagram of a networking environment suitable for implementing a cloud service that tracks job script operations through compiling and optimization and allows a developer to quickly locate specific script operations using the disclosed implementations and examples; FIGS. 5A-5C illustrate a user interface (UI) diagram depicting a big-data job script in a UI of a c