US-12625728-B2 - Multiple granularity data flow analysis in mainframe applications
Abstract
Data flow analysis is provided. A program level data flow analysis is performed for each respective data flow path in a set of control flow chains corresponding to each respective program called by a particular job step in each respective job of a plurality of jobs in a sequence of job execution corresponding to an application. A particular field of a plurality of fields is identified in a record of each dataset of a plurality of datasets corresponding to the application that is included in a particular data flow path to form a field level data flow analysis for each particular data flow path. Results of the field level data flow analysis for each particular data flow path in the set of control flow chains corresponding to each respective program called by the particular job step in each respective job of the plurality of jobs is aggregated.
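The aggregation step described in the abstract can be pictured with a minimal sketch. It assumes that each field level result is a simple record naming the job, job step, program, and the source and target (dataset, field) pairs. The `FieldFlow` type, the `aggregate_field_flows` function, and all dataset and field names are hypothetical illustrations, not structures defined by the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFlow:
    """One field level edge found on a data flow path (hypothetical shape)."""
    job: str
    step: str
    program: str
    source: tuple  # (dataset, field)
    target: tuple  # (dataset, field)


def aggregate_field_flows(flows):
    """Group field level flows by (source dataset, target dataset) pair."""
    deps = defaultdict(set)
    for flow in flows:
        dataset_pair = (flow.source[0], flow.target[0])
        deps[dataset_pair].add((flow.source[1], flow.target[1]))
    # Sort the field pairs so the aggregated result is deterministic.
    return {pair: sorted(fields) for pair, fields in deps.items()}


# Illustrative results from three programs in two jobs (made-up names).
flows = [
    FieldFlow("JOB1", "STEP1", "PGMA", ("CUST.IN", "CUST-ID"), ("CUST.OUT", "ID")),
    FieldFlow("JOB1", "STEP2", "PGMB", ("CUST.OUT", "ID"), ("RPT.OUT", "RPT-ID")),
    FieldFlow("JOB2", "STEP1", "PGMC", ("CUST.IN", "CUST-NAME"), ("CUST.OUT", "NAME")),
]
aggregated = aggregate_field_flows(flows)
```

Keying the aggregate on dataset pairs is one possible design choice; it mirrors the idea of a field level dependency between each pair of datasets, but other groupings (per job or per program) would serve equally well.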
Inventors
- Atul Kumar
- Vitobha Munigala
- Alex Mathai
- Amith Singhee
- Rahamim Katan
- Keerthi Narayan RAGHUNATH
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 20260512
- Application Date: 20231012
Claims (16)
- 1 . A method executed by a computer, the method comprising: receiving an input to perform a field level data flow analysis of a mainframe application, wherein the field level data flow analysis includes a program level data flow analysis across lines of program code corresponding to the mainframe application, a job level data flow analysis across a plurality of programs corresponding to the mainframe application, and a job scheduler level data flow analysis across a plurality of jobs corresponding to the mainframe application; aggregating results of the field level data flow analysis for each particular data flow path in a plurality of data flow paths in a set of control flow chains corresponding to each respective program of the plurality of programs called by a particular job step of a plurality of job steps in each respective job of the plurality of jobs in a sequence of job execution corresponding to the mainframe application comprised of the plurality of programs to form aggregated results of the field level data flow analysis, wherein: the program level data flow analysis is based on each respective data flow path of the plurality of data flow paths corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs, each particular field of a plurality of fields in a record of each dataset of a plurality of datasets corresponds to the mainframe application that is included in a particular data flow path of the plurality of data flow paths forming the field level data flow analysis, the field level data flow analysis is based on all possible data flow paths between a combination of different database types and certain fields of records in the plurality of datasets used by the plurality of programs of the mainframe application at a 
program level, a job level, and a job scheduling level for each particular data flow path in response to the program level data flow analysis, a respective verification of each particular data flow path of the plurality of data flow paths is based on data flow between the certain fields of the records in the plurality of datasets used by the plurality of programs of the mainframe application to establish field level dependency between each respective pair of datasets in the plurality of datasets, and a unique layout of the records among a plurality of layouts influences a particular data flow path and byte positions of data in the unique layout, the plurality of layouts being based on a program variable flattening operation and an application knowledge graph of the mainframe application; and optimizing the mainframe application based on the aggregated results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs.
- 2 . The method of claim 1 , further comprising: outputting the aggregated results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs to a user via a client device.
- 3 . The method of claim 1 , further comprising: retrieving source code of the mainframe application that includes job control language code corresponding to the plurality of jobs.
- 4 . The method of claim 3 , further comprising: generating the application knowledge graph of the mainframe application based on entities corresponding to the mainframe application, relationships between the entities, and record layout of each respective dataset of the plurality of datasets corresponding to the mainframe application, wherein: the entities corresponding to the mainframe application and the relationships between the entities are based on a static code analysis of the source code of the mainframe application, the entities including the plurality of programs, the plurality of jobs, and the plurality of datasets corresponding to the mainframe application, and the record layout of each respective dataset of the plurality of datasets corresponding to the mainframe application includes a sequence of the plurality of fields in a corresponding record of a particular dataset.
- 5 . The method of claim 1 , wherein the sequence of job execution of the plurality of jobs corresponding to the mainframe application is based on information contained in the application knowledge graph of the mainframe application in response to the job scheduler level data flow analysis across the plurality of jobs.
- 6 . The method of claim 1 , wherein: the job level data flow analysis for each respective job of the plurality of jobs corresponding to the mainframe application in the sequence of job execution is based on the sequence of job execution of the plurality of jobs, and the set of control flow chains and the plurality of data flow paths in the set of control flow chains correspond to each respective job of the plurality of jobs in the sequence of job execution.
- 7 . The method of claim 1 , further comprising: aggregating the results of the field level data flow analysis of each respective data flow path for each particular online transaction of a plurality of online transactions corresponding to the mainframe application to form aggregated results of the field level data flow analysis of the plurality of online transactions, wherein: the set of control flow chains and the plurality of data flow paths in the set of control flow chains correspond to each respective online transaction of the plurality of online transactions in response to receiving an input to perform the field level data flow analysis of the plurality of online transactions from a user via a client device, and each particular field of the plurality of fields in the record of each dataset of the plurality of datasets or each particular field of a plurality of fields in a database table that is included in a respective data flow path of the plurality of data flow paths forms the field level data flow analysis of each respective data flow path for each particular online transaction of the plurality of online transactions; and outputting the aggregated results of the field level data flow analysis of the plurality of online transactions to the user via the client device.
- 8 . A computer system comprising: a communication fabric; a storage device connected to the communication fabric, wherein the storage device stores program instructions; and a processor connected to the communication fabric, wherein the processor executes the program instructions to: receive an input to perform a field level data flow analysis of a mainframe application, wherein the field level data flow analysis includes a program level data flow analysis across lines of program code corresponding to the mainframe application, a job level data flow analysis across a plurality of programs corresponding to the mainframe application, and a job scheduler level data flow analysis across a plurality of jobs corresponding to the mainframe application; aggregate results of the field level data flow analysis for each particular data flow path in a plurality of data flow paths in a set of control flow chains corresponding to each respective program of the plurality of programs called by a particular job step of a plurality of job steps in each respective job of the plurality of jobs in a sequence of job execution corresponding to the mainframe application comprised of the plurality of programs to form aggregated results of the field level data flow analysis, wherein: the program level data flow analysis is based on each respective data flow path of the plurality of data flow paths corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs, and wherein each particular field of a plurality of fields is identified in a record of each dataset of a plurality of datasets corresponds to the mainframe application that is included in a particular data flow path of the plurality of data flow paths forming the field level data flow analysis, the 
field level data flow analysis is based on all possible data flow paths between a combination of different database types and certain fields of records in the plurality of datasets used by the plurality of programs of the mainframe application at a program level, a job level, and a job scheduling level for each particular data flow path in response to the program level data flow analysis, a respective verification of each particular data flow path of the plurality of data flow paths is based on data flow between the certain fields of the records in the plurality of datasets used by the plurality of programs of the mainframe application to establish field level dependency between each respective pair of datasets in the plurality of datasets, and a unique layout of the records among a plurality of layouts influences a particular data flow path and byte positions of data in the unique layout, the plurality of layouts being based on a program variable flattening operation and an application knowledge graph of the mainframe application; and optimize the mainframe application based on the aggregated results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs.
- 9 . The computer system of claim 8 , wherein the processor further executes the program instructions to: output the aggregated results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs to a user via a client device.
- 10 . The computer system of claim 8 , wherein the processor further executes the program instructions to: retrieve source code of the mainframe application that includes job control language code corresponding to the plurality of jobs.
- 11 . A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to: receive an input to perform a field level data flow analysis of a mainframe application, wherein the field level data flow analysis includes a program level data flow analysis across lines of program code corresponding to the mainframe application, a job level data flow analysis across a plurality of programs corresponding to the mainframe application, and a job scheduler level data flow analysis across a plurality of jobs corresponding to the mainframe application; aggregate results of the field level data flow analysis for each particular data flow path in a plurality of data flow paths in a set of control flow chains corresponding to each respective program of the plurality of programs called by a particular job step of a plurality of job steps in each respective job of the plurality of jobs in a sequence of job execution corresponding to the mainframe application comprised of the plurality of programs to form aggregated results of the field level data flow analysis, wherein: the program level data flow analysis is based on each respective data flow path of the plurality of data flow paths corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs, each particular field of a plurality of fields is identified in a record of each dataset of a plurality of datasets corresponds to the mainframe application that is included in a particular data flow path of the plurality of data flow paths forming the field level data flow analysis, the field level data flow analysis is based on all possible data flow paths between a 
combination of different database types and certain fields of records in the plurality of datasets used by the plurality of programs of the mainframe application at a program level, a job level, and a job scheduling level for each particular data flow path in response to the program level data flow analysis, a respective verification of each particular data flow path of the plurality of data flow paths is based on data flow between the certain fields of the records in the plurality of datasets used by the plurality of programs of the mainframe application to establish field level dependency between each respective pair of datasets in the plurality of datasets, and a unique layout of the records among a plurality of layouts influences a particular data flow path and byte positions of data in the unique layout, the plurality of layouts being based on a program variable flattening operation and an application knowledge graph of the mainframe application; and optimize the mainframe application based on the aggregated results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs.
- 12 . The computer program product of claim 11 , wherein the program instructions further cause the computer to: output the aggregated results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution corresponding to the mainframe application comprised of the plurality of programs to a user via a client device.
- 13 . The computer program product of claim 11 , wherein the program instructions further cause the computer to: retrieve source code of the mainframe application that includes job control language code corresponding to the plurality of jobs.
- 14 . The computer program product of claim 13 , wherein the program instructions further cause the computer to: generate the application knowledge graph of the mainframe application based on entities corresponding to the mainframe application, relationships between the entities, and record layout of each respective dataset of the plurality of datasets corresponding to the mainframe application, wherein: the entities corresponding to the mainframe application and the relationships between the entities are based on a static code analysis of the source code of the mainframe application, the entities including the plurality of programs, the plurality of jobs, and the plurality of datasets corresponding to the mainframe application, and the record layout of each respective dataset of the plurality of datasets corresponding to the mainframe application includes a sequence of the plurality of fields in a corresponding record of a particular dataset.
- 15 . The computer program product of claim 11 , wherein the job scheduler level data flow analysis is performed across the plurality of jobs, and wherein the sequence of job execution of the plurality of jobs corresponding to the mainframe application is based on information contained in the application knowledge graph of the mainframe application in response to the job scheduler level data flow analysis across the plurality of jobs.
- 16 . The computer program product of claim 11 , wherein: job level data flow analysis for each respective job of the plurality of jobs corresponding to the mainframe application in the sequence of job execution is based on the sequence of job execution of the plurality of jobs, and the set of control flow chains and the plurality of data flow paths in the set of control flow chains correspond to each respective job of the plurality of jobs in the sequence of job execution.
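Two ideas recited in the claims, record layouts with byte positions derived from flattened program variables and a sequence of job execution, can be illustrated with a small sketch. It assumes a record layout is given as nested (name, length-or-children) pairs, loosely in the spirit of a COBOL group item, and that job order follows from a hypothetical predecessor map. The `flatten_layout` function, the layout, and the job names are illustrative assumptions only; the claims do not specify these structures.

```python
from graphlib import TopologicalSorter


def flatten_layout(fields, prefix="", offset=0):
    """Flatten nested group items into (qualified_name, start_byte, length) leaves.

    `fields` is a list of (name, spec) pairs, where spec is either an int
    (byte length of an elementary field) or a nested list of child fields.
    Returns the flat leaf list and the offset just past the last field.
    """
    flat = []
    for name, spec in fields:
        qualified = f"{prefix}{name}"
        if isinstance(spec, int):
            flat.append((qualified, offset, spec))
            offset += spec
        else:
            children, offset = flatten_layout(spec, qualified + ".", offset)
            flat.extend(children)
    return flat, offset


# Hypothetical record layout with one nested group.
layout = [
    ("CUST-ID", 6),
    ("NAME", [("FIRST", 10), ("LAST", 15)]),
    ("BALANCE", 8),
]
flat, record_length = flatten_layout(layout)

# Hypothetical job dependencies: each job maps to the jobs that must run first.
predecessors = {"JOB2": {"JOB1"}, "JOB3": {"JOB2"}}
job_sequence = list(TopologicalSorter(predecessors).static_order())
```

Here `flat` gives each elementary field a zero-based start byte, so two layouts of the same dataset can be compared position by position, and `job_sequence` yields one valid execution order for the assumed dependencies.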
Description
BACKGROUND
The disclosure relates generally to comprehensive data flow analysis and more specifically to data flow analysis of mainframe applications. Data flow analysis is the process of collecting information regarding the way data flows or moves through an application or program. Data flow analysis attempts to obtain particular information at each point in the application. Basically, data flow analysis models the application or program as a knowledge graph, where nodes in the graph represent program entities and edges represent relationships (e.g., data flow dependencies) between the program entities. Data flow information is then propagated through the knowledge graph.
SUMMARY
According to one illustrative embodiment, a computer-implemented method for data flow analysis is provided. A computer performs a program level data flow analysis for each respective data flow path of a plurality of data flow paths in a set of control flow chains corresponding to each respective program of a plurality of programs called by a particular job step of a plurality of job steps in each respective job of a plurality of jobs in a sequence of job execution corresponding to an application. The computer identifies a particular field of a plurality of fields in a record of each dataset of a plurality of datasets corresponding to the application that is included in a particular data flow path of the plurality of data flow paths to form a field level data flow analysis for each particular data flow path in response to performing the program level data flow analysis. The computer aggregates results of the field level data flow analysis for each particular data flow path in the plurality of data flow paths in the set of control flow chains corresponding to each respective program of the plurality of programs called by the particular job step of the plurality of job steps in each respective job of the plurality of jobs in the sequence of job execution to form aggregated results of the field level data flow analysis. According to other illustrative embodiments, a computer system and computer program product for data flow analysis are provided.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a pictorial representation of a computing environment in which illustrative embodiments may be implemented;
FIG. 2 is a diagram illustrating an example of a data flow analysis system in accordance with an illustrative embodiment;
FIG. 3 is a diagram illustrating an example of a dataset record layout in accordance with an illustrative embodiment;
FIG. 4 is a diagram illustrating an example of a dataset record layout identification process in accordance with an illustrative embodiment;
FIG. 5 is a diagram illustrating an example of a data flow paths in a program identification process in accordance with an illustrative embodiment;
FIG. 6 is a diagram illustrating an example of a control flow chains and data flow paths in a job identification process in accordance with an illustrative embodiment;
FIG. 7 is a diagram illustrating an example of a data flow paths in a job identification process in accordance with an illustrative embodiment;
FIG. 8 is a diagram illustrating an example of a data flow path between jobs identification process in accordance with an illustrative embodiment;
FIG. 9 is a diagram illustrating an example of data flow paths across online transactions and batch jobs in accordance with an illustrative embodiment;
FIGS. 10A-10C are a flowchart illustrating a process for data flow analysis in accordance with an illustrative embodiment; and
FIG. 11 is a flowchart illustrating a process for online transaction data flow analysis in accordance with an illustrative embodiment.
DETAILED DESCRIPTION
Various aspects of the present disclosure are described by narrative text, flowcharts, block diagrams of computer systems and/or block diagrams of the machine logic included in computer program product (CPP) embodiments. With respect to any flowcharts, depending upon the technology involved, the operations can be performed in a different order than what is shown in a given flowchart. For example, again depending upon the technology involved, two operations shown in successive flowchart blocks may be performed in reverse order, as a single integrated step, concurrently, or in a manner at least partially overlapping in time.
A computer program product embodiment (“CPP embodiment” or “CPP”) is a term used in the present disclosure to describe any set of one, or more, storage media (also called “mediums”) collectively included in a set of one, or more, storage devices that collectively include machine readable code corresponding to instructions and/or data for performing computer operations specified in a given CPP claim. A “storage device” is any tangible device that can retain and store instructions for use by a computer processor. Without limitation, the computer readable storage medium may be an electronic storage medium, a magnetic storage medium,