CN-121986323-A - System and method for translating a first encoding language into a second encoding language
Abstract
A system and method for translating a first encoding language into a second encoding language train a machine learning (ML) model on a data set specific to, and associated with, the first encoding language, the ML model being trained to translate one or more code sets of the first encoding language into one or more corresponding code sets of the second encoding language; generate unit test cases using the ML model, the unit test cases running the one or more code sets of the second encoding language in parallel with the one or more code sets of the first encoding language; iteratively test and refine the ML model until a maturity threshold is reached; and, upon reaching the maturity threshold, containerize the one or more code sets of the second encoding language into one or more applications.
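The parallel-run unit test described in the abstract can be illustrated with a minimal Python sketch. It assumes that a legacy code set and its translation can each be invoked as a callable on the same input. In practice the legacy side would be a mainframe job, not a Python function. The names `parallel_run_test`, `legacy_interest`, and `translated_interest` are hypothetical. The interest calculation is an invented example that mimics COBOL's default behavior of truncating, rather than rounding, excess decimal places in a COMPUTE statement that lacks the ROUNDED phrase.

```python
from concurrent.futures import ThreadPoolExecutor
from decimal import ROUND_DOWN, ROUND_HALF_UP, Decimal
from typing import Any, Callable, Iterable


def parallel_run_test(
    legacy: Callable[[Any], Any],
    translated: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> list[tuple[Any, Any, Any]]:
    """Run legacy and translated code side by side; return (input, legacy, translated) mismatches."""
    mismatches: list[tuple[Any, Any, Any]] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for item in inputs:
            # Submit both code paths for the same input so they can run concurrently.
            legacy_future = pool.submit(legacy, item)
            translated_future = pool.submit(translated, item)
            expected = legacy_future.result()
            actual = translated_future.result()
            if expected != actual:
                mismatches.append((item, expected, actual))
    return mismatches


RATE = Decimal("0.0375")  # hypothetical interest rate
CENTS = Decimal("0.01")


def legacy_interest(balance: Decimal) -> Decimal:
    # Hypothetical stand-in for a COBOL COMPUTE without ROUNDED: excess digits are truncated.
    return (balance * RATE).quantize(CENTS, rounding=ROUND_DOWN)


def translated_interest(balance: Decimal) -> Decimal:
    # Hypothetical translation with a subtle defect: it rounds half up instead of truncating.
    return (balance * RATE).quantize(CENTS, rounding=ROUND_HALF_UP)


balances = [Decimal("100.00"), Decimal("10.00"), Decimal("200.00")]
print(parallel_run_test(legacy_interest, translated_interest, balances))
# [(Decimal('10.00'), Decimal('0.37'), Decimal('0.38'))]
```

In this example the translation rounds half up where the legacy code truncates. The harness therefore flags only the 10.00 balance, whose exact interest of 0.375 is the one input on which the two rules diverge. Threads are used here only so that the two paths can run concurrently. A production harness would more plausibly dispatch separate processes or batch jobs.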
Inventors
- LI ZHIXIU
Assignees
- The Bank of New York Mellon
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2024-07-31
- Priority Date
- 2023-08-07
Claims (20)
- 1. A method for translating a first encoding language to a second encoding language, comprising: training, by a processor, a first machine learning (ML) model at least in part on a first encoding language specific data set associated with the first encoding language, wherein the first ML model is trained to translate one or more code sets of the first encoding language into corresponding one or more code sets of the second encoding language; generating, by the processor, at least one unit test case using the first ML model, wherein the at least one unit test case runs one or more code sets of the second encoding language in parallel with one or more code sets of the first encoding language; iteratively testing and refining, by the processor, the first ML model based at least in part on a maturity level of the first ML model until a maturity threshold is reached; and upon reaching the maturity threshold, containerizing, by the processor, one or more code sets of the second encoding language into an application.
- 2. The method of claim 1, wherein the first encoding language specific data set comprises one or more of at least one language reference document, library, historical input file, historical output file, runtime log, parameter set, or control point associated with the first encoding language.
- 3. The method of claim 1, wherein the first encoding language is Common Business-Oriented Language (COBOL).
- 4. The method of claim 1, wherein the second encoding language is one of Java, Golang, Python, Angular, or C++.
- 5. The method of claim 1, wherein the first ML model is a natural language model (NLM).
- 6. The method of claim 1, wherein iteratively testing the first ML model comprises: performing a plurality of iterative regression tests based on historical input data of at least one of the one or more code sets of the first encoding language, and comparing corresponding output data of the first ML model to historical output of the at least one of the one or more code sets of the first encoding language.
- 7. The method of claim 6, wherein iteratively refining the first ML model comprises: executing, by the processor, one or more debugging techniques; and updating the first ML model based on the one or more executed debugging techniques.
- 8. The method of claim 1, further comprising: dynamically scaling, by the processor, one or more containerized applications based at least in part on one or more of a second ML model or at least one second unit test case that has reached the maturity threshold.
- 9. The method of claim 1, further comprising: tracking, by the processor, progress of the at least one unit test case based at least in part on the maturity level of the first ML model.
- 10. A system for translating a first encoding language to a second encoding language, comprising: a computer having a processor and a memory; and one or more code sets stored in the memory and executed by the processor, wherein the one or more code sets, when executed, configure the processor to: train a first machine learning (ML) model at least in part on a first encoding language specific data set associated with the first encoding language, wherein the first ML model is trained to translate one or more code sets of the first encoding language into corresponding one or more code sets of the second encoding language; generate at least one unit test case using the first ML model, wherein the at least one unit test case runs one or more code sets of the second encoding language in parallel with one or more code sets of the first encoding language; iteratively test and refine the first ML model based at least in part on a maturity level of the first ML model until a maturity threshold is reached; and upon reaching the maturity threshold, containerize one or more code sets of the second encoding language into an application.
- 11. The system of claim 10, wherein the first encoding language specific data set comprises one or more of at least one language reference document, library, historical input file, historical output file, runtime log, parameter set, or control point associated with the first encoding language.
- 12. The system of claim 10, wherein the first encoding language is Common Business-Oriented Language (COBOL).
- 13. The system of claim 10, wherein the second encoding language is one of Java, Golang, Python, Angular, or C++.
- 14. The system of claim 10, wherein the first ML model is a natural language model (NLM).
- 15. The system of claim 10, wherein, when iteratively testing the first ML model, the processor is further configured to: perform a plurality of iterative regression tests based on historical input data of at least one of the one or more code sets of the first encoding language, and compare corresponding output data of the first ML model to historical output of the at least one of the one or more code sets of the first encoding language.
- 16. The system of claim 15, wherein, when iteratively refining the first ML model, the processor is further configured to: execute one or more debugging techniques; and update the first ML model based on the one or more executed debugging techniques.
- 17. The system of claim 10, wherein the processor is further configured to: dynamically scale one or more containerized applications based at least in part on one or more of a second ML model or at least one second unit test case that has reached the maturity threshold.
- 18. The system of claim 10, wherein the processor is further configured to: track progress of the at least one unit test case based at least in part on the maturity level of the first ML model.
- 19. A non-transitory computer-readable medium storing computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: training a first machine learning (ML) model at least in part on a first encoding language specific data set associated with a first encoding language, wherein the first ML model is trained to translate one or more code sets of the first encoding language into corresponding one or more code sets of a second encoding language; generating at least one unit test case using the first ML model, wherein the at least one unit test case runs one or more code sets of the second encoding language in parallel with one or more code sets of the first encoding language; iteratively testing and refining the first ML model based at least in part on a maturity level of the first ML model until a maturity threshold is reached; and upon reaching the maturity threshold, containerizing one or more code sets of the second encoding language into an application.
- 20. The non-transitory computer-readable medium of claim 19, wherein the first encoding language is Common Business-Oriented Language (COBOL), and wherein the second encoding language is one of Java, Golang, Python, Angular, or C++.
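Claims 6, 7, and 9 together describe a loop of regression testing against historical data, refinement, and progress tracking. The following Python sketch shows one possible shape of that loop, but it rests on several assumptions that do not come from the patent. The "maturity level" is taken to be the fraction of historical records whose recorded output the translated code reproduces exactly. The "debugging techniques" and model update are collapsed into a caller-supplied `refine` function. The toy refinement simply swaps in a truncating translation. All names (`maturity_level`, `refine_until_mature`, `make_translation`, `toy_refine`) are hypothetical.

```python
from decimal import ROUND_DOWN, ROUND_HALF_UP, Decimal
from typing import Any, Callable, Sequence

Translator = Callable[[Any], Any]
Failure = tuple[Any, Any, Any]  # (input, historical output, translated output)


def maturity_level(translated: Translator, inputs: Sequence[Any], outputs: Sequence[Any]) -> float:
    """Assumed metric: share of historical records the translated code reproduces exactly."""
    matches = sum(1 for x, y in zip(inputs, outputs) if translated(x) == y)
    return matches / len(inputs)


def refine_until_mature(
    translated: Translator,
    refine: Callable[[Translator, list[Failure]], Translator],
    inputs: Sequence[Any],
    outputs: Sequence[Any],
    threshold: float = 1.0,
    max_iterations: int = 10,
) -> tuple[Translator, list[float]]:
    """Regression-test against history, track maturity, and refine until the threshold is met."""
    progress: list[float] = []  # maturity level measured at each iteration
    for _ in range(max_iterations):
        level = maturity_level(translated, inputs, outputs)
        progress.append(level)
        if level >= threshold:
            return translated, progress
        failures = [(x, y, translated(x)) for x, y in zip(inputs, outputs) if translated(x) != y]
        # Hypothetical refinement step standing in for "debugging techniques + model update".
        translated = refine(translated, failures)
    raise RuntimeError(f"maturity threshold {threshold} not reached; progress={progress}")


def make_translation(rounding: str) -> Translator:
    # Toy "translated code set": interest at a hypothetical 3.75% rate, quantized to cents.
    return lambda balance: (balance * Decimal("0.0375")).quantize(Decimal("0.01"), rounding=rounding)


def toy_refine(translated: Translator, failures: list[Failure]) -> Translator:
    # Toy refinement: if any record diverges, switch to COBOL-style truncation.
    return make_translation(ROUND_DOWN) if failures else translated


history_in = [Decimal("100.00"), Decimal("10.00"), Decimal("200.00"), Decimal("1.00")]
history_out = [Decimal("3.75"), Decimal("0.37"), Decimal("7.50"), Decimal("0.03")]
final, progress = refine_until_mature(make_translation(ROUND_HALF_UP), toy_refine, history_in, history_out)
print(progress)  # [0.5, 1.0]
```

The `progress` list corresponds loosely to the progress tracking of claim 9. A threshold below 1.0 would let the loop stop before every historical record matches. The claims do not specify that policy choice.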
Description
System and method for translating a first encoding language into a second encoding language
Cross Reference to Related Applications
The present application claims priority from U.S. Provisional Application No. 63/531,189, filed August 7, 2023, the entire contents of which are incorporated herein by reference.
Background
Mainframe systems have been used in the financial industry for decades, and the Common Business-Oriented Language (COBOL) has been their primary programming language since the 1960s. Through decades of development and technological advances, both mainframes and COBOL have become integral to business functions and provide efficient daytime and nighttime processing operations. However, with the advent of distributed systems, modern programming languages, and public cloud services, subject matter experts in mainframe technology and COBOL are increasingly difficult to find, and many large organizations are rapidly losing core organizational knowledge of their existing implementations and business logic. Current mainframe modernization techniques typically involve manually analyzing COBOL code and rewriting it, program by program, in a modern programming language. This approach can be error-prone and resource-intensive in terms of both the subject matter expertise and the time required to achieve acceptable transitions and results. Although machine learning (ML) and artificial intelligence (AI) have evolved over decades, prior art systems have had difficulty producing meaningful results with sufficient accuracy when interpreting programming languages, converting between programming languages, and achieving processing parity.
Thus, there is a need for systems and methods that can programmatically modernize COBOL (and other computer language) applications and their business logic into modern programming languages while optimizing processing efficiency, so that large organizations can reduce the risks associated with operating core business functions on legacy systems.
Disclosure of Invention
Aspects of the present disclosure relate to methods, apparatuses, and/or systems for translating a first encoding language to a second encoding language. In some aspects, the techniques described herein relate to a method for translating a first encoding language to a second encoding language, including: training, by a processor, a first machine learning (ML) model on a first encoding language specific data set associated with the first encoding language, wherein the first ML model is trained to translate one or more code sets of the first encoding language into corresponding one or more code sets of the second encoding language; generating, by the processor, at least one unit test case using the first ML model, wherein the at least one unit test case runs the one or more code sets of the second encoding language in parallel with the one or more code sets of the first encoding language; iteratively testing and refining, by the processor, the first ML model based at least in part on a maturity level of the first ML model until a maturity threshold is reached; and containerizing, by the processor, the one or more code sets of the second encoding language into an application upon reaching the maturity threshold. In some aspects, the techniques described herein relate to a method in which the first encoding language specific data set includes one or more of at least one language reference document, library, historical input file, historical output file, runtime log, parameter set, or control point associated with the first encoding language.
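The first encoding language specific data set can be modeled as a simple record with one field per category listed above. This hypothetical Python sketch also shows one way the historical input and output files could be paired into regression test cases, assuming the two files hold records in matching order. The class `LegacyLanguageDataset`, its field names, and the fixed-width sample records are all assumptions. Because the data set needs to include only one or more of the categories, the sketch reports which categories are still empty rather than requiring all of them.

```python
from dataclasses import dataclass, field, fields


@dataclass
class LegacyLanguageDataset:
    """Hypothetical container for a first-encoding-language-specific data set."""

    language: str
    reference_documents: list[str] = field(default_factory=list)
    libraries: list[str] = field(default_factory=list)
    historical_inputs: list[str] = field(default_factory=list)   # records from historical input files
    historical_outputs: list[str] = field(default_factory=list)  # records from historical output files
    runtime_logs: list[str] = field(default_factory=list)
    parameter_sets: list[dict[str, str]] = field(default_factory=list)
    control_points: list[str] = field(default_factory=list)

    def regression_pairs(self) -> list[tuple[str, str]]:
        """Pair inputs with outputs, assuming both files list records in the same order."""
        if len(self.historical_inputs) != len(self.historical_outputs):
            raise ValueError("historical input and output record counts differ")
        return list(zip(self.historical_inputs, self.historical_outputs))

    def missing_categories(self) -> list[str]:
        """Names of data categories that are still empty."""
        return [f.name for f in fields(self) if f.name != "language" and not getattr(self, f.name)]


dataset = LegacyLanguageDataset(
    language="COBOL",
    libraries=["ACCTREC copybook"],  # hypothetical copybook name
    historical_inputs=["0000010000", "0000001000"],   # hypothetical balances in cents
    historical_outputs=["0000000375", "0000000037"],  # hypothetical interest in cents
)
print(dataset.regression_pairs())
# [('0000010000', '0000000375'), ('0000001000', '0000000037')]
print(dataset.missing_categories())
# ['reference_documents', 'runtime_logs', 'parameter_sets', 'control_points']
```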
In some aspects, the techniques described herein relate to a method wherein the first encoding language is Common Business-Oriented Language (COBOL). In some aspects, the techniques described herein relate to a method wherein the second encoding language is one of Java, Golang, Python, Angular, or C++. In some aspects, the techniques described herein relate to a method wherein the first ML model is a natural language model (NLM). In some aspects, the techniques described herein relate to a method in which iteratively testing the first ML model includes performing a plurality of iterative regression tests based on historical input data for at least one of the one or more code sets of the first encoding language, and comparing corresponding output data of the first ML model to historical output of the at least one of the one or more code sets of the first encoding language. In some aspects, the techniques described herein relate to a method in which iteratively refining the first ML model includes executing, by the processor, one or more debugging techniques and updating the first ML model based on the one or more executed debugging techniques. In some aspects, the techniques described herein relate to a method further comprising dynamically scaling, by the processor, one or more containerized applications based at least in part on one or more of the second ML model or