US-12625794-B1 - Training and/or using machine learning model(s) for automatic generation of test case(s) for source code
Abstract
Training and/or utilization of machine learning model(s) (e.g., neural network model(s)) in automatically generating test case(s) for source code. Techniques disclosed herein can be utilized in generating test case(s) for unit testing (or other white-box testing) and/or for functional testing (or other black-box testing).
Inventors
- Rishabh Singh
- David Andre
Assignees
- GOOGLE LLC
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-01-26
Claims (19)
- 1 . A method implemented by one or more processors, the method comprising: identifying a source code unit that is programmed in a programming language; processing the source code unit, using a code-to-embedding machine learning model, to generate one or more code unit embeddings in a latent space; processing the one or more code unit embeddings, using a code embedding-to-test embedding machine learning model, to generate one or more predicted unit test embeddings in the latent space; using the one or more predicted unit test embeddings to identify a unit test for the source code unit; and evaluating the source code unit using at least the unit test, wherein evaluating the source code unit using at least the unit test comprises: executing the source code unit, using test inputs of the unit test, to generate source code unit output, and evaluating the source code unit based on the source code unit output.
- 2 . The method of claim 1 , wherein using the one or more predicted unit test embeddings to identify the unit test for the source code unit comprises: processing the one or more predicted unit test embeddings, using an embedding-to-code machine learning model, to generate the unit test for the source code unit.
- 3 . The method of claim 2 , wherein processing the one or more predicted unit test embeddings, using the embedding-to-code machine learning model, to generate the unit test for the source code unit comprises: generating a sequence of outputs based on processing the one or more predicted unit test embeddings using the embedding-to-code machine learning model, wherein the outputs of the sequence each comprise a corresponding probability distribution; and generating the unit test based on highest probabilities in the corresponding probability distributions of the sequence of outputs.
- 4 . The method of claim 1 , wherein using the one or more predicted unit test embeddings to identify the unit test for the source code unit comprises: determining that the one or more predicted unit test embeddings satisfy a distance threshold relative to one or more pre-generated unit test embeddings for the unit test, wherein the one or more pre-generated unit test embeddings are generated by processing the unit test using the code-to-embedding machine learning model; and responsive to determining that the one or more predicted unit test embeddings satisfy the distance threshold relative to the one or more pre-generated unit test embeddings for the unit test: identifying the unit test.
- 5 . The method of claim 1 , further comprising: determining, based on the evaluating, to deploy the source code unit in production.
- 6 . The method of claim 1 , further comprising: causing rendering, in a development application, of one or more metrics that are based on the evaluating.
- 7 . The method of claim 1 , wherein the source code unit is in a second programming language and is an automated translation of a corresponding source code unit in a first programming language that is different from the second programming language, and further comprising: determining, based on the evaluating, whether to render the source code unit in a development application as a suggested translation of the corresponding source code unit.
- 8 . The method of claim 7 , wherein the source code unit is one of multiple candidate automated translations of the corresponding source code unit and wherein determining, based on the evaluating, whether to render the source code unit in the development application as the suggested translation comprises: determining, based on the evaluating, whether to render the source code unit, or an alternate one of the automated translations, as a suggested translation of the corresponding source code unit.
- 9 . The method of claim 1 , wherein the code embedding-to-test embedding machine learning model is a transformer neural network model or an encoder-decoder neural network model.
- 10 . A method implemented by one or more processors, the method comprising: identifying base source code that is programmed in a base programming language; identifying target source code that is in a target programming language and that is generated based on automatically translating the base source code; processing the base source code, using a code-to-embedding machine learning model, to generate a base source code embedding in a latent space; processing the target source code, using the code-to-embedding machine learning model, to generate a target source code embedding in the latent space; processing, using a policy network, the base source code embedding and the target source code embedding, to generate a predicted test input; executing the base source code using the predicted test input to generate a base output; executing the target source code using the predicted test input to generate a target output; and determining, based on comparing the base output and the target output, whether one or more branches of the target source code are a valid translation of the base source code.
- 11 . The method of claim 10 , wherein determining, based on comparing the base output and the target output, whether one or more branches of the target source code are a valid translation of the base source code, comprises: determining, based on the base output and the target output not being equal, that at least a given branch, involved in executing the target source code, is not a valid translation.
- 12 . The method of claim 11 , further comprising: responsive to determining that at least the given branch, involved in executing the target source code, is not a valid translation: causing the given branch to be rendered, in a development application, with a particular demarcation.
- 13 . The method of claim 10 , wherein the policy network is trained, through reinforcement learning, to reward predictions of test inputs that result in differing outputs when used in testing a base source code, target source code pair.
- 14 . The method of claim 10 , further comprising: automatically translating the base source code to generate the target source code, wherein identifying the target source code is responsive to automatically translating the base source code to generate the target source code.
- 15 . A system, comprising: memory storing instructions; one or more processors operable to execute the instructions to: identify a source code unit that is programmed in a programming language; process the source code unit, using a code-to-embedding machine learning model, to generate one or more code unit embeddings in a latent space; process the one or more code unit embeddings, using a code embedding-to-test embedding machine learning model, to generate one or more predicted unit test embeddings in the latent space; use the one or more predicted unit test embeddings to identify a unit test for the source code unit; and evaluate the source code unit using at least the unit test, wherein in evaluating the source code unit using at least the unit test one or more of the processors are to: execute the source code unit, using test inputs of the unit test, to generate source code unit output, and evaluate the source code unit based on the source code unit output.
- 16 . The system of claim 15 , wherein in using the one or more predicted unit test embeddings to identify the unit test for the source code unit one or more of the processors are to: process the one or more predicted unit test embeddings, using an embedding-to-code machine learning model, to generate the unit test for the source code unit.
- 17 . The system of claim 16 , wherein in processing the one or more predicted unit test embeddings, using the embedding-to-code machine learning model, to generate the unit test for the source code unit one or more of the processors are to: generate a sequence of outputs based on processing the one or more predicted unit test embeddings using the embedding-to-code machine learning model, wherein the outputs of the sequence each comprise a corresponding probability distribution; and generate the unit test based on highest probabilities in the corresponding probability distributions of the sequence of outputs.
- 18 . The system of claim 15 , wherein one or more of the processors are further operable to execute the instructions to: determine, based on the evaluating, to deploy the source code unit in production.
- 19 . The system of claim 15 , wherein one or more of the processors are further operable to execute the instructions to: cause rendering, in a development application, of one or more metrics that are based on the evaluating.
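The pipeline of claims 1 through 4 can be sketched in Python. Everything below is a toy illustration: the hash-like featurizer standing in for the code-to-embedding model, the linear shift standing in for the code embedding-to-test embedding model, the 8-dimensional latent space, and the distance threshold are all assumptions made for this example, not the patent's implementation.

```python
import math

def code_to_embedding(code: str) -> list[float]:
    """Map source code text to a vector in the shared latent space (toy stub)."""
    vec = [0.0] * 8
    for i, ch in enumerate(code):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

def code_embedding_to_test_embedding(code_emb: list[float]) -> list[float]:
    """Predict the latent embedding of a suitable unit test (toy stub)."""
    return [0.9 * x + 0.1 for x in code_emb]

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify_unit_test(source_code: str, pre_generated_tests: list[dict],
                       threshold: float):
    """Claim-4 style retrieval: return the pre-generated unit test whose
    embedding is nearest the predicted test embedding, if that nearest
    distance satisfies the distance threshold; otherwise return None."""
    predicted = code_embedding_to_test_embedding(code_to_embedding(source_code))
    best, best_dist = None, float("inf")
    for test in pre_generated_tests:
        d = euclidean(predicted, code_to_embedding(test["code"]))
        if d < best_dist:
            best, best_dist = test, d
    return best if best_dist <= threshold else None

def evaluate(source_fn, unit_test: dict) -> bool:
    """Claim-1 evaluation: execute the source code unit with the test inputs
    and compare the resulting outputs against the test's expected outputs."""
    outputs = [source_fn(*args) for args in unit_test["inputs"]]
    return outputs == unit_test["expected"]
```

The claim-2/claim-3 alternative, decoding the predicted embedding into test code token by token via per-step probability distributions, would replace the retrieval step above with a generative embedding-to-code model.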
Description
BACKGROUND

Computer software programming often requires developers to read and/or write source code (i.e., to program) in a specific higher-level programming language. Some non-limiting examples of higher-level programming languages include Java, C++, C, Python, and Perl, each of which can have its own strengths, weaknesses, nuances, idiosyncrasies, etc.

To ensure the quality and/or robustness of developed source code, various software testing techniques can be utilized. For example, so-called white-box testing can be utilized, such as unit testing that tests individual component(s) of software, such as a method (function) in a class. However, white-box testing requires developers to manually design test cases, which can require extensive usage of client device resources. Moreover, developers are often time-constrained and may develop no (or insufficient) test cases. This can jeopardize the robustness and/or quality of the developed source code, which can, when it is deployed, lead to errors, unsafe operating conditions, and/or other drawbacks.

As another example, so-called black-box testing can additionally and/or alternatively be utilized, such as functional testing that feeds software component(s) various input(s) and examines the resulting output(s). The resulting output(s) can be evaluated with respect to functional requirements. However, black-box testing requires developers to manually design testing parameters, which can require extensive usage of client device resources. Moreover, developers are often time-constrained and may develop no (or insufficient) black-box tests. This can jeopardize the robustness and/or quality of the developed source code, which can, when it is deployed, lead to errors, unsafe operating conditions, and/or other drawbacks.
Further, the input(s) explored in black-box testing may not include those input(s) that would indeed lead to error(s), and/or such input(s) may only be explored through computationally expensive brute-force approaches that also require exploring a large quantity of input(s) that do not lead to error(s). Further, these and/or other drawbacks are exacerbated in situations in which an entity (e.g., a company) wants to translate source code from a base higher-level programming language (e.g., C++) to a different target programming language. For example, it may be the case that unit tests are unavailable for the target programming language, and existing programmers at the entity may lack expertise in the target programming language and be unable to manually generate unit tests and/or be highly inefficient in doing so. The inefficiencies can lead to excess usage of client device resources utilized in generating unit tests. As another example, it may be the case that any functional tests are insufficient to explore appropriate edge input(s) and/or to do so in a computationally efficient manner.

SUMMARY

Implementations disclosed herein relate to training and/or utilization of machine learning model(s) (e.g., neural network model(s)) in automatically generating test case(s) for source code. Techniques disclosed herein can be utilized in generating test case(s) for unit testing (or other white-box testing) and/or for functional testing (or other black-box testing). In some implementations, the source code for which the test case(s) are generated can include (or be restricted to) translated source code that is in a “target” programming language and that was translated from a “base” programming language. For example, the source code can be target source code that was automatically translated from the base programming language to the target programming language, without requiring any human intervention.
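The translation-validation approach of claim 10 can be sketched as a differential test, with one simplification flagged up front: the patent's policy network, which is trained via reinforcement learning to propose inputs that make the two outputs diverge (claim 13), is replaced here by a plain random input sampler. The example functions and input range are invented for illustration.

```python
import random

def differential_test(base_fn, target_fn, input_sampler, trials=200):
    """Run the base code and the translated code on the same predicted inputs
    and collect every input where the two outputs differ -- a signal that the
    branch exercised by that input is not a valid translation."""
    failures = []
    for _ in range(trials):
        args = input_sampler()
        base_out, target_out = base_fn(*args), target_fn(*args)
        if base_out != target_out:
            failures.append((args, base_out, target_out))
    return failures

# Example: a faulty "translation" that drops the lower clamp on the
# negative branch, so only negative inputs expose the divergence.
base = lambda x: max(0, min(x, 10))
target = lambda x: min(x, 10) if x > 0 else x
random.seed(0)
failures = differential_test(base, target, lambda: (random.randint(-5, 15),))
```

Any entry in `failures` identifies a concrete input, base output, and target output that could be surfaced to a developer, e.g., by demarcating the offending branch in a development application as in claim 12.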
The automatic translation can be performed utilizing one or more machine learning models and/or rules-based approaches. In implementations where the source code includes translated source code, disclosed techniques can be utilized in validating the translated source code to ensure that it is robust and/or accurate prior to being deployed for actual implementations.

The preceding is provided as an overview of only some implementations disclosed herein. These and other implementations are described in additional detail in the detailed description.

In some implementations, a method implemented by one or more processors is provided that includes identifying a plurality of ground truth source code, unit test pairs, each including a corresponding ground truth source code unit and a corresponding ground truth unit test for the ground truth source code unit. The method further includes, for each of the ground truth source code, unit test pairs: processing the corresponding ground truth source code unit, using a code-to-embedding machine learning model, to generate one or more corresponding code unit embeddings; processing the corresponding ground truth unit test, using the code-to-embedding machine learning model, to generate one or more corresponding unit tes
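The training setup the summary begins to describe, embedding both halves of each ground truth (source code unit, unit test) pair with the same code-to-embedding model and fitting a mapping from one embedding to the other, can be illustrated with a minimal loop. The scalar "embedding" and the two-parameter linear model are stand-ins chosen for brevity; the patent contemplates neural models over latent vectors.

```python
def code_to_embedding(code: str) -> float:
    # Toy scalar "embedding": normalized average character code.
    return sum(map(ord, code)) / max(len(code), 1) / 100.0

def train(pairs, lr=0.1, epochs=2000):
    """Fit predicted_test_emb = w * code_emb + b by SGD on squared error,
    so the model's prediction lands near the true unit-test embedding."""
    w, b = 1.0, 0.0
    for _ in range(epochs):
        for src, test in pairs:
            x = code_to_embedding(src)   # embed the source code unit
            y = code_to_embedding(test)  # embed its ground truth unit test
            err = (w * x + b) - y        # prediction error for this pair
            w -= lr * err * x            # gradient step for the weight
            b -= lr * err                # gradient step for the bias
    return w, b
```

At inference time the trained mapping plays the role of the code embedding-to-test embedding model from claim 1: a new source unit is embedded, mapped through `(w, b)`, and the resulting predicted test embedding is decoded or matched against pre-generated test embeddings.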