EP-4508561-B1 - EDITING FILES USING A PATTERN-COMPLETION ENGINE

Inventors

COSGROVE, Christian Alexander
TIWARY, SAURABH KUMAR

Dates

Publication Date: 20260506
Application Date: 20230201

Claims (13)

A computer-implemented method for editing selected file content, comprising: receiving (1304) selected file content (204; 304); receiving (1306) an input message that describes an objective of the user in editing the selected file content; producing (1308) current context information that includes text tokens that make up at least the selected file content and the input message; requesting (1310) a pattern-completion engine to generate edit information based on the current context information, the edit information describing one or more changes to the selected file content that satisfy the objective of the user, the pattern-completion engine using a machine-trained autoregressive text-completion model, the autoregressive text-completion model having been trained by a training process based on information extracted from a repository that describes revisions made to plural files, and based on training examples that are selected to satisfy one or more tests, each particular revision in the repository including a revision message that describes an objective of the particular revision, and file change information that implements the particular revision; and receiving (1312) the edit information from the pattern-completion engine, and presenting the edit information to the user, said one or more tests being used to increase a likelihood that the edit information, when executed on a program execution platform, satisfies at least one specified computer performance metric, wherein the method uses a machine-trained model to predict the at least one computer performance metric, and wherein the prediction is based on training examples that identify a measured performance of a plurality of code edits.
The computer-implemented method of claim 1, wherein the selected file content is a portion of a computer program, and wherein the method further comprises executing the computer program on the program execution platform.
The computer-implemented method of claim 1, wherein the selected file content is a portion of file content that is presented by a user interface presentation of a user computing device.
The computer-implemented method of claim 1, wherein the pattern-completion engine is implemented as a transformer-based decoder.
The computer-implemented method of claim 1, further including: receiving a revised message that represents a revision of the input message; updating the current context information to include the revised message, to provide updated current context information; and requesting the pattern-completion engine to generate another instance of edit information based on the updated current context information.
The computer-implemented method of claim 1, wherein a part of the training process includes generating a code-language model based on a repository of natural language training examples and a repository of computer program training examples, and wherein another part of the training process includes fine-tuning the code-language model based on the repository that describes the revisions.
The computer-implemented method of claim 1, wherein the training process includes: generating a repository of selected training examples that satisfy one or more computer performance metrics; and performing fine-tuning of the autoregressive text-completion model based on the repository of selected training examples.
The computer-implemented method of claim 1, wherein the training process includes: generating a repository of selected training examples that satisfy one or more unit tests (1110), said one or more unit tests ensuring that the selected training examples produce predetermined expected results; and performing fine-tuning of the autoregressive text-completion model based on the repository of selected training examples.
The computer-implemented method of claim 8, wherein the unit tests are automatically generated by a machine-trained model, wherein said machine trained model performs training on a set of code edits and associated unit tests generated by human developers.
The computer-implemented method of claim 1, wherein the training process includes: generating a repository of selected training examples that are chosen based on judgments of at least one human evaluator, said at least one human evaluator making each judgment by choosing among two instances of edit information; and performing fine-tuning of the autoregressive text-completion model based on the repository of selected training examples.
A computer-readable storage medium for storing computer-readable instructions that, when executed by one or more hardware processors, perform the method according to any of claims 1-10.
A computing system for editing selected file content, comprising: hardware logic circuitry configured to execute instructions provided in memory to perform operations including: receiving (1304) selected file content (204; 304); receiving (1306) an input message that describes an objective of the user in editing the selected file content; producing (1308) current context information that includes text tokens that make up at least the selected file content and the input message; requesting (1310) a pattern-completion engine to generate edit information based on the current context information, the edit information describing one or more changes to the selected file content that satisfy the objective of the user, the pattern-completion engine using a machine-trained autoregressive text-completion model, the autoregressive text-completion model having been trained by a training process based on information extracted from a repository that describes revisions made to plural files, and based on training examples that are selected to satisfy one or more tests, each particular revision in the repository including a revision message that describes an objective of the particular revision, and file change information that implements the particular revision; and receiving (1312) the edit information from the pattern-completion engine, and presenting the edit information to the user, said one or more tests being used to increase a likelihood that the edit information, when executed on a program execution platform, satisfies at least one specified computer performance metric, wherein the method uses a machine-trained model to predict the at least one computer performance metric, and wherein the prediction is based on training examples that identify a measured performance of a plurality of code edits.
The computing system of claim 12, wherein the autoregressive text-completion model is trained by a training process, wherein part of the training process includes generating a code-language model based on a repository of natural language training examples and a repository of computer program training examples, wherein another part of the training process involves fine-tuning the code-language model based on a repository that describes revisions made to plural files, to produce an original edit model, each particular revision in the repository including a revision message that describes an objective of the particular revision, and file change information that implements the particular revision, and wherein another part of the training process involves refining the original edit model based on at least one repository of selected training examples that are determined to satisfy one or more tests.
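
By way of illustration only, the context-assembly and revision steps recited in claims 1 and 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the delimiter strings, the EditSession class, the request_edit function, and the toy_engine stand-in are all hypothetical names introduced here, and the claims do not specify how the selected file content and the input message are laid out in the current context information. The sketch also assumes that a revised message is appended to the context instead of replacing the earlier message.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical delimiters; the claims do not prescribe a context layout.
CONTENT_TAG = "<content>"
MESSAGE_TAG = "<message>"
EDIT_TAG = "<edit>"


@dataclass
class EditSession:
    """Holds what is needed to build the current context information."""

    selected_content: str
    messages: List[str] = field(default_factory=list)

    def add_message(self, message: str) -> None:
        # Assumption: a revised message is appended, so earlier messages
        # remain part of the updated context.
        self.messages.append(message)

    def context(self) -> str:
        # Selected content first, then every message in order, then a
        # trailing marker after which the engine is expected to continue.
        parts = [f"{CONTENT_TAG}\n{self.selected_content}"]
        parts += [f"{MESSAGE_TAG}\n{m}" for m in self.messages]
        parts.append(EDIT_TAG)
        return "\n".join(parts)


def request_edit(session: EditSession, engine: Callable[[str], str]) -> str:
    """Pass the current context to a completion engine and return its output."""
    return engine(session.context())


def toy_engine(context: str) -> str:
    """Stand-in for the autoregressive model: echoes the latest message."""
    latest = context.split(MESSAGE_TAG)[-1].split(EDIT_TAG)[0].strip()
    return f"# TODO: {latest}"


if __name__ == "__main__":
    session = EditSession("def add(a, b):\n    return a - b")
    session.add_message("Fix the subtraction bug")
    print(request_edit(session, toy_engine))

    # Revising the message updates the context and triggers a new request.
    session.add_message("Also add type hints")
    print(request_edit(session, toy_engine))
```

In a real system the toy_engine callable would be replaced by a call to the trained text-completion model; keeping the engine behind a plain callable is a design choice of this sketch that lets the context-building logic be exercised without a model.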

Description

BACKGROUND

Program developers currently write computer programs using code-editing systems that provide various code-writing tools. While these tools are helpful, the process of developing a computer program often remains time-consuming, labor intensive, and prone to error.

AHMED ELGOHARY ET AL: "NL-EDIT: Correcting semantic parse errors through natural language interaction", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 26 March 2021 (2021-03-26), relates to NL-EDIT: correcting semantic parse errors through natural language interaction. In the task of text-to-SQL parsing, the objective is, given a database schema (tables, columns, and primary-foreign key relations) and a natural language question, to generate a SQL query that answers the question when executed against the database. The difference between the erroneous parse and the correct one can mostly be described as a few edits that need to be applied to the initial parse to correct its errors.

BELIZ GUNEL ET AL: "Supervised Contrastive Learning for Pre-trained Language Model Finetuning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 3 November 2020 (2020-11-03), relates to supervised contrastive learning for pre-trained language model fine-tuning. Natural language understanding classification models follow two stages: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using cross-entropy loss. A supervised contrastive learning (SCL) objective is also used for the fine-tuning stage.

SINONG WANG ET AL: "Entailment as Few-Shot Learner", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 April 2021 (2021-04-29), relates to entailment as few-shot learner. Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.
The potential NLP task is reformulated into an entailment task, and the model is then fine-tuned with as few as 8 examples.

SUMMARY

It is therefore the object of the present invention to improve correctness of code with improved performance. This object is solved by the subject matter of the independent claims. Preferred embodiments are defined by the dependent claims.

A computer-implemented technique is described herein for assisting a user in editing a file, such as a file that provides a computer program. In some implementations, the technique produces current context information that includes an input message and selected file content. The input message describes a user's editing objective, while the selected file content describes a portion of a file to which the editing objective is to be applied. The technique then requests a pattern-completion engine to generate edit information based on the current context information. The edit information describes one or more changes to the selected file content that satisfy the editing objective of the user.

In some implementations, the pattern-completion engine uses a machine-trained autoregressive text-completion model. The autoregressive text-completion model is trained in a training process based on revision history information. The revision history information describes revisions made to plural files. That is, each revision in the revision history information includes a revision message that describes an editing objective of the revision, and file change information that implements the particular revision.

In some implementations, the training process includes plural stages or parts. In a first part, the training process generates a code-language model based on a repository of natural language training examples and a repository of computer program training examples.
In a second part, the training process fine-tunes the code-language model based on the above-described revision history information, to produce an edit model. In a third part, the training process further refines the edit model based on at least one repository of selected training examples that are determined to satisfy specified criteria. The third part increases the likelihood that the edit information generated by the pattern-completion engine will provide code that correctly performs its functions, and code that meets specified computer performance metrics.

The technique satisfies various technical objectives. For instance, the technique reduces the amount of time and labor that is required to create file content. The technique also facilitates the generation of program content that is free from errors and that satisfies various computer performance metrics.

The above-summarized technology can be manifested in various types of systems, devices, components, methods, computer-readable storage media, data structures, graphical user interface presentations, articles of manufacture, and so on.

This Summary is provided to introduce a selection of concepts in a simplified form; these concepts are