US-12619425-B2 - System and method that assists with identifying unpredicted portions of source code files for software engineering tasks

Abstract

A system stores a source code file's changes from a software developer's code editor, for a software engineering task. Upon receiving the code editor's request to predict source code for the source code file, the system retrieves the software engineering task's context data, and transforms the context data to be compatible with the data format used to train a machine-learning model to assist with performing software engineering tasks. The machine-learning model uses the transformed context data to predict the source code for the source code file, with source code file portions corresponding to predicted source code portions. The system identifies each portion of the source code file which is differing from a corresponding portion of the predicted source code, via the code editor. The system commits any differing portions of the predicted source code, which are requested and accepted by the code editor, to the source code file.

Inventors

  • Mark Gabel
  • Daniel Lord

Assignees

  • Laredo Labs, Inc.

Dates

Publication Date
2026-05-05
Application Date
2023-11-30

Claims (20)

  1. A system for predicting source code for software engineering tasks, the system comprising: one or more processors; and a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to: store source code changes in a source code file associated with a software engineering task; retrieve context data that establishes a context for the software engineering task in response to a request to predict source code for the software engineering task; transform the context data to be compatible with a data format used to train a machine-learning model; predict, by the machine-learning model using the transformed context data, source code for the software engineering task; identify each portion of the source code changes that is determined to differ from a corresponding portion of the predicted source code; cause each identified portion to be displayed as selectable on a graphical user interface; cause, in response to a selection of one of the displayed each identified portion, the corresponding portion of the predicted source code to be displayed on the graphical user interface; and commit to the source code file, the displayed corresponding portion of the predicted source code in response to being selected as a replacement for the selected one of the displayed each identified portion of the source code changes.
  2. The system of claim 1, wherein the plurality of instructions further causes the one or more processors to output the source code file associated with the software engineering task to the graphical user interface in response to receiving a request to begin making the source code changes to the source code file.
  3. The system of claim 1, wherein the plurality of instructions further causes the one or more processors to display an issue report on the graphical user interface, the issue report being configured to receive at least one of: a) a clarification of the issue report that describes the software engineering task, or b) an update to the issue report to describe a strategy for completing the software engineering task.
  4. The system of claim 3, wherein the plurality of instructions further causes the one or more processors to predict, by the machine-learning model, other source code for the source code file based on a modification to at least one of any committed portions of the predicted source code or the issue report.
  5. The system of claim 4, wherein the plurality of instructions further causes the one or more processors to display the other source code for the source code file on the graphical user interface.
  6. The system of claim 1, wherein the predicted source code comprises at least one of: a) source code to be added to the source code file, b) source code to be removed from the source code file, c) source code to be replaced in the source code file, or d) any type of the source code for another source code file, and each portion of the source code file comprises at least one of a line of source code, a word of source code, or a single text character of source code.
  7. The system of claim 1, wherein each portion of the source code file that is determined to differ from a corresponding portion of the predicted source code is identified based on a determination of whether the corresponding portion of the predicted source code has a prediction confidence level that satisfies a threshold.
  8. A computer-implemented method for predicting source code for software engineering tasks, the method comprising: storing source code changes in a source code file associated with a software engineering task; retrieving context data that establishes a context for the software engineering task in response to a request to predict source code for the software engineering task; transforming the context data to be compatible with a data format used to train a machine-learning model; predicting, by the machine-learning model using the transformed context data, source code for the software engineering task; identifying each portion of the source code changes that is determined to differ from a corresponding portion of the predicted source code; causing each identified portion to be displayed as selectable on a graphical user interface; causing, in response to a selection of one of the displayed each identified portion, the corresponding portion of the predicted source code to be displayed on the graphical user interface; and committing to the source code file, the displayed corresponding portion of the predicted source code in response to being selected as a replacement for the selected one of the displayed each identified portion of the source code changes.
  9. The method of claim 8, wherein the computer-implemented method further comprises outputting the source code file associated with the software engineering task to the graphical user interface in response to receiving a request to begin making the source code changes to the source code file.
  10. The method of claim 8, wherein the computer-implemented method further comprises displaying an issue report on the graphical user interface, the issue report being configured to receive at least one of: a) a clarification of the issue report that describes the software engineering task or b) an update to the issue report to describe a strategy for completing the software engineering task.
  11. The method of claim 10, wherein the computer-implemented method further comprises predicting, by the machine-learning model, other source code for the source code file based on a modification to at least one of any committed portions of the predicted source code or the issue report.
  12. The method of claim 11, wherein the computer-implemented method further comprises displaying the other source code for the source code file on the graphical user interface.
  13. The method of claim 8, wherein the predicted source code comprises at least one of: a) source code to be added to the source code file, b) source code to be removed from the source code file, c) source code to be replaced in the source code file, or d) any type of the source code for another source code file, and each portion of the source code file comprises at least one of a line of source code, a word of source code, or a single text character of source code.
  14. The method of claim 8, wherein each portion of the source code file that is determined to differ from a corresponding portion of the predicted source code is identified based on a determination of whether the corresponding portion of the predicted source code has a prediction confidence level that satisfies a threshold.
  15. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein to be executed by one or more processors, the program code including instructions to: store source code changes in a source code file associated with a software engineering task; retrieve context data that establishes a context for the software engineering task in response to a request to predict source code for the software engineering task; transform the context data to be compatible with a data format used to train a machine-learning model; predict, by the machine-learning model using the transformed context data, the source code for the software engineering task; identify each portion of the source code changes that is determined to differ from a corresponding portion of the predicted source code; cause each identified portion to be displayed as selectable on a graphical user interface; cause, in response to a selection of one of the displayed each identified portion, the corresponding portion of the predicted source code to be displayed on the graphical user interface; and commit to the source code file, the displayed corresponding portion of the predicted source code in response to being selected as a replacement for the selected one of the displayed each identified portion of the source code changes.
  16. The computer program product of claim 15, wherein the program code includes further instructions to output the source code file associated with the software engineering task to the graphical user interface in response to receiving a request to begin making the source code changes to the source code file.
  17. The computer program product of claim 15, wherein the program code includes further instructions to display an issue report on the graphical user interface, the issue report being configured to receive at least one of: a) a clarification of the issue report that describes the software engineering task or b) an update to the issue report to describe a strategy for completing the software engineering task.
  18. The computer program product of claim 17, wherein the program code includes further instructions to predict, by the machine-learning model, other source code for the source code file based on a modification to at least one of any committed portions of the predicted source code or the issue report; and display the other source code for the source code file on the graphical user interface.
  19. The computer program product of claim 15, wherein the predicted source code comprises at least one of: a) source code to be added to the source code file, b) source code to be removed from the source code file, c) source code to be replaced in the source code file, or d) any type of the source code for another source code file, and each portion of the source code file comprises at least one of a line of source code, a word of source code, or a single text character of source code.
  20. The computer program product of claim 15, wherein each portion of the source code file that is determined to differ from a corresponding portion of the predicted source code is identified based on a determination of whether the corresponding portion of the predicted source code has a prediction confidence level that satisfies a threshold.
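
Illustrative sketch (not part of the patent text): the identify, select, and commit steps recited in claims 1, 8, and 15 might be approximated at line granularity as shown below. The claims do not name a comparison algorithm, so the use of Python's standard difflib module is an assumption, and the names DifferingPortion, identify_differing_portions, and commit_selected are hypothetical. The predicted lines are assumed to come from some model and are supplied here as a plain list.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass


@dataclass
class DifferingPortion:
    """Hypothetical record for one portion of the developer's code that
    differs from the corresponding portion of the predicted code."""
    start: int            # index of the first differing line in the developer's code
    end: int              # one past the last differing line
    current: list[str]    # the developer's lines in this portion
    predicted: list[str]  # the predicted lines that correspond to it


def identify_differing_portions(current_lines: list[str],
                                predicted_lines: list[str]) -> list[DifferingPortion]:
    """Return every line range where the developer's code differs from the prediction.

    Assumption: a line-level diff stands in for the claimed comparison.
    """
    matcher = difflib.SequenceMatcher(a=current_lines, b=predicted_lines, autojunk=False)
    portions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            portions.append(DifferingPortion(i1, i2,
                                             current_lines[i1:i2],
                                             predicted_lines[j1:j2]))
    return portions


def commit_selected(current_lines: list[str],
                    portions: list[DifferingPortion],
                    selected: list[int]) -> list[str]:
    """Replace only the portions the developer selected with their predicted text."""
    result = list(current_lines)
    # Apply from the bottom of the file upward so earlier line indices stay valid.
    for idx in sorted(set(selected), key=lambda k: portions[k].start, reverse=True):
        portion = portions[idx]
        result[portion.start:portion.end] = portion.predicted
    return result


current = ["def add(a, b):", "    return a - b", "", "print(add(1, 2))"]
predicted = ["def add(a, b):", "    return a + b", "", "print(add(1, 2))"]

portions = identify_differing_portions(current, predicted)
updated = commit_selected(current, portions, selected=[0])
```

In this sketch an editor would present each entry of portions as a selectable item, show its predicted lines when selected, and call commit_selected only for the entries the developer accepts. Portions that are not selected are left exactly as the developer wrote them.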

Description

CLAIM OF PRIORITY

This application claims the benefit of U.S. Provisional Patent Application 63/429,655 entitled, SYSTEM AND AGENT FOR PERFORMING AND ASSISTING WITH GENERALIZED SOFTWARE ENGINEERING TASKS by Gabel, et al., filed Dec. 2, 2022, the entire contents of which is incorporated herein by reference.

BACKGROUND

Software engineering is the process of constructing functional software from requirements and/or specifications, which are human-centered, commonly taking the form of natural language documents and supporting data. The constructed software is machine-centered, taking the form of textual source code and supporting data. Software developers often use software tools to aid with these complex tasks of creating requirements and specifications documents, and constructing source code.

Requirements and specifications documents need to be interpreted and refined in a process that involves understanding ideas at various levels of abstractions, both vague and precise, and organizing these ideas into coherent software designs. Software engineers need to understand technical natural language documents and reconcile their implied needs with the capabilities of the source code's underlying computing platforms.

Software tools for requirements and specifications management and design/architecture are a broad category of products, which include process-focused management systems that facilitate communication and cataloging of requirements and specifications, modeling tools that allow visualization of potential software designs with various degrees of fidelity, and project management systems that are often used to store requirements and specifications, and track their progress toward implementation. These early-stage software tools are alike in that they facilitate relatively narrow tasks and do not even attempt to fully automate these tasks.

Software tools for software construction form an even larger category of products. The majority of these tools are implemented in code editors, which provide assistance for many of the following tasks. Code editors make source code easier to read by organizing and highlighting the source code, and facilitate source code navigation through hyperlinks. Source code can include readme files and other types of text files, such as boilerplate license headers, which are typically associated with source code files. These software construction tools edit code to make its syntactic style consistent, and add dependent source code constructs such as import statements. Code editors generate boilerplate source code from fixed templates, and reduce typing by completing partially-typed words or lines of source code. Software construction tools streamline the process of writing source code, but do not automate the writing of source code, nor assist a software developer to write source code that is more suitably connected to requirements or specifications.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A and FIG. 1B illustrate examples of lines of source code that may be used to assist with performing software engineering tasks, under an embodiment;
FIG. 2 illustrates a block diagram of an example system that assists with performing software engineering tasks, under an embodiment;
FIG. 3 depicts a flowchart that illustrates a method that trains a machine-learning model to assist with performing software engineering tasks, under an embodiment;
FIG. 4A and FIG. 4B depict a flowchart that illustrates a method that uses a machine-learning model to assist with performing software engineering tasks, under an embodiment;
FIG. 5A and FIG. 5B depict a flowchart that illustrates a method that assists with writing issue reports that describe software engineering tasks and issue information, under an embodiment;
FIG. 6A and FIG. 6B depict a flowchart that illustrates a method that assists with automating software engineering tasks, under an embodiment;
FIG. 7A and FIG. 7B depict a flowchart that illustrates a method that assists with writing source code for software engineering tasks, under an embodiment;
FIG. 8A and FIG. 8B depict a flowchart that illustrates a method that assists with identifying unpredicted portions of source code files for software engineering tasks, under an embodiment; and
FIG. 9 is a block diagram illustrating an example hardware device in which the subject matter may be implemented.

DETAILED DESCRIPTION

FIG. 1A and FIG. 1B illustrate example lines of source code that may be used to assist with performing software engineering tasks, under an embodiment. Lines of source code 102 illustrate example source code for performing a standard Artificial Intelligence (AI)-driven code completion that interprets a comment to generate a full line of source code, as depicted by FIG. 1A. Lines of source code 104 illustrate example source code for performing a more elaborate AI-driven code completion that generates an entire function body from a function name and a line of documentation contained in a