CN-121979503-A - Code generation method, device, computer equipment and storage medium
Abstract
The invention relates to the technical field of artificial intelligence and discloses a code generation method, a code generation device, computer equipment and a storage medium applicable to the fields of financial technology and medical health. The method comprises: performing semantic analysis on a code generation request to obtain a request semantic tag and request keywords; performing text embedding on the code generation request to obtain a request conversion vector; searching a preset code knowledge graph for at least one candidate code node matching the request semantic tag; screening all candidate code nodes according to the request conversion vector to obtain reference code nodes; performing node expansion on all reference code nodes according to the request keywords to obtain context nodes; determining target prompt information according to the code generation request, the reference code nodes and the context nodes; and inputting the target prompt information into a preset large language model to obtain the target code. The invention improves semantic alignment and code quality and ensures the usability of the generated code.
Inventors
- LI JIANQIANG
- ZUO LONGLONG
- CHU QIUSHI
Assignees
- 平安科技(深圳)有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260120
Claims (10)
- 1. A code generation method, comprising: performing semantic analysis on a code generation request to obtain a request semantic tag and request keywords, and performing text embedding processing on the code generation request to obtain a request conversion vector; searching a preset code knowledge graph for at least one candidate code node matching the request semantic tag; screening all the candidate code nodes according to the request conversion vector to obtain reference code nodes; performing node expansion processing on all the reference code nodes according to the request keywords to obtain context nodes; determining target prompt information according to the code generation request, the reference code nodes and the context nodes; and inputting the target prompt information into a preset large language model to obtain a target code corresponding to the code generation request.
- 2. The code generation method of claim 1, wherein, before the searching a preset code knowledge graph for at least one candidate code node matching the request semantic tag, the method comprises: acquiring a plurality of groups of historical source codes, performing semantic analysis on each historical source code, and generating a semantic tag annotation for each historical source code; performing dependency analysis on all the historical source codes to generate a dependency description annotation for each historical source code; for each historical source code, extracting the code entities and structural relationships of the historical source code; and generating the preset code knowledge graph according to the code entities, the structural relationships, the semantic tag annotations and the dependency description annotations of all the historical source codes, wherein the preset code knowledge graph is built by creating nodes based on the code entities, creating edges based on the structural relationships, and generating node attribute vectors based on the semantic tag annotations and the dependency description annotations.
- 3. The code generation method of claim 2, wherein the performing semantic analysis on each historical source code and generating a semantic tag annotation for each historical source code comprises: parsing the historical source code and identifying application program interface code; performing semantic analysis on the application program interface code through a preset prompt template and a pre-trained semantic analysis model to obtain business domain tags and technical architecture tags; and generating the semantic tag annotation of the historical source code according to the business domain tags and the technical architecture tags.
- 4. The code generation method of claim 2, wherein the performing dependency analysis on all the historical source codes to generate a dependency description annotation for each historical source code comprises: parsing all the historical source codes based on abstract syntax trees to obtain a directed call graph; sorting the nodes of the directed call graph by in-degree, and determining a first preset number of nodes with the highest in-degree as core dependency nodes; and converting the historical source codes corresponding to the core dependency nodes through a preset dependency analysis model to obtain a dependency function description and a dependency example description, and generating the dependency description annotation of the historical source codes according to the dependency function description and the dependency example description.
- 5. The code generation method of claim 1, wherein the searching a preset code knowledge graph for at least one candidate code node matching the request semantic tag comprises: comparing the request semantic tag one by one with the semantic tag annotations of all code entity nodes in the preset code knowledge graph, and determining the code entity nodes whose semantic tag annotations are consistent with the request semantic tag as candidate code nodes.
- 6. The code generation method of claim 1, wherein the screening all the candidate code nodes according to the request conversion vector to obtain reference code nodes comprises: performing similarity analysis between the node attribute vector of each candidate code node and the request conversion vector to obtain a candidate similarity value for each candidate code node; and sorting all the candidate code nodes by candidate similarity value, and determining a second preset number of candidate code nodes with the highest candidate similarity values as the reference code nodes.
- 7. The code generation method of claim 1, wherein the performing node expansion processing on all the reference code nodes according to the request keywords to obtain context nodes comprises: for each reference code node, traversing the preset code knowledge graph along call-relationship edges and inheritance-relationship edges, with the reference code node as the center, to obtain a traversal extension subgraph; and screening the code entity nodes in the traversal extension subgraph according to the request keywords to obtain the context nodes.
- 8. A code generation apparatus, comprising: a request analysis module, configured to perform semantic analysis on a code generation request to obtain a request semantic tag and request keywords, and to perform text embedding processing on the code generation request to obtain a request conversion vector; a candidate node determining module, configured to search a preset code knowledge graph for at least one candidate code node matching the request semantic tag; a reference node determining module, configured to screen all the candidate code nodes according to the request conversion vector to obtain reference code nodes; an expansion processing module, configured to perform node expansion processing on all the reference code nodes according to the request keywords to obtain context nodes; a prompt information determining module, configured to determine target prompt information according to the code generation request, the reference code nodes and the context nodes; and a target code generation module, configured to input the target prompt information into a preset large language model to obtain a target code corresponding to the code generation request.
- 9. A computer device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer readable instructions, implements the code generation method of any of claims 1 to 7.
- 10. A computer-readable storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the code generation method of any of claims 1 to 7.
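The in-degree ranking recited in claim 4 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the historical source code is Python and that only direct calls by plain function name are resolved; the claims do not fix a language or a parser. The names `build_call_graph`, `core_dependency_nodes` and `SAMPLE` are hypothetical, and the dependency analysis model that later describes each core node is not shown.

```python
import ast
from collections import Counter

def build_call_graph(source: str) -> set[tuple[str, str]]:
    """Return directed (caller, callee) edges between functions defined in `source`."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    edges = set()
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Assumption: only calls written as a bare name, e.g. foo(), are resolved.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    edges.add((func.name, node.func.id))
    return edges

def core_dependency_nodes(edges: set[tuple[str, str]], top_n: int) -> list[str]:
    """Rank callees by in-degree (number of distinct callers) and keep the top_n."""
    in_degree = Counter(callee for _, callee in edges)
    # Ties are broken by name so that the result is deterministic.
    ranked = sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical sample standing in for a group of historical source code.
SAMPLE = '''
def validate(x):
    return x > 0

def load(x):
    return validate(x)

def price(x):
    validate(x)
    return load(x) * 2

def report(x):
    return price(x) + load(x)
'''

if __name__ == "__main__":
    print(core_dependency_nodes(build_call_graph(SAMPLE), top_n=2))
```

In this sample, `validate` and `load` are each called by two distinct functions, so they would be selected as the core dependency nodes when the first preset number is 2.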
Description
Code generation method, device, computer equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a code generation method, a code generation device, computer equipment and a storage medium applicable to the fields of financial technology and medical health.

Background

With the development of deep learning, code generation technology based on large language models (LLMs) has been widely applied to software development in many fields. In the financial insurance industry, large language models are used to analyze insurance clauses or risk control rules described in natural language and to generate the corresponding code logic automatically, reducing manual coding errors. In the medical health industry, large language models are used to analyze medical terms or clinical guidelines and to generate reference code covering data loading, model definition and training loops, accelerating the development of clinical decision support systems.

However, existing code generation techniques rely solely on a general-purpose LLM and fall short when facing complex business systems (e.g., core banking transaction systems, hospital information systems). A general-purpose LLM lacks an understanding of the business semantics of a specific project, which affects the functionality of the generated code. For example, "User" refers to an ordinary user in a general context, but may specifically denote a "KYC authenticated transaction agent" in a particular financial system. Code generated by a general-purpose LLM often invents nonexistent method signatures or parameter lists, so that the generated code fails to compile or carries potential security risks, which reduces its usability.
In addition, a general-purpose LLM that relies on simple vector retrieval easily retrieves a large number of similar but irrelevant noisy code fragments and misses necessary context, which harms the structural consistency of the generated code.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a code generation method, apparatus, computer device and storage medium that address the semantic understanding deviation and poor usability of code generated by a general-purpose LLM.

A code generation method, comprising: performing semantic analysis on a code generation request to obtain a request semantic tag and request keywords, and performing text embedding processing on the code generation request to obtain a request conversion vector; searching a preset code knowledge graph for at least one candidate code node matching the request semantic tag; screening all the candidate code nodes according to the request conversion vector to obtain reference code nodes; performing node expansion processing on all the reference code nodes according to the request keywords to obtain context nodes; determining target prompt information according to the code generation request, the reference code nodes and the context nodes; and inputting the target prompt information into a preset large language model to obtain a target code corresponding to the code generation request.
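The retrieval steps of the method can be sketched as follows. This is only an illustrative sketch under stated assumptions: the knowledge graph is a plain in-memory dictionary, tag matching is set membership, similarity is cosine similarity, expansion is a one-hop traversal along "calls" and "inherits" edges, and keyword screening is a case-insensitive substring test. The names `CodeNode`, `find_candidates`, `select_references`, `expand_context`, `build_prompt` and `GRAPH` are hypothetical, the vectors are hand-written rather than produced by an embedding model, and the call to the large language model is omitted.

```python
import math
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CodeNode:
    name: str
    tags: frozenset                            # semantic tag annotation (assumed: a tag set)
    vector: tuple                              # node attribute vector
    code: str
    edges: dict = field(default_factory=dict)  # edge type -> list of neighbor names

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_candidates(graph, request_tag):
    """Candidate code nodes: nodes whose tags contain the request semantic tag."""
    return [n for n in graph.values() if request_tag in n.tags]

def select_references(candidates, request_vector, k):
    """Reference code nodes: the k candidates most similar to the request vector."""
    ranked = sorted(candidates, key=lambda n: cosine(n.vector, request_vector), reverse=True)
    return ranked[:k]

def expand_context(graph, references, keywords, max_hops=1):
    """Context nodes: neighbors reached via call/inheritance edges that mention a keyword."""
    seen = {n.name for n in references}
    queue = deque((n.name, 0) for n in references)
    context = []
    while queue:
        name, hops = queue.popleft()
        if hops == max_hops:
            continue
        for edge_type in ("calls", "inherits"):
            for neighbor in graph[name].edges.get(edge_type, []):
                if neighbor in seen or neighbor not in graph:
                    continue
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
                node = graph[neighbor]
                text = (node.name + " " + node.code).lower()
                if any(kw.lower() in text for kw in keywords):
                    context.append(node)
    return context

def build_prompt(request, references, context):
    """Target prompt information: the request plus reference and context code."""
    parts = ["# Request", request, "# Reference code"]
    parts += [n.code for n in references]
    parts += ["# Context code"]
    parts += [n.code for n in context]
    return "\n".join(parts)

# Hypothetical toy graph; tags, vectors and code bodies are invented for illustration.
GRAPH = {
    "transfer": CodeNode("transfer", frozenset({"payment"}), (1.0, 0.0),
                         "def transfer(account, amount): ...",
                         {"calls": ["check_kyc", "log_event"]}),
    "refund": CodeNode("refund", frozenset({"payment"}), (0.6, 0.8),
                       "def refund(order): ..."),
    "render_chart": CodeNode("render_chart", frozenset({"reporting"}), (1.0, 0.0),
                             "def render_chart(data): ..."),
    "check_kyc": CodeNode("check_kyc", frozenset({"compliance"}), (0.0, 1.0),
                          "def check_kyc(account): ..."),
    "log_event": CodeNode("log_event", frozenset({"logging"}), (0.5, 0.5),
                          "def log_event(msg): ..."),
}

if __name__ == "__main__":
    refs = select_references(find_candidates(GRAPH, "payment"), (0.9, 0.1), k=1)
    ctx = expand_context(GRAPH, refs, ["kyc"])
    print(build_prompt("Add a transfer limit check", refs, ctx))
```

Filtering by tag before ranking by vector similarity keeps similar-looking nodes from unrelated domains (here `render_chart`) out of the candidate set, and the keyword screen after graph expansion drops neighbors such as `log_event` that are structurally adjacent but not mentioned in the request.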
A code generation apparatus, comprising: a request analysis module, configured to perform semantic analysis on a code generation request to obtain a request semantic tag and request keywords, and to perform text embedding processing on the code generation request to obtain a request conversion vector; a candidate node determining module, configured to search a preset code knowledge graph for at least one candidate code node matching the request semantic tag; a reference node determining module, configured to screen all the candidate code nodes according to the request conversion vector to obtain reference code nodes; an expansion processing module, configured to perform node expansion processing on all the reference code nodes according to the request keywords to obtain context nodes; a prompt information determining module, configured to determine target prompt information according to the code generation request, the reference code nodes and the context nodes; and a target code generation module, configured to input the target prompt information into a preset large language model to obtain a target code corresponding to the code generation request.

A computer device comprising a memory, a processor and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the code generation method described above when executing the computer-readable instructions.

A computer-readable storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the code generation method described above.

The code generation method comprises the steps of carrying out semantic analysis on a