CN-121996291-A - Code recognition method, code recognition model training method and device
Abstract
The application provides a code recognition method, a code recognition model training method, and a device. The code recognition method comprises: obtaining a first code and a second code, wherein the length of the second code is less than or equal to that of the first code; converting the first code into a first code attribute graph and the second code into a second code attribute graph; inputting the first code attribute graph and the second code attribute graph into a code recognition model based on a graph neural network, so that the code recognition model outputs a recognition result, wherein the recognition result comprises a graph relationship between the second code attribute graph and the first code attribute graph; and confirming that the first code contains the second code when the graph relationship indicates that the second code attribute graph is a subgraph of the first code attribute graph. The code recognition method can efficiently identify operator code in an application's program code and reduces the time cost of code recognition.
Inventors
- ZHANG WENRUI
- YAN BAICHENG
- YUAN TING
- ZHANG GE
- WANG JIE
Assignees
- Huawei Technologies Co., Ltd. (华为技术有限公司)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2024-11-05
Claims (14)
- 1. A code recognition method, the method comprising: acquiring a first code and a second code, wherein the length of the second code is less than or equal to that of the first code; converting the first code into a first code attribute graph and converting the second code into a second code attribute graph; inputting the first code attribute graph and the second code attribute graph into a code recognition model based on a graph neural network, so that the code recognition model outputs a recognition result, wherein the recognition result comprises a graph relationship between the second code attribute graph and the first code attribute graph; and when the graph relationship indicates that the second code attribute graph is a subgraph of the first code attribute graph, confirming that the first code contains the second code.
- 2. The method of claim 1, wherein the first code attribute graph comprises a first node into which a first code element in the first code is converted, and the second code attribute graph comprises a second node into which a second code element in the second code is converted, the recognition result further comprising a first node relationship between the first node and the second node; the method further comprises: recording a correspondence between a position of the first code element in the first code and a position of the second code element in the second code when the first node relationship indicates that the first node and the second node match.
- 3. The method of claim 2, wherein the first code attribute graph further comprises a third node into which a third code element in the first code is converted, the recognition result further comprising a second node relationship between the first node and the third node; wherein the recording of a correspondence between the position of the first code element in the first code and the position of the second code element in the second code when the first node relationship indicates that the first node and the second node match comprises: when the first node relationship indicates that the first node and the second node match, and the second node relationship indicates that the third node and the second node match, identifying a third node relationship between a neighbor node of the second node and a neighbor node of the first node, and identifying a fourth node relationship between a neighbor node of the second node and a neighbor node of the third node; and when the third node relationship indicates that the neighbor node of the second node matches the neighbor node of the first node, and the fourth node relationship indicates that the neighbor node of the second node does not match the neighbor node of the third node, recording the correspondence between the position of the first code element in the first code and the position of the second code element in the second code.
- 4. The method of claim 2, wherein the first code attribute graph further comprises a third node into which a third code element in the first code is converted, the recognition result further comprising a second node relationship between the first node and the third node; wherein the recording of a correspondence between the position of the first code element in the first code and the position of the second code element in the second code when the first node relationship indicates that the first node and the second node match comprises: when the first node relationship indicates that the first node and the second node match, and the second node relationship indicates that the third node and the second node match, inputting the first node, the second node, and the third node into a node recognition model, so that the node recognition model outputs a node recognition result; and when the node recognition result indicates that the first node and the second node match and the third node and the second node do not match, recording the correspondence between the position of the first code element in the first code and the position of the second code element in the second code.
- 5. The method of claim 1, wherein the inputting of the first code attribute graph and the second code attribute graph into the code recognition model comprises: splitting the first code attribute graph into a plurality of neighborhood graphs and splitting the second code attribute graph into at least two neighborhood graphs; and inputting the plurality of neighborhood graphs and the at least two neighborhood graphs into the code recognition model, so that the code recognition model outputs the recognition result; wherein when each of the at least two neighborhood graphs matches at least one neighborhood graph in the plurality of neighborhood graphs, the graph relationship indicates that the second code attribute graph is a subgraph of the first code attribute graph.
- 6. The method of claim 5, wherein the plurality of neighborhood graphs comprises a first neighborhood graph and the at least two neighborhood graphs comprise a second neighborhood graph; the method further comprises: confirming that nodes in the first neighborhood graph and nodes in the second neighborhood graph match when the first neighborhood graph and the second neighborhood graph match.
- 7. A method for training a code recognition model, the method comprising: obtaining a training graph pair, wherein the training graph pair comprises a third code attribute graph and a fourth code attribute graph, and the training graph pair has a first label indicating that the fourth code attribute graph is a subgraph of the third code attribute graph; inputting the training graph pair into a code recognition model based on a graph neural network, so that the code recognition model outputs a recognition result of the training graph pair, wherein the recognition result comprises a graph relationship between the fourth code attribute graph and the third code attribute graph; and when the graph relationship between the fourth code attribute graph and the third code attribute graph in the recognition result is inconsistent with the first label, updating parameters of the code recognition model so that the graph relationship between the fourth code attribute graph and the third code attribute graph in the recognition result that the code recognition model outputs again for the training graph pair is consistent with the first label.
- 8. The method of claim 7, wherein the third code attribute graph comprises a fourth node, the fourth code attribute graph comprises a fifth node, the training graph pair has a second label indicating that the fourth node matches the fifth node, and the recognition result further comprises a node relationship between the fourth node and the fifth node; the method further comprises: when the node relationship between the fourth node and the fifth node in the recognition result is inconsistent with the second label, updating parameters of the code recognition model so that the node relationship between the fourth node and the fifth node in the recognition result that the code recognition model outputs again for the training graph pair is consistent with the second label.
- 9. The method according to claim 7 or 8, wherein the inputting of the training graph pair into the code recognition model, so that the code recognition model outputs the recognition result of the training graph pair, comprises: splitting the third code attribute graph into a plurality of neighborhood graphs, and splitting the fourth code attribute graph into at least two neighborhood graphs; and inputting the plurality of neighborhood graphs and the at least two neighborhood graphs into the code recognition model, so that the code recognition model outputs the recognition result; wherein when each of the at least two neighborhood graphs matches at least one neighborhood graph in the plurality of neighborhood graphs, the graph relationship indicates that the fourth code attribute graph is a subgraph of the third code attribute graph.
- 10. A code recognition device, the device comprising: an acquisition module for acquiring a first code and a second code, wherein the length of the second code is less than or equal to that of the first code; a conversion module for converting the first code into a first code attribute graph and converting the second code into a second code attribute graph; a recognition module for inputting the first code attribute graph and the second code attribute graph into a code recognition model based on a graph neural network, so that the code recognition model outputs a recognition result, wherein the recognition result comprises a graph relationship between the second code attribute graph and the first code attribute graph; and a confirmation module for confirming that the first code contains the second code when the graph relationship indicates that the second code attribute graph is a subgraph of the first code attribute graph.
- 11. A code recognition model training apparatus, the apparatus comprising: an acquisition module for acquiring a training graph pair, wherein the training graph pair comprises a third code attribute graph and a fourth code attribute graph, and the training graph pair has a first label indicating that the fourth code attribute graph is a subgraph of the third code attribute graph; an input module for inputting the training graph pair into a code recognition model based on a graph neural network, so that the code recognition model outputs a recognition result of the training graph pair, wherein the recognition result comprises a graph relationship between the fourth code attribute graph and the third code attribute graph; and a training module for updating parameters of the code recognition model when the graph relationship between the fourth code attribute graph and the third code attribute graph in the recognition result is inconsistent with the first label, so that the graph relationship between the fourth code attribute graph and the third code attribute graph in the recognition result that the code recognition model outputs again for the training graph pair is consistent with the first label.
- 12. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of claims 1-6 or the method of any one of claims 7-9.
- 13. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 1-6 or the method of any of claims 7-9.
- 14. A computer program product comprising instructions which, when executed by a cluster of computer devices, cause the cluster of computer devices to perform the method of any of claims 1-6 or the method of any of claims 7-9.
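The neighborhood-based decision rule of claims 5 and 6 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the claims: graphs are `(labels, adjacency)` pairs, a neighborhood graph is the set of nodes within `k` hops of an anchor node, and the learned matching performed by the code recognition model is replaced by a hypothetical label-counting check (`neighborhoods_match`). That check is only a necessary condition for a true subgraph relationship, so the sketch shows the "every query neighborhood matches at least one target neighborhood" rule rather than the model itself.

```python
from collections import Counter, deque


def k_hop_neighborhood(adj, anchor, k=1):
    """Return the set of nodes within k hops of `anchor` (breadth-first search)."""
    seen = {anchor}
    frontier = deque([(anchor, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def neighborhoods_match(target, t_anchor, query, q_anchor, k=1):
    """Hypothetical stand-in for the learned model's neighborhood comparison.

    Two neighborhoods "match" here when the anchors share a label and every
    label in the query neighborhood occurs at least as often in the target one.
    """
    t_labels, t_adj = target
    q_labels, q_adj = query
    if t_labels[t_anchor] != q_labels[q_anchor]:
        return False
    t_count = Counter(t_labels[n] for n in k_hop_neighborhood(t_adj, t_anchor, k))
    q_count = Counter(q_labels[n] for n in k_hop_neighborhood(q_adj, q_anchor, k))
    return all(t_count[label] >= n for label, n in q_count.items())


def is_subgraph_by_neighborhoods(target, query, k=1):
    """Apply the rule of claim 5 and collect anchor candidates as in claim 6."""
    candidates = {
        q: [t for t in target[0] if neighborhoods_match(target, t, query, q, k)]
        for q in query[0]
    }
    # The query counts as a subgraph only if no query neighborhood is unmatched.
    return all(candidates.values()), candidates


# Toy graphs: node labels plus undirected adjacency lists (illustrative only).
target = (
    {0: "for", 1: "assign", 2: "mul", 3: "load", 4: "load", 5: "return"},
    {0: [1, 5], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2], 5: [0]},
)
query = (
    {"a": "assign", "b": "mul", "c": "load"},
    {"a": ["b"], "b": ["a", "c"], "c": ["b"]},
)
ok, candidates = is_subgraph_by_neighborhoods(target, query)
print(ok, candidates)
```

The `candidates` mapping keeps, for each query anchor, the target anchors whose neighborhoods matched, which is the information a later position-recording step would need.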
Description
Code recognition method, code recognition model training method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a code recognition method, a code recognition model training method, and a device.
Background
Application migration and performance optimization are often required in fields such as high performance computing (HPC). During application migration or performance optimization, the code of standard operators in an application's program code has to be rewritten or adjusted. This requires identifying the code of the standard operators in the application's program code. In the related art, the code of a standard operator is identified in the application's program code by using a conventional subgraph matching algorithm, such as a graph isomorphism algorithm (e.g., VF2). Conventional subgraph matching algorithms have difficulty coping with large code bases and long code fragments: applying them to large-scale code incurs a high time cost, and code bases in the HPC field tend to be large, with long fragments. Thus, there is a need for a code recognition scheme that can reduce the time cost.
Disclosure of Invention
The application provides a code recognition method, a code recognition model training method, and a device, which can efficiently recognize operator code in an application's program code and reduce the time cost of code recognition.
In a first aspect, a code recognition method is provided, the method comprising: obtaining a first code and a second code, wherein the length of the second code is less than or equal to that of the first code; converting the first code into a first code attribute graph and converting the second code into a second code attribute graph; inputting the first code attribute graph and the second code attribute graph into a code recognition model based on a graph neural network, so that the code recognition model outputs a recognition result, wherein the recognition result comprises a graph relationship between the second code attribute graph and the first code attribute graph; and confirming that the first code contains the second code when the graph relationship indicates that the second code attribute graph is a subgraph of the first code attribute graph. The first code may be, for example, the program code of an application, and the second code may be the code of an operator, for example, the code of a standard operator. The code recognition method provided by the embodiments of the application uses the code recognition model based on a graph neural network to recognize the relationship between the code attribute graphs of the two codes, and thereby recognizes the containment relationship between the two codes. That is, the method uses a graph neural network algorithm to determine whether one code contains the other. Compared with a conventional subgraph matching algorithm (such as VF2), the code recognition method provided by the embodiments of the application can improve recognition accuracy and recognition efficiency.
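The data flow of the first aspect (convert both codes to graphs, hand the graph pair to a model, read off the containment result) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the application: the graph is built from Python's standard `ast` module and covers only syntax-tree structure, whereas a code attribute graph would typically carry more information; and the graph-neural-network model is replaced by a hypothetical exact-comparison placeholder, `stand_in_model`, that assumes the second code is a single statement. All function names are illustrative.

```python
import ast


def to_graph(code):
    """Convert source text into (nodes, children).

    nodes[i] is (AST type name, line number or None); children[i] lists the
    indices of node i's children. Identifier names are deliberately dropped.
    """
    nodes, children = [], []

    def visit(node):
        index = len(nodes)
        nodes.append((type(node).__name__, getattr(node, "lineno", None)))
        children.append([])
        for child in ast.iter_child_nodes(node):
            children[index].append(visit(child))
        return index

    visit(ast.parse(code))
    return nodes, children


def shape(graph, index):
    """Structural fingerprint of the subtree rooted at `index`."""
    nodes, children = graph
    return (nodes[index][0], tuple(shape(graph, c) for c in children[index]))


def stand_in_model(first_graph, second_graph):
    """Hypothetical placeholder for the GNN-based model (exact, not learned).

    Assumes the second code is a single statement: its subtree is compared
    against every subtree of the first graph.
    """
    query_root = second_graph[1][0][0]  # first statement under the Module node
    wanted = shape(second_graph, query_root)
    query_line = second_graph[0][query_root][1]
    matches = [
        (line, query_line)
        for i, (_, line) in enumerate(first_graph[0])
        if shape(first_graph, i) == wanted
    ]
    return {"is_subgraph": bool(matches), "matches": matches}


def contains(first_code, second_code):
    """Run the pipeline: convert both codes, query the model, report the result."""
    result = stand_in_model(to_graph(first_code), to_graph(second_code))
    return result["is_subgraph"], result["matches"]


# Illustrative inputs: an "application" and a multiply-accumulate "operator".
application = (
    "def dot(a, b):\n"
    "    s = 0\n"
    "    for i in range(10):\n"
    "        s += a[i] * b[i]\n"
    "    return s\n"
)
operator = "acc += x[k] * y[k]\n"
print(contains(application, operator))
```

Each entry in `matches` pairs a line in the first code with the matching line in the second code, which is how the operator is located inside the application even though the variable names differ.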
In one possible implementation, the first code attribute graph includes a first node into which a first code element in the first code is converted, the second code attribute graph includes a second node into which a second code element in the second code is converted, and the recognition result further includes a first node relationship between the first node and the second node; the method further includes recording a correspondence between a position of the first code element in the first code and a position of the second code element in the second code when the first node relationship indicates that the first node and the second node match. The position of the first code element in the first code may be the line on which that code element is located in the first code, and the position of the second code element in the second code may be the line on which that code element is located in the second code. In this implementation, the location of the second code in the first code, e.g., the line, is known as soon as the first code is confirmed to contain the second code. In this way, the code of the operator can be located in the program code of the application.
In one possible implementation, the first code attribute graph further comprises a third node into which a third code element in the first code is converted, and the recognition result further comprises a second node relationship between the first node and the third node; when the first node relationship indicates that the first node matches the second node, the corresponding relation of the position of the first code elemen