CN-121561929-B - Source code high-order vulnerability detection method and system based on execution path analysis

Abstract

The invention relates to the technical field of software security and provides a method and a system for detecting high-order source code vulnerabilities based on execution path analysis. The method comprises: obtaining source code to be analyzed and constructing a function execution graph for each function in the source code; on the function execution graph, starting from a function entry node, generating function execution paths along execution sequence edges while applying a bounded constraint to the path generation process; calculating a priority score for each function execution path using a path priority function, and sorting and screening the function execution paths by priority score to obtain candidate execution paths; and obtaining a vulnerability type and a vulnerability location by means of a path coding model, a statement encoder, an attention mechanism and a vulnerability detection model. The accuracy of vulnerability detection and the interpretability of vulnerability localization are thereby improved at a controllable computing cost.
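The bounded path generation step summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the adjacency-dict graph representation, the function name `enumerate_paths`, and the parameters `max_len` and `max_loop` are assumptions made for illustration, and the sketch follows execution sequence edges within a single function only, so it does not model a call depth bound.

```python
from collections import Counter


def enumerate_paths(succ, entry, max_len=8, max_loop=1):
    """Enumerate execution paths from `entry` along execution-sequence edges.

    succ     -- dict: node id -> list of successor node ids (assumed
                representation of the execution-sequence edges only)
    max_len  -- maximum number of nodes allowed on one path
    max_loop -- maximum number of times a node may be revisited on one
                path (a simple stand-in for a loop-expansion bound)
    """
    paths = []
    # Each stack entry: (current node, path so far, visit counts on this path).
    stack = [(entry, [entry], Counter({entry: 1}))]
    while stack:
        node, path, visits = stack.pop()
        # Keep only successors that respect both bounds.
        nexts = [n for n in succ.get(node, [])
                 if visits[n] <= max_loop and len(path) < max_len]
        if not nexts:
            # Either a real exit node or a path cut off by a bound.
            paths.append(path)
            continue
        for n in nexts:
            new_visits = visits.copy()
            new_visits[n] += 1
            stack.append((n, path + [n], new_visits))
    return paths


# Hypothetical graph: 0 -> 1, 1 -> {2, 3}, 2 -> 1 (loop back), 3 is the exit.
succ = {0: [1], 1: [2, 3], 2: [1], 3: []}
for p in sorted(enumerate_paths(succ, 0, max_len=10, max_loop=1)):
    print(p)
```

Paths that stop because a bound was reached are kept in truncated form rather than discarded. This is a design choice of the sketch, so that a later scoring step could still see partially explored loops.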

Inventors

ZHAO DAWEI
ZHAO CHENGXIAO
LI XIN
XU LIJUAN
SONG SHANGREN
GAO BOYANG
SONG WEIZHAO
YU FUQIANG
ZHANG ZEYU

Assignees

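Qilu University of Technology (Shandong Academy of Sciences)
Shandong Computer Science Center (National Supercomputer Center in Jinan)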

Dates

Publication Date: 20260508
Application Date: 20260126

Claims (8)

1. A method for detecting high-order source code vulnerabilities based on execution path analysis, characterized by comprising the following steps:
acquiring source code to be analyzed, and constructing a function execution graph for each function in the source code, wherein the edges of the function execution graph represent association relations in the program execution process, the association relations comprising execution sequence relations, function call relations, variable propagation relations, and write and read relations of state persistence media;
on the function execution graph, starting from a function entry node, generating function execution paths along execution sequence edges, and applying a bounded constraint to the function execution path generation process; calculating a priority score for each function execution path using a path priority function, and sorting and screening the function execution paths based on the priority scores to obtain candidate execution paths, wherein the path priority function is a weighted sum of path risk prompt information, path novelty and path cost;
for the candidate execution paths, obtaining path-level features through a path coding model; for each node of the function execution graph, obtaining a statement feature vector through a statement encoder, and stacking the statement feature vectors in node-index order to obtain statement-level features;
wherein the path risk prompt information is given by a formula [formula image and symbols not reproduced] whose terms are: the nodes on the function execution path; the node type of each node; a preset weight for the node type; a node risk hit indicator; a set of preset risk trigger patterns; a weight for each risk trigger pattern; and a hit indicator of the function execution path for the corresponding risk trigger pattern;
and the path novelty is given by a formula [formula image and symbols not reproduced] whose terms are: P, the set of already selected candidate execution paths; the three kinds of information of one function execution path, namely its branch condition set, call information set and variable access information set; the same three kinds of information of another function execution path; the path similarity; the associated weights; and the Jaccard similarity.
2. The method of claim 1, wherein nodes of the function execution graph are used to represent statements, blocks of statements, or continuous basic code segments, and associate source code location information, variable access information, and node types.
3. The method of claim 1, wherein the bounded constraint comprises a maximum path length, a maximum number of loop expansions, and a maximum call depth.
4. The method of claim 1, wherein the path cost is a weighted sum of path length, cross-function call depth, and the number of loop unrollings.
5. The method of claim 1, wherein the path coding model is trained by constructing positive and negative pairs of samples based on the structure and semantic similarity between candidate execution paths.
6. A source code high-order vulnerability detection system based on execution path analysis, comprising:
a graph construction module configured to acquire source code to be analyzed and construct a function execution graph for each function in the source code, wherein the edges of the function execution graph represent association relations in the program execution process, including execution sequence relations, function call relations, variable propagation relations, and write and read relations of state persistence media;
a path generation module configured to generate, on the function execution graph and starting from a function entry node, function execution paths along execution sequence edges, and to apply a bounded constraint to the function execution path generation process; and to calculate a priority score for each function execution path using a path priority function, and sort and screen the function execution paths based on the priority scores to obtain candidate execution paths, wherein the path priority function is a weighted sum of path risk prompt information, path novelty and path cost;
a vulnerability detection module configured to obtain path-level features through a path coding model for the candidate execution paths, to obtain a statement feature vector through a statement encoder for each node of the function execution graph, and to stack the statement feature vectors in node-index order to obtain statement-level features;
wherein the path risk prompt information is given by a formula [formula image and symbols not reproduced] whose terms are: the nodes on the function execution path; the node type of each node; a preset weight for the node type; a node risk hit indicator; a set of preset risk trigger patterns; a weight for each risk trigger pattern; and a hit indicator of the function execution path for the corresponding risk trigger pattern;
and the path novelty is given by a formula [formula image and symbols not reproduced] whose terms are: P, the set of already selected candidate execution paths; the three kinds of information of one function execution path, namely its branch condition set, call information set and variable access information set; the same three kinds of information of another function execution path; the path similarity; the associated weights; and the Jaccard similarity.
7. The source code high-order vulnerability detection system based on execution path analysis according to claim 6, wherein the nodes of the function execution graph are used to represent statements, statement blocks or continuous basic code segments, and are associated with source code location information, variable access information and node types.
8. The system of claim 7, wherein the bounded constraint comprises a maximum path length, a maximum number of loop expansions, and a maximum call depth.
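
The path priority function recited in claims 1, 4 and 6 can be illustrated with a hedged sketch. The claim formulas did not survive in this text, so everything below is one plausible reading rather than the claimed formulas. In particular, the following are assumptions: novelty is taken as one minus the maximum similarity to any already selected path; similarity is a weighted sum of Jaccard similarities over the three information sets; cost enters the weighted sum with a negative sign; and all names (`risk_hint`, `novelty`, `cost`, `priority`, `select_paths`), dict keys and weight values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def similarity(p, q, weights=(0.5, 0.25, 0.25)):
    """Weighted Jaccard similarity over branch, call and variable-access sets."""
    keys = ("branches", "calls", "vars")
    return sum(w * jaccard(p[k], q[k]) for w, k in zip(weights, keys))


def novelty(p, selected):
    """ASSUMPTION: novelty = 1 - max similarity to any already selected path."""
    if not selected:
        return 1.0
    return 1.0 - max(similarity(p, q) for q in selected)


def risk_hint(p, type_weight, pattern_weight):
    """Node-type weights times node hit indicators, plus weights of hit patterns."""
    node_term = sum(type_weight.get(t, 0.0) * hit for t, hit in p["nodes"])
    pattern_term = sum(w for m, w in pattern_weight.items() if m in p["patterns"])
    return node_term + pattern_term


def cost(p, w=(1.0, 1.0, 1.0)):
    """Weighted sum of path length, call depth and loop unrollings (claim 4)."""
    return w[0] * len(p["nodes"]) + w[1] * p["call_depth"] + w[2] * p["unrolls"]


def priority(p, selected, type_weight, pattern_weight, a=1.0, b=1.0, c=0.1):
    """ASSUMPTION: cost is penalized, i.e. it carries a negative weight."""
    return (a * risk_hint(p, type_weight, pattern_weight)
            + b * novelty(p, selected)
            - c * cost(p))


def select_paths(paths, k, type_weight, pattern_weight):
    """Greedily pick up to k paths, rescoring novelty after each pick."""
    selected, remaining = [], list(paths)
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda p: priority(p, selected, type_weight, pattern_weight))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical example: one risky path and one benign path with the same sets.
risky = {"nodes": [("source", 1), ("sink", 1)], "patterns": {"store_load_sink"},
         "branches": {"x>0"}, "calls": {"exec"}, "vars": {"cmd"},
         "call_depth": 1, "unrolls": 0}
benign = {"nodes": [("assign", 0)], "patterns": set(),
          "branches": {"x>0"}, "calls": {"exec"}, "vars": {"cmd"},
          "call_depth": 0, "unrolls": 0}
type_w = {"source": 1.0, "sink": 2.0}
pattern_w = {"store_load_sink": 3.0}
top = select_paths([benign, risky], 1, type_w, pattern_w)
print(top[0] is risky)
```

Rescoring novelty after each pick is what lets the selection favor paths that differ from those already chosen. A single global sort would not capture that dependence on the selected set P.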

Description

Source code high-order vulnerability detection method and system based on execution path analysis

Technical Field

The invention belongs to the technical field of software security, and particularly relates to a source code high-order vulnerability detection method and system based on execution path analysis.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

As software systems grow in scale and complexity, source code vulnerabilities exhibit complex triggering conditions, long propagation paths and pronounced cross-stage dependence. In particular, so-called "high-order vulnerabilities" are often not triggered directly by a single statement or by a local defect within a single function. Instead, an external input passes through multiple conditional branches, loop iterations, function calls, and reads and writes of state persistence media (e.g., global/static variables, caches, configuration files, databases, message queues) before reaching a sensitive operation point (sink) at a later stage, where it triggers a security risk (e.g., command execution, out-of-bounds write, injection, permission bypass). Such vulnerabilities propagate across stages, are triggered at a location different from where they originate, and are hard to trace once the propagation chain is broken, so traditional detection methods face obvious challenges.

Existing source code vulnerability detection techniques mainly comprise static scanning based on rule/pattern matching, static analysis based on data flow/taint analysis, path exploration based on dynamic execution or symbolic execution, and code representation learning based on machine learning/deep learning.
Rule or pattern matching methods are simple to implement and efficient, but they capture complex semantics and cross-statement dependences poorly and are prone to false negatives and false positives. Taint analysis and traditional data flow analysis can describe part of the propagation relationships, but on real engineering code they are easily affected by path explosion, the difficulty of cross-function/cross-module modeling, and information chain breaks caused by state persistence. Dynamic execution and symbolic execution can obtain more accurate execution information, but generally suffer from high path exploration cost, strong environment dependence and limited applicability to large-scale code. In addition, although learning-based detection methods have improved generalization to some extent in recent years, many of them rely mainly on local token features of statements or functions and lack explicit structural modeling of high-order trigger patterns such as "possible execution order", "cross-branch path differences" and "store (write/persistence) → load → sink". Meanwhile, when the code is large or the control flow is complex, simply pursuing path coverage makes the number of candidate paths grow sharply, which raises the computing cost and dilutes attention on the truly risky paths, degrading both detection performance and the interpretability of localization.

Disclosure of Invention

In order to solve the technical problems described in the background, the invention provides a source code high-order vulnerability detection method and system based on execution path analysis. By explicitly constructing a function execution graph at the source code level, generating a risk-prioritized set of function execution paths, and combining cross-stage modeling of read-write associations on state media with path-level semantic learning, the invention effectively characterizes high-order vulnerability trigger chains that propagate across branches, loops, functions and persistence media, thereby improving vulnerability detection accuracy and localization interpretability at a controllable computing cost.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A first aspect of the present invention provides a method for detecting source code high-order vulnerabilities based on execution path analysis, comprising:

acquiring source code to be analyzed, and constructing a function execution graph for each function in the source code, wherein the edges of the function execution graph represent association relations in the program execution process, the association relations comprising execution sequence relations, function call relations, variable propagation relations, and write and read relations of state persistence media;

on the function execution graph, starting from a function entry node, generating a function execution path