CN-122020667-A - Method, device, processor and computer readable storage medium for realizing static analysis security research, judgment and positioning based on artificial intelligence

Abstract

The invention relates to a method for realizing static analysis security research, judgment and positioning based on artificial intelligence, comprising the following steps: invoking multi-source heterogeneous static analysis tools to obtain multi-source heterogeneous analysis results; performing format conversion and standardization on the output of each tool; performing semantic analysis, false-positive suppression and chain-of-thought reasoning on the standardized structured data based on a locally deployed large language model; aggregating the analysis results; and displaying or exporting the high-confidence result content. The method, device, processor and computer readable storage medium significantly reduce the false-positive rate, significantly improve accuracy on key high-risk problems, and greatly speed up problem localization. They also enhance interpretability, facilitate security audit and compliance recording, and adapt to a wide range of tools. Privacy and compliance are protected because reasoning is performed locally across the full pipeline, which effectively avoids the risk of source code leakage.

Inventors

SONG ZHENG
YANG ZHILIANG
Bi Zhizhi
JIN BO
Lin jiuchuan

Assignees

The Third Research Institute of the Ministry of Public Security (公安部第三研究所)

Dates

Publication Date: 20260512
Application Date: 20260204

Claims (11)

1. A method for realizing static analysis security research, judgment and positioning based on artificial intelligence, characterized by comprising the following steps: (1) invoking multi-source heterogeneous static analysis tools and executing a scanning task according to preset analysis rules to obtain multi-source heterogeneous analysis results; (2) performing format conversion and standardization on the output of each multi-source heterogeneous static analysis tool to obtain standardized static analysis results; (3) performing semantic analysis, false-positive suppression and chain-of-thought reasoning on the standardized structured data based on a locally deployed large language model, and outputting an analysis result and a confidence score; (4) aggregating the analysis results to obtain high-confidence result content for the analysis object; (5) displaying or exporting the high-confidence result content.
2. The method for realizing static analysis security research, judgment and positioning based on artificial intelligence according to claim 1, wherein step (3) specifically comprises the following steps: (3.1) constructing and managing a false-positive data set; (3.2) performing chain-of-thought reasoning and a self-consistency check; (3.3) generating an evidence chain and performing lightweight reachability verification; and (3.4) grading and ranking the problem risks to obtain a composite risk score.
3. The method for realizing static analysis security research, judgment and positioning based on artificial intelligence according to claim 2, wherein step (3.1) is specifically as follows: adopting a swing-case mining strategy to apply active learning and manual labeling to cases that are ambiguous or at risk of misjudgment.
4. The method for realizing static analysis security research, judgment and positioning based on artificial intelligence according to claim 2, wherein step (3.2) specifically comprises the following steps: (3.2.1) applying a preset sequence of prompt templates in the locally deployed large language model and performing multi-stage reasoning on each alarm to obtain a research and judgment result; and (3.2.2) performing a self-consistency check on the research and judgment result, and outputting the research and judgment result and a confidence score.
5. The method for realizing static analysis security research, judgment and positioning based on artificial intelligence according to claim 2, wherein step (3.3) specifically comprises the following steps: (3.3.1) outputting a complete evidence chain from source to sink according to the external input points and the call and data flow paths of the dangerous functions; (3.3.2) automatically generating the minimal exploitation conditions and a corresponding pseudo exploit simulation script, and verifying them by lightweight dynamic verification.
6. The method for realizing static analysis security research, judgment and positioning based on artificial intelligence according to claim 2, wherein step (3.4) specifically comprises the following steps: intelligently screening the target CWE types according to different AI analysis strategies and project categories to obtain a risk priority ranking, and producing a comprehensive ranking by classifying priorities according to the strategy and combining them with the confidence values.
7. The method for realizing static analysis security research, judgment and positioning based on artificial intelligence according to claim 1, wherein step (4) specifically comprises the following steps: determining the CWE processing order according to the judgment priority strategy; locating problem points on the same code line of the target file by identical CWE type; and merging confidence intervals according to the confidence strategies of the different rule types, and outputting the high-confidence result content of the analysis object.
8. A system for realizing static analysis security research, judgment and positioning based on artificial intelligence, implementing the method of claim 1, the system comprising: a heterogeneous tool calling module, used for invoking multi-source heterogeneous static analysis tools, executing an analysis task on a target and acquiring the output results; a multi-source result standardization module, connected to the heterogeneous tool calling module and used for adapting the data results of different tools and converting them into a standardized format; a false-positive data set module, connected to the multi-source result standardization module and used for collecting and accumulating typical cases that cover the typical false-positive patterns of static analysis tools; a chain-of-thought reasoning and self-consistency check module, connected to the false-positive data set module and used for performing chain-of-thought reasoning and the self-consistency check; an evidence chain generation and lightweight reachability verification module, connected to the chain-of-thought reasoning and self-consistency check module and used for intelligently generating a pseudo PoC and performing lightweight dynamic verification; a problem risk grading and ranking module, connected to the evidence chain generation and lightweight reachability verification module and used for intelligently grading and ranking the risks of the vulnerabilities in the research and judgment results under a strategy specified by user requirements; and a display and output module, connected to the problem risk grading and ranking module and used for displaying the analysis state of the multi-source heterogeneous tools, the static analysis results, the AI research and judgment results and the aggregated AI research and judgment results.
9. A device for realizing static analysis security research, judgment and positioning based on artificial intelligence, characterized in that the device comprises: a processor configured to execute computer-executable instructions; and a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method for realizing static analysis security research, judgment and positioning based on artificial intelligence of any one of claims 1 to 7.
10. A processor for realizing static analysis security research, judgment and positioning based on artificial intelligence, wherein the processor is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the method for realizing static analysis security research, judgment and positioning based on artificial intelligence of any one of claims 1 to 7.
11. A computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method for realizing static analysis security research, judgment and positioning based on artificial intelligence of any one of claims 1 to 7.
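
The standardization, self-consistency check and aggregation steps recited in the claims can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the tool names `tool_a` and `tool_b`, their field names, the majority-vote form of the self-consistency check, the verdict labels and the 0.7 threshold are all hypothetical and do not come from the claims.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Standardized alarm record (hypothetical schema)."""
    tool: str
    file: str
    line: int
    cwe: str


def normalize(tool: str, raw: dict) -> Finding:
    """Convert one tool-specific record to the shared schema.

    The two input layouts are invented for illustration only.
    """
    if tool == "tool_a":
        return Finding(tool, raw["path"], int(raw["line"]), raw["cwe"])
    if tool == "tool_b":
        return Finding(tool, raw["filename"], int(raw["lineno"]), f"CWE-{raw['cwe_id']}")
    raise ValueError(f"no adapter for tool {tool!r}")


def self_consistent_verdict(votes: list[str]) -> tuple[str, float]:
    """Assumed self-consistency check: majority vote over repeated
    reasoning runs, with confidence = share of runs that agree."""
    if not votes:
        raise ValueError("at least one reasoning run is required")
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict, count / len(votes)


def aggregate(judged, threshold: float = 0.7):
    """Merge true-positive findings on the same (file, line, CWE) and
    keep those whose best confidence reaches the threshold."""
    groups = defaultdict(list)
    for finding, verdict, confidence in judged:
        if verdict == "true_positive":
            groups[(finding.file, finding.line, finding.cwe)].append(confidence)
    merged = {key: max(confs) for key, confs in groups.items()}
    kept = [(key, conf) for key, conf in merged.items() if conf >= threshold]
    # Highest confidence first.
    return sorted(kept, key=lambda item: -item[1])
```

In this sketch, alarms from different tools that point at the same file, line and CWE type collapse into one entry, so a reviewer sees each problem point once with the strongest supporting confidence.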

Description

Method, device, processor and computer readable storage medium for realizing static analysis security research, judgment and positioning based on artificial intelligence

Technical Field

The invention relates to the field of software security, in particular to the technical field of artificial intelligence, and specifically to a method, a device, a processor and a computer readable storage medium for realizing static analysis security research, judgment and positioning based on artificial intelligence.

Background

1. Static analysis technique

In the field of software development, static analysis refers to techniques that automatically analyze source code without running the program. It is mainly used to detect potential problems in code, such as syntax errors, security vulnerabilities, inconsistent code style and performance problems. Static analysis is widely applied in code quality assurance, security audit, coding standard inspection and similar areas.

The types of static analysis technique include syntax analysis, semantic analysis, data flow analysis, control flow analysis, pattern matching, type checking, and the like.

Application scenarios of static analysis include code quality assurance (detecting style problems, duplicated code, excessive complexity, etc.), security vulnerability detection (e.g., buffer overflow, SQL injection, XSS, CSRF), coding standard inspection (e.g., Google Code Style, PEP8, ESLint), as well as automated testing, code structure analysis and refactoring support.
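
As a minimal illustration of analyzing source code without running it, the following Python sketch parses a program into a syntax tree with the standard `ast` module and reports direct calls to a small set of dangerous functions. The function name `find_dangerous_calls` and the set `DANGEROUS_CALLS` are hypothetical names chosen for this example. The sketch shows only simple pattern matching on the syntax tree, not the data flow or control flow analysis performed by real tools.

```python
import ast

# Hypothetical rule set: names of built-in functions treated as dangerous.
DANGEROUS_CALLS = {"eval", "exec"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each direct call to a
    function listed in DANGEROUS_CALLS. The source is parsed, never executed."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)
```

A rule this simple flags every `eval` call regardless of whether attacker-controlled data can reach it, which is one source of the false positives discussed in the following paragraphs.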
Conventional static application security testing (SAST) tools can find potential defects in large code bases, but the following problems are common. First, the false-positive rate is high: semantics such as framework routing, template engines, ORM, deserialization and security wrappers cannot be effectively understood, which produces a large number of "possible" or "suspected" alarms. Second, operability is poor: alarms lack a trustworthy evidence chain of upstream and downstream data flows, so security personnel must perform time-consuming manual backtracking and environment reproduction experiments. Third, priorities are hard to distinguish: the business exposure surface, exploitability and reachability of different alarms differ greatly, and the rule weights of conventional tools hardly reflect the actual risk. Fourth, the PHP and Java framework ecosystems are complex (e.g., Laravel, Symfony, Spring, MyBatis, Struts), SAST rules are difficult to adapt, and cross-language and cross-framework models are unstable. Fifth, the data feedback loop is insufficient: experience with false positives and true positives is hard to consolidate into structured knowledge, so the false-positive rate cannot be continuously reduced. Existing improvements are based on rule tuning, taint analysis enhancement or simple statistical-learning post-processing. They still struggle with complex semantics, framework-defined configuration and the comprehensive judgment of cross-file or cross-layer data flows, and cannot rapidly identify the key exploitable problems in large-scale scan results.
Commercial static analysis tools generally have strong semantic analysis capability. They can identify complex vulnerabilities (such as SQL injection, XSS and buffer overflow), identify vulnerabilities more accurately through semantic analysis and control flow analysis, and have a relatively low false-positive rate. However, in some complex scenarios false positives may still occur and manual review is required.

2. Large language model

A large language model (LLM) is a natural language processing model based on deep learning. Trained on large-scale text data, it learns the structure, semantics and contextual relations of language, and can thereby perform many natural language processing tasks such as text generation, question answering, translation and summarization. The core idea is that pre-training on massive data gives the model broad language understanding and generation capability, and fine-tuning adapts it to specific tasks. The emergence of large language models has greatly advanced artificial intelligence in the field of natural language processing, and they are widely applied in fields such as intelligent customer service, content creation and data analysis.

Large language models have certain advantages in vulnerability analysis, research and judgment, but also have obvious limitations. The advantages are that the model has strong natural language understanding capability, can efficient