CN-121997334-A - Code feature recognition and vulnerability analysis system based on convolutional neural network

Abstract

The invention belongs to the technical field of network security and discloses a code feature recognition and vulnerability analysis system based on a convolutional neural network. The system comprises a data processing module, a feature extraction module, a code portrait generation module, a vulnerability analysis module and a visual display module, wherein the data processing module is used for processing a code image. Using the convolutional neural network model, the system can extract low-level features from a large-scale code base within a few minutes and generate a precise component portrait. Compared with traditional methods, feature extraction efficiency is improved by 30%, and component identification accuracy exceeds 95%. The system automatically associates component portraits with a vulnerability database, enabling quick matching of known vulnerabilities and generation of a vulnerability list. Experiments show that the system identifies vulnerabilities 5 times faster than manual analysis and reduces the missed-detection rate by more than 20%. Full-flow analysis from source code to binary files is supported: different types of input files are processed through lexical analysis and disassembly, which broadens the system's scope of application and makes it suitable for diverse software supply chain scenarios.

Inventors

  • LONG FEI
  • XIA FAN
  • WEI XIAOYAN
  • ZHAO QINGYAO
  • MEI ZIWEI
  • HOU DAI
  • CHEN CHEN
  • HU JUNGUO
  • WANG XIAORUI
  • LI YING
  • ZHENG LEI
  • ZHOU ZHENG
  • DONG LIANG
  • HUANG JUNDONG
  • ZHAN WEI
  • YU MINGYANG
  • MENG HAOHUA
  • DONG CHENXI
  • WU GENG
  • XIAO DONGLING
  • ZOU CHENGCHENG
  • LIU ZHONGPEI
  • JIN BO
  • LIANG HANGHAN
  • ZHA ZHIYONG
  • YU ZHENG
  • GAO FEI
  • CHEN JIALIN
  • ZHUANG YAN
  • XU HUAN

Assignees

  • State Grid Hubei Electric Power Co., Ltd. Information and Communication Company (国网湖北省电力有限公司信息通信公司)

Dates

Publication Date
2026-05-08
Application Date
2025-12-31

Claims (10)

  1. A code feature recognition and vulnerability analysis system based on a convolutional neural network, characterized by comprising: a data processing module, a feature extraction module, a code portrait generation module, a vulnerability analysis module and a visual display module; the data processing module is connected with the code portrait generation module and is used for decompressing, partitioning and preprocessing the target source code or binary file; the feature extraction module is connected with the code portrait generation module and is used for extracting low-level semantic features from the code fragments using a convolutional neural network; the code portrait generation module is connected with the data processing module, the feature extraction module and the vulnerability analysis module and is used for generating a semantic portrait of the code components based on the extracted features; the vulnerability analysis module is connected with the visual display module and is used for associating a vulnerability database, identifying known vulnerabilities in the code components and outputting a vulnerability list; and the visual display module is connected with the vulnerability analysis module and is used for displaying the code component structure and the vulnerability distribution in chart form.
  2. The convolutional neural network-based code feature recognition and vulnerability analysis system of claim 1, wherein the data processing module performs: file decompression: decompressing the source code file or the binary file and extracting the parsable content of the code file; code chunking: partitioning the source code file by a fixed number of lines, or slicing the binary file by a fixed size, so that the whole file is divided into a plurality of standardized fragments that facilitate subsequent processing; and data preprocessing: 1) target file decompression and partitioning: after decompressing the target code file, dividing the code into fixed-length fragments according to a preset length: $S = \{s_1, s_2, \dots, s_n\}$, where $S$ is the set of code segments, $n$ is the number of code segments and $s_i$ is the $i$-th code segment, the code segments being non-overlapping; 2) vector coding: a vectorization mapping $f(x)$ maps the characters, identifiers and keywords in a code segment into numerical vectors: $X = f(s_i)$, where $X$ is the vectorized representation of the code segment $s_i$ and $d$ is the dimension of the feature vector.
  3. The convolutional neural network-based code feature recognition and vulnerability analysis system of claim 1, wherein the feature extraction module comprises: 1) input layer: receiving the vectorized code fragments generated by the preprocessing module, with shape $n \times d$, where $n$ is the fragment length and $d$ is the feature dimension; 2) convolution operation: local features are extracted by sliding a plurality of convolution kernels $K$ over the input: $Y_i = \sum_{j=1}^{k} K_j \cdot X_{i+j-1} + b$, where $Y$ is the convolution output, $k$ is the convolution kernel size and $b$ is the bias; 3) activation function: the ReLU activation function, $\mathrm{ReLU}(x) = \max(0, x)$, gives the network its nonlinearity; 4) pooling operation: max pooling or average pooling reduces dimensionality, retaining the key features while reducing the data volume; 5) high-level feature output: the results of the multiple convolution and pooling layers are flattened into a vector $F$ representing the semantic features of each code segment.
  4. The convolutional neural network-based code feature recognition and vulnerability analysis system of claim 1, wherein the code portrait generation module performs: 1) feature clustering: all extracted segment feature vectors $F = \{F_1, F_2, \dots, F_n\}$ are clustered to match features in the library of known components: $C_k = \{F_j \mid \mathrm{Cluster}(F_j) = k\}$, $k \in \{1, 2, \dots, K\}$, where $C_k$ is the $k$-th clustering result and $K$ is the number of known components in the component library; 2) component information matching: according to the clustering results, each cluster is mapped to a specific component name and version number: $\mathrm{Component\_Info}(C_k) = \{\mathrm{Name}, \mathrm{Version}\}$.
  5. The convolutional neural network-based code feature recognition and vulnerability analysis system of claim 1, wherein the vulnerability analysis module performs: 1) vulnerability matching: querying vulnerability databases (such as CVE or NVD) based on the component names and version numbers in the code portrait, each match being a record $V = \{\text{CVE-ID}, \text{Severity}, \text{Description}\}$; 2) vulnerability list generation: summarizing the matched vulnerabilities into a vulnerability list containing the following information: vulnerability ID (CVE-ID); vulnerability description; affected component; repair advice; 3) prioritization: sorting the vulnerabilities by severity and recommending that high-risk vulnerabilities be repaired first.
  6. The convolutional neural network-based code feature recognition and vulnerability analysis system of claim 1, wherein the visual display module provides: component structure display: using pie charts or tree diagrams to show the proportion of each component in the code files, and displaying the name, version number and corresponding feature clustering result of each component; vulnerability distribution display: using histograms or heat maps to show the number and distribution of vulnerabilities, and marking the components affected by high-risk vulnerabilities and their scope of impact; vulnerability priority display: annotating the severity of each vulnerability and its repair advice in a radar chart or table.
  7. A convolutional neural network-based code feature recognition and vulnerability analysis method for implementing the convolutional neural network-based code feature recognition and vulnerability analysis system of any one of claims 1-6, wherein the method comprises: Step 1, decompressing, partitioning and preprocessing a target source code or binary file through the data processing module; Step 2, extracting low-level semantic features from the code segments using a convolutional neural network through the feature extraction module; Step 3, generating a semantic portrait of the code components based on the extracted features through the code portrait generation module; Step 4, associating a vulnerability database through the vulnerability analysis module, identifying known vulnerabilities in the code components and outputting a vulnerability list; Step 5, displaying the code component structure and the vulnerability distribution in chart form through the visual display module.
  8. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the convolutional neural network-based code feature recognition and vulnerability analysis method of claim 7.
  9. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the convolutional neural network-based code feature recognition and vulnerability analysis method of claim 7.
  10. An information data processing terminal, wherein the information data processing terminal is configured to implement the convolutional neural network-based code feature recognition and vulnerability analysis system according to any one of claims 1-6.
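
The data processing and feature extraction steps recited in claims 2 and 3 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the function names (`split_fixed`, `encode`, `conv1d`, `relu`, `max_pool`, `extract_features`), the bit-based character encoding and the hand-written kernel values are all hypothetical, and a real system would learn its kernels by training.

```python
from typing import List

Vector = List[float]


def split_fixed(code: str, length: int) -> List[str]:
    """Split code into non-overlapping fragments of at most `length` characters."""
    return [code[i:i + length] for i in range(0, len(code), length)]


def encode(fragment: str, d: int = 4) -> List[Vector]:
    """Toy vectorization f(x): one d-dimensional vector per character.

    Assumption: component j is bit j of the character code. The claims do not
    specify the mapping, so this stands in for a learned or table-based one.
    """
    return [[float((ord(ch) >> j) & 1) for j in range(d)] for ch in fragment]


def conv1d(x: List[Vector], kernel: List[Vector], bias: float) -> Vector:
    """Slide a k x d kernel along the sequence axis (stride 1, no padding)."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        acc = bias
        for j in range(k):
            acc += sum(w * v for w, v in zip(kernel[j], x[i + j]))
        out.append(acc)
    return out


def relu(values: Vector) -> Vector:
    """ReLU(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool(values: Vector, size: int) -> Vector:
    """Non-overlapping max pooling; a trailing partial window is pooled too."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def extract_features(fragment: str, kernels: List[List[Vector]],
                     bias: float = 0.0, pool: int = 2) -> Vector:
    """Encode one fragment, apply every kernel, and flatten the pooled outputs."""
    x = encode(fragment, d=len(kernels[0][0]))
    features: Vector = []
    for kernel in kernels:
        features.extend(max_pool(relu(conv1d(x, kernel, bias)), pool))
    return features
```

Calling `split_fixed(source, length)` and then `extract_features(fragment, kernels)` on each fragment yields one flattened feature vector per code segment, which corresponds to the vector F in claim 3. A single convolution and pooling layer is shown for brevity, whereas the claim describes multiple layers.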

Description

Code feature recognition and vulnerability analysis system based on convolutional neural network

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a code feature recognition and vulnerability analysis system based on a convolutional neural network.

Background

Traditional code analysis tools have difficulty capturing the high-level semantic features of complex code; in particular, for highly modular or obfuscated code fragments, they cannot effectively extract low-level features and component relationships. Because code structures and naming schemes are diverse, component recognition in the prior art often depends on manual rules or fuzzy matching and is prone to misjudgment. Manual investigation of known vulnerabilities in code is inefficient, time-consuming and prone to missing critical risks. Conventional code analysis tools often lack visualization support, making it difficult for users to understand complex analysis results.

Through the above analysis, the problems and defects existing in the prior art are as follows:

(1) Code complexity and difficulty of semantic extraction: traditional code analysis tools have difficulty capturing the high-level semantic features of complex code; in particular, for highly modular or obfuscated code fragments, they cannot effectively extract low-level features and component relationships.
(2) Inaccurate component identification: because code structures and naming schemes are diverse, component recognition in the prior art often depends on manual rules or fuzzy matching and is prone to misjudgment.
(3) Low vulnerability recognition efficiency: manual investigation of known vulnerabilities in code is inefficient, time-consuming and prone to missing critical risks.
(4) Lack of visual analysis tools: conventional code analysis tools often lack visualization support, making it difficult for users to understand complex analysis results.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a code feature recognition and vulnerability analysis system based on a convolutional neural network.

The invention is realized as follows: a code feature recognition and vulnerability analysis system based on a convolutional neural network comprises a data processing module, a feature extraction module, a code portrait generation module, a vulnerability analysis module and a visual display module; the data processing module is connected with the code portrait generation module and is used for decompressing, partitioning and preprocessing the target source code or binary file; the feature extraction module is connected with the code portrait generation module and is used for extracting low-level semantic features from the code fragments using a convolutional neural network; the code portrait generation module is connected with the data processing module, the feature extraction module and the vulnerability analysis module and is used for generating a semantic portrait of the code components based on the extracted features; the vulnerability analysis module is connected with the visual display module and is used for associating a vulnerability database, identifying known vulnerabilities in the code components and outputting a vulnerability list; and the visual display module is connected with the vulnerability analysis module and is used for displaying the code component structure and the vulnerability distribution in chart form.
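
The hand-off from the code portrait generation module to the vulnerability analysis module described above can be sketched as follows. This is an illustrative sketch only: nearest-reference matching is used here as a simple stand-in for the clustering step, the vulnerability database is a hypothetical in-memory dictionary rather than a real CVE or NVD query, and every name (`Vulnerability`, `match_component`, `build_portrait`, `vulnerability_list`) is an assumption introduced for the example.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple

Component = Tuple[str, str]  # (name, version)


@dataclass(frozen=True)
class Vulnerability:
    vuln_id: str      # placeholder identifier, not a real CVE entry
    severity: float   # assumed numeric score, higher means more severe
    description: str


def match_component(feature: List[float],
                    library: Dict[Component, List[float]]) -> Component:
    """Assign a feature vector to the known component with the nearest reference vector."""
    return min(library, key=lambda comp: dist(feature, library[comp]))


def build_portrait(features: List[List[float]],
                   library: Dict[Component, List[float]]) -> List[Component]:
    """Component portrait: the distinct matched components, in first-seen order."""
    seen: Dict[Component, None] = {}
    for feature in features:
        seen.setdefault(match_component(feature, library), None)
    return list(seen)


def vulnerability_list(portrait: List[Component],
                       database: Dict[Component, List[Vulnerability]]
                       ) -> List[Tuple[Component, Vulnerability]]:
    """Look up each component and sort the hits by severity, highest first."""
    hits = [(comp, vuln) for comp in portrait for vuln in database.get(comp, [])]
    return sorted(hits, key=lambda hit: hit[1].severity, reverse=True)
```

Sorting by severity in descending order mirrors the prioritization step, so the entries at the head of the returned list are the ones the system would recommend repairing first.
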
Further, the data processing module performs: file decompression: decompressing the source code file or the binary file and extracting the parsable content of the code file; code chunking: partitioning the source code file by a fixed number of lines, or slicing the binary file by a fixed size, so that the whole file is divided into a plurality of standardized fragments that facilitate subsequent processing; and data preprocessing: 1) target file decompression and partitioning: after decompressing the target code file, dividing the code into fixed-length fragments according to a preset length: $S = \{s_1, s_2, \dots, s_n\}$, where $S$ is the set of code segments, $n$ is the number of code segments and $s_i$ is the $i$-th code segment, the code segments being non-overlapping; 2) vector coding: a vectorization mapping $f(x)$ maps the characters, identifiers and keywords in a code segment into numerical vectors: $X = f(s_i)$, where $X$ is the vectorized representation of the code segment $s_i$ and $d$ is the dimension of the feature vector.

Further, the feature extraction module comprises: 1) input layer: receiving the vectorized code fragments generated by the preprocessing module, with shape $n \times d$, where $n$ is the fragment length and $d$ is the feature dimension; 2) convolution operation: The local features are extracted using a plurality of convolution k