CN-121980566-A - Model compiling security assessment method and system based on structure detection and differential analysis

Abstract

The invention provides a model compilation security assessment method and system based on structure detection and differential analysis, relating to the technical field of compilation security assessment. The method comprises: invoking a compiler to perform compilation, generating the intermediate representation and compilation products of each stage; performing single-file static detection on an intermediate representation graph file to identify neural network backdoor patterns embedded in the graph in the form of mathematical computation structures; reading the intermediate representation graph files before and after operator fusion and performing differential analysis on them to identify abnormal operators or data flows injected in the fusion stage; performing differential analysis on a kernel source code file and a machine code disassembly file to identify malicious tampering injected in the code generation or binary compilation stage; and integrating the detection results of these identification processes to perform a final security assessment of the model compilation process and generate a structured assessment report. The present disclosure improves the ability to identify abnormal structures or potential backdoor logic hidden in the model compilation phase.

Inventors

DENG ZIZHUANG
WANG JINKUN
CHEN YUXUAN
ZHENG AOYING

Assignees

Shandong University (山东大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-04-07

Claims (10)

1. A model compilation security assessment method based on structure detection and differential analysis, characterized by comprising the following steps: invoking a compiler to compile a deep learning model, generating the intermediate representation and final compilation products of each compilation stage; reading an intermediate representation graph file from the compilation stage to perform single-file static detection, constructing an auxiliary analysis structure, and identifying, based on the auxiliary analysis structure, neural network backdoor patterns embedded in the graph in the form of mathematical computation structures, thereby completing a first-level identification process; reading the intermediate representation files before and after operator fusion, performing differential analysis on the intermediate representation files before and after fusion, and identifying abnormal operators or data flows injected in the fusion stage, thereby completing a second-level identification process; performing differential analysis on a kernel source code file and a machine code disassembly file, and identifying malicious tampering injected in the code generation or binary compilation stage, thereby completing a third-level identification process; and integrating the three levels of detection results to perform a final security assessment of the model compilation process and generate a structured assessment report.
2. The model compilation security assessment method based on structure detection and differential analysis according to claim 1, wherein reading the intermediate representation graph file from the compilation stage to perform single-file static detection, constructing the auxiliary analysis structure, and identifying, based on the auxiliary analysis structure, neural network backdoor patterns embedded in the graph in the form of mathematical computation structures comprises: reading a function-level computation graph file, parsing it line by line using regular expressions, and extracting all operator nodes in the computation graph and their data dependencies, including: for each node line of the format "variable name = basic operator interface, operator name", extracting the variable name, operator name and parameter list information in the line, and constructing a basic operator mapping table; for each node line of the format "variable name = primitive operator interface", extracting the variable name, operator name and parameter list information; and parsing the parameter list, accurately identifying nested parameter structures in the parameter list by tracking the nesting depth of brackets.
3. The model compilation security assessment method based on structure detection and differential analysis according to claim 2, wherein constructing the auxiliary analysis structure and identifying, based on the auxiliary analysis structure, neural network backdoor patterns embedded in the graph in the form of mathematical computation structures comprises: constructing a variable alias mapping, identifying the set of all-ones tensors, and constructing an activation gating variable mapping; executing seven types of backdoor detectors in parallel based on the auxiliary analysis structure; grading the results identified by the detectors by confidence into three levels, namely high, medium and low; and, based on the confidence grading, performing a comprehensive security judgment according to set rules.
4. The model compilation security assessment method based on structure detection and differential analysis according to claim 1, wherein reading the intermediate representation files before and after operator fusion, performing differential analysis on the intermediate representation files before and after fusion, and identifying abnormal operators or data flows injected in the fusion stage comprises: reading the contents of the scheduler intermediate representation graph files before and after operator fusion, performing structured parsing with regular expressions, and extracting the key elements of the computation graph, namely the operator set, the buffer set and the external kernel mapping; counting the quantity distribution of each type of operator in the scheduler intermediate representation graph files before and after fusion, and analyzing the scale of structural change brought by fusion optimization, including the total number of operators and the total number of buffers; performing a set difference operation on the operator sets of the intermediate representation graph files before and after fusion, and computing the difference between the intermediate buffer sets before and after fusion; performing a set difference operation on the external kernel name sets in the intermediate representation graph files before and after fusion; and integrating the difference detection results for operators, buffers and external kernels to generate a structural analysis report and identify possible abnormal compilation behavior.
5. The model compilation security assessment method based on structure detection and differential analysis according to claim 1, wherein performing differential analysis on the kernel source code file and the machine code disassembly file and identifying malicious tampering injected in the code generation or binary compilation stage comprises: parsing the source code file and the machine code file; constructing an interpretable constant set by deriving, from constant information at the source code layer, the range of constants that may legitimately appear at the machine code layer; executing eight set detection rules based on the parsing results and the interpretable constant set; and collecting all findings triggered by the rules, de-duplicating them using the rule name and evidence string as a compound key, grouping the findings by rule severity level, and computing a confidence score.
6. The model compilation security assessment method based on structure detection and differential analysis according to claim 1, wherein integrating the three levels of detection results, performing the final security assessment of the model compilation process, and generating the structured assessment report comprises performing a comprehensive evaluation using set judgment logic: (1) if static detection of the high-level graph intermediate representation file finds a structural backdoor pattern with high or medium confidence, a high backdoor risk is determined and immediate manual inspection is required; (2) if differential analysis of the intermediate representation files before and after fusion finds that the number of non-fusion-type abnormal operators or buffers has increased abnormally, or that an external kernel is newly added only before fusion, suspicious structural changes are determined to exist in the compilation process and the fusion stage is to be checked; (3) if differential analysis of the kernel source code and disassembly finds alarms of serious or high risk level, injection behavior is determined to exist at the machine code layer and the code generation stage is to be checked; (4) if none of the three levels of detection finds an anomaly, the model compilation process is determined to be secure.
7. A model compilation security assessment system based on structure detection and differential analysis, characterized by comprising: an invoking and compiling module, configured to invoke a compiler to compile a deep learning model and generate the intermediate representation and final compilation products of each compilation stage; a first-level identification module, configured to read an intermediate representation graph file from the compilation stage to perform single-file static detection, construct an auxiliary analysis structure, and identify, based on the auxiliary analysis structure, neural network backdoor patterns embedded in the graph in the form of mathematical computation structures, thereby completing a first-level identification process; a second-level identification module, configured to read the intermediate representation graph files before and after operator fusion, perform differential analysis on the intermediate representation graph files before and after fusion, and identify abnormal operators or data flows injected in the fusion stage, thereby completing a second-level identification process; a third-level identification module, configured to perform differential analysis on a kernel source code file and a machine code disassembly file and identify malicious tampering injected in the code generation or binary compilation stage, thereby completing a third-level identification process; and a security assessment module, configured to integrate the three levels of detection results, perform a final security assessment of the model compilation process, and generate a structured assessment report.
8. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the model compilation security assessment method based on structure detection and differential analysis according to any of claims 1-6.
9. A non-transitory computer readable storage medium storing computer instructions which, when executed by a processor, implement the structure detection and differential analysis based model compilation security assessment method according to any of claims 1-6.
10. An electronic device comprising a processor, a memory and a computer program, wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory, so that the electronic device performs the model compilation security assessment method based on structure detection and differential analysis according to any one of claims 1-6.
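
As an illustration of the line-by-line parsing with bracket-depth tracking recited in claim 2 and the set difference operation recited in claim 4, the following is a minimal Python sketch under stated assumptions, not the claimed implementation. The node-line format `variable = interface.operator(parameters)`, the regular expression, the interface labels `basic`, `prim` and `extern`, and the function names `split_params`, `parse_graph` and `diff_ops` are all hypothetical; the claims do not specify the actual intermediate representation file syntax.

```python
import re

# Assumed node-line format (hypothetical, not specified by the claims):
#   <variable> = <interface>.<operator>(<parameter list>)
NODE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\.([\w.]+)\((.*)\)\s*$")


def split_params(param_text):
    """Split a parameter list on top-level commas by tracking bracket depth."""
    params, current, depth = [], [], 0
    for ch in param_text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside all brackets ends the current parameter.
            params.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        params.append(tail)
    return params


def parse_graph(text):
    """Map each variable to (interface, operator, parameters); skip other lines."""
    table = {}
    for line in text.splitlines():
        match = NODE_RE.match(line)
        if match:
            var, iface, op, params = match.groups()
            table[var] = (iface, op, split_params(params))
    return table


def diff_ops(before, after):
    """Set difference of qualified operator names between two parsed graphs."""
    ops_before = {f"{i}.{o}" for i, o, _ in before.values()}
    ops_after = {f"{i}.{o}" for i, o, _ in after.values()}
    return {"added": sorted(ops_after - ops_before),
            "removed": sorted(ops_before - ops_after)}


# Toy pre-fusion and post-fusion graphs in the assumed format.
pre_fusion = parse_graph("a = basic.add(x, y)\nb = prim.mul(a, [1, (2, 3)])")
post_fusion = parse_graph("a = basic.add(x, y)\nc = extern.unknown_kernel(a)")
print(diff_ops(pre_fusion, post_fusion))
```

In this sketch, an operator that appears only after fusion lands in the `added` list, which is the kind of entry a reviewer would inspect as a possible injected operator. The same set difference could be applied to buffer names and external kernel names.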

Description

Model compiling security assessment method and system based on structure detection and differential analysis

Technical Field

The disclosure relates to the technical field of compilation security assessment, and in particular to a model compilation security assessment method and system based on structure detection and differential analysis.

Background

The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

With the wide application of deep learning models in intelligent systems, model security has drawn increasing attention. Recent research shows that an attacker can implant a backdoor during model training or model deployment, so that the model behaves normally on normal input but produces malicious output under specific trigger conditions, posing a serious threat to system security. Performing a security assessment during model deployment, and identifying abnormal computation structures or potential backdoor mechanisms that may exist in the model, is therefore of great significance.

In the prior art, detection methods for deep learning model security focus mainly on the model training stage or the model inference stage, for example detecting whether the model responds abnormally by means of input perturbation, activation analysis or model behavior analysis. However, such methods typically rely on model execution results or input-output behavior, lack direct analysis of the model's internal structure, and have difficulty identifying backdoor logic hidden in the model's computation structure.

Furthermore, in actual deployment, deep learning models typically require compiler transformation and optimization to generate multi-stage intermediate representations (Intermediate Representation, IR) and underlying execution code, while existing security detection methods rarely take into account security risks in the model compilation process. In particular, the prior art still has the following limitations:

(1) There is a lack of security detection mechanisms for IR structures in the model compilation process. During compilation of a deep learning model, the high-level computation graph is gradually converted into various IRs and undergoes operator fusion, scheduling optimization and similar processing. However, most of the prior art focuses only on the original model structure or model inference behavior and lacks a method for structural analysis of the IR generated by compilation, so it is difficult to discover in time abnormal computation patterns or potential backdoor logic hidden in intermediate computation structures.

(2) There is a lack of systematic verification methods for changes in compilation-stage products. During model compilation, the IRs generated at different stages typically differ structurally as a result of optimizations such as operator fusion, scheduling rearrangement or buffer adjustment. The prior art lacks an effective method for systematic differential analysis of the products of each stage before and after compilation, so it is difficult to judge whether structural changes produced during compilation conform to the normal compilation flow, and difficult to discover potential abnormal structures in time.

(3) There is a lack of security analysis capability for underlying compute kernels and hardware execution code. In modern deep learning compilation frameworks, portions of the computation may be further compiled into underlying execution code for a GPU or other hardware platform. Most of the prior art stays at the high-level model or intermediate representation level and lacks a method for structural analysis of the underlying execution code, so malicious code or abnormal instructions that may be hidden in the underlying computation logic are difficult to detect.

(4) There is a lack of a unified security assessment method across compilation levels. A deep learning model typically passes through multiple compilation stages from the high-level computation graph to the final execution code, with complex conversion relationships between levels. The prior art often targets only a single level and lacks a security assessment mechanism capable of systematically analyzing the whole model compilation process, so it is difficult to comprehensively identify the potential security risks of the model during compilation.

Disclosure of Invention

In order to solve the above problems, the present disclosure provides a model compilation security assessment method and system based on structure detection and differential analysis, which utilize the intermediate representation (IR) and final compilation products of each stage generated in the compilation process of a deep learning model, and systematically detect backd