CN-120849239-B - Deep learning-based automatic mining method for firmware vulnerabilities of Internet of Things devices
Abstract
The invention discloses a deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices, comprising the following steps: S1, constructing an interface keyword set; S2, optimizing the interface keyword weights to generate a dynamic-weight interface keyword library; S3, analyzing the firmware binary files and locating an interface function set; S4, performing static slice analysis to generate a slice path set; S5, performing symbolic execution on the slice paths, generating path constraints and solving for valid input data; S6, determining the vulnerability type from historical vulnerability features and generating a vulnerability type identifier; S7, generating a context prompt from the vulnerability metadata, inputting it into a large language model, and generating proof-of-concept (PoC) exploit code; S8, executing PoC verification to complete automated vulnerability mining. The method improves the efficiency of automated firmware vulnerability detection, strengthens exploit generation, and is suitable for security testing of many types of Internet of Things devices.
Inventors
- CHEN LEYAO
- WU DASHAN
- ZHANG ZHENYU
- LI CHANGJING
Assignees
- 连云港市公安局 (Lianyungang Municipal Public Security Bureau)
Dates
- Publication Date
- 20260508
- Application Date
- 20250710
Claims (8)
- 1. A deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices, characterized by comprising the following steps: S1, extracting static files, performing static analysis on the string information and interface call information in the static files, and constructing an interface keyword set; S2, combining the interface features and vulnerability features in historical vulnerability samples, and performing weight optimization on the interface keyword set to form a dynamic-weight interface keyword library; S3, analyzing the binary files of the device firmware based on the dynamic-weight interface keyword library, and locating an interface function set; S4, performing static slice analysis on each interface function in the interface function set, selecting vulnerability risk functions as end functions, extracting the control flow paths and data flow paths along which input variables of the interface function propagate to the vulnerability risk functions, and generating a slice path set; S5, performing symbolic execution analysis on each slice path in the slice path set, generating the corresponding symbolic execution path and path constraints, and solving for valid input data; S6, determining the vulnerability type by combining historical vulnerability sample information, generating a corresponding vulnerability type identifier, and taking the symbolic execution path, the path constraints, the valid input data, the interface function, the vulnerability risk function and the vulnerability type identifier as vulnerability metadata; S7, generating a prompt containing path context information from the vulnerability metadata, inputting the prompt into a large language model fine-tuned on historical vulnerability samples, and generating proof-of-concept (PoC) exploit code for the vulnerability metadata; S8, executing a vulnerability verification process on the PoC code, verifying whether the vulnerability is exploitable, and completing automated vulnerability mining.
- 2. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 1, wherein the static analysis in S1 comprises: parsing the HTML files, JavaScript files and ASP files among the front-end static files; extracting the interface names, parameter names and call keywords in the files; screening the extracted interface names, parameter names and call keywords to generate a corresponding interface call information set; and constructing the interface keyword set from the interface call information set.
- 3. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 1, wherein S2 specifically comprises: S21, performing semantic analysis on the interface call logs, interface definition files and interface call contexts in the historical vulnerability samples, and extracting from them interface names, parameter names, call keywords and call context fragments to form the interface features; S22, performing static analysis and behavior trace analysis on the vulnerability reports, exploit code and vulnerability trigger logs in the historical vulnerability samples, and extracting from them vulnerability type identifiers, vulnerability trigger conditions, the input features in the exploit code and path context information to form the vulnerability features; S23, for each interface keyword in the interface keyword set, calculating an initial weight of the interface keyword based on the extracted interface features and vulnerability features, wherein the initial weight is determined jointly by the occurrence frequency of the interface keyword in the historical vulnerability samples and the severity level of the corresponding vulnerability samples; S24, introducing a time decay factor and time-weighting the initial weight according to the time attribute of the historical vulnerability samples to calculate the time-weighted weight of the interface keyword, the time-weighted weight satisfying the following condition: [formula missing in source]; S25, normalizing the time-weighted weights of all interface keywords in the interface keyword set to generate normalized weights; S26, sorting the interface keywords by normalized weight, screening out the interface keywords whose normalized weight is larger than a preset threshold, and taking the screening result as the dynamic-weight interface keyword library.
- 4. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 3, wherein the severity level of the corresponding vulnerability samples in step S23 comprises a level identifier determined according to the vulnerability's scope of impact, exploitation difficulty and disclosure time; the level identifiers are divided into three levels, high risk, medium risk and low risk, each with a corresponding level coefficient; and in the initial weight calculation, the occurrence frequency of the interface keyword is weighted by the corresponding level coefficient to obtain the final initial weight.
- 5. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 1, wherein S4 specifically comprises: S41, for each interface function in the interface function set, extracting the input variable set of the interface function, the input variable set comprising interface parameters, global variables and context variables passed along the call chain; S42, taking the interface function as the starting point and each vulnerability risk function in the vulnerability risk function set as the end point, constructing a control flow graph and a data dependency graph for the call path between the interface function and the vulnerability risk function; S43, based on the control flow graph and the data dependency graph, performing forward data flow analysis on the input variable set, tracking the variable propagation paths, and identifying the path nodes through which the input variables propagate to the vulnerability risk function; S44, synchronously performing backward data dependency analysis, tracing the dependency paths upward from the vulnerability risk function, and screening out a key instruction node set and the corresponding data flow fragments; S45, integrating the forward data flow analysis result and the backward data dependency analysis result to generate a slice path from the interface function to the vulnerability risk function, and combining all slice paths into the slice path set.
- 6. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 1, wherein step S5 specifically comprises: S51, for each static slice path in the slice path set, establishing a symbolic execution environment based on the instruction set architecture of the Internet of Things device firmware, the symbolic execution environment comprising a symbolic variable modeling module, a symbolic memory management module and a symbolic execution engine, initializing the input variable set of the interface function as a symbolic input variable set, and mapping the symbolic input variable set into a symbolic memory space; S52, loading the corresponding instruction sequence along the static slice path in control flow order, and symbolically processing each instruction with the symbolic execution engine, wherein the symbolic processing comprises converting the operands of the instruction into symbolic variable expressions, continuously updating the symbolic memory state in the symbolic memory management module, and gradually generating a complete symbolic execution path; S53, while executing the symbolic execution path, when a conditional branch instruction is encountered in the instruction stream, dynamically analyzing the comparison operation in the conditional statement, extracting the conditional expression, and constructing a path constraint from the symbolic variables involved in the conditional expression, all path constraints accumulated during execution of the path forming a path constraint set; S54, inputting the path constraint set into a constraint solver, wherein the constraint solver is a solving tool based on SMT theory, uses a symbolic variable constraint solving algorithm, and jointly solves all path conditions in the path constraint set to obtain valid input data satisfying the path constraint set.
- 7. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 1, wherein determining the vulnerability type in S6 specifically comprises: S61, extracting the vulnerability type features in the historical vulnerability samples; S62, for each symbolic execution path, extracting the corresponding path features, trigger condition features and input features; S63, comparing the extracted path features, trigger condition features and input features with the vulnerability type features in the historical vulnerability samples, and calculating a similarity score for each kind of feature; S64, fusing the similarity scores of the features by weighting, and calculating a total matching score; S65, comparing the total matching score with a preset matching threshold, and selecting the vulnerability type whose matching score is higher than the preset threshold as the vulnerability type determination result for the current symbolic execution path.
- 8. The deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices according to claim 1, wherein step S7 specifically comprises: S71, extracting the vulnerability metadata, the vulnerability metadata comprising the symbolic execution path, the path constraint set, the valid input data set, the interface function, the vulnerability risk function and the vulnerability type identifier; S72, analyzing the path constraint set in the vulnerability metadata and extracting path context information, the path context information comprising path control flow features, branch condition features and input variable features; S73, constructing a context-aware prompt template by combining the path context information and the vulnerability type identifier, and generating the corresponding prompt text; S74, inputting the prompt text into a large language model fine-tuned on historical vulnerability samples, and performing inference; S75, generating, by the large language model, the PoC exploit code for the vulnerability metadata.
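As an illustration of steps S23 to S26 of claim 3 together with the level coefficients of claim 4, the following minimal Python sketch computes keyword weights. The coefficient values, the exponential form of the time decay, applying the decay per sample, the function name `build_keyword_library` and the sample keywords are all assumptions made for illustration; the claims do not fix any of them.

```python
import math
from collections import defaultdict

# Hypothetical level coefficients for high / medium / low risk.
# Claim 4 names three coefficients, but this text gives no values.
LEVEL_COEFF = {"high": 3.0, "medium": 2.0, "low": 1.0}


def build_keyword_library(samples, now_year, decay=0.5, threshold=0.2):
    """Return [(keyword, normalized_weight)] above the threshold.

    samples: iterable of (keyword, severity_level, year) tuples, one per
    occurrence of a keyword in a historical vulnerability sample.
    """
    weighted = defaultdict(float)
    for keyword, level, year in samples:
        age = max(0, now_year - year)
        # S23: each occurrence contributes its severity coefficient.
        # S24: assumed decay exp(-decay * age); the real formula is unknown.
        weighted[keyword] += LEVEL_COEFF[level] * math.exp(-decay * age)

    total = sum(weighted.values())
    if total == 0:
        return []

    # S25: normalize so that all weights sum to 1.
    normalized = {k: w / total for k, w in weighted.items()}

    # S26: sort by weight and keep keywords strictly above the threshold.
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    return [(k, w) for k, w in ranked if w > threshold]
```

Under these assumptions, a keyword that appears in recent high-risk samples rises to the top of the library, while keywords seen only in old or low-risk samples fall below the threshold and are dropped.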
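Steps S43 to S45 of claim 5 can be sketched as a graph computation: forward reachability from an input variable, backward reachability from the risk function, and the intersection of the two. The sketch below assumes the data dependency graph is already available as a plain adjacency dictionary; the node names and the function names `reachable`, `reverse` and `slice_nodes` are hypothetical, and a real implementation would work on the control flow graph and data dependency graph recovered from the binary.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first set of nodes reachable from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def slice_nodes(dep_graph, source, sink):
    """Nodes lying on some dependency path from source to sink."""
    forward = reachable(dep_graph, source)          # S43: forward data flow
    backward = reachable(reverse(dep_graph), sink)  # S44: backward dependency
    return forward & backward                       # S45: merge both results
```

An empty result means that the input variable never reaches the risk function, so no slice path is generated for that pair.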
Description
Deep learning-based automatic mining method for firmware vulnerabilities of Internet of Things devices
Technical Field
The invention relates to the technical field of program security, and in particular to a deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices.
Background
With the rapid progress of Internet of Things technology and the wide application of smart devices, devices such as routers and switches have become important components of modern communication hubs. Smart Internet of Things devices improve the efficiency of life and work, but they also expose many potential security hazards. Security vulnerabilities in Internet of Things devices have become a new target for attackers, causing a series of security problems including data leakage, remote control and denial-of-service attacks. Vulnerability mining for the firmware of Internet of Things devices has therefore become a research hotspot in network security. Studying the security vulnerabilities of smart Internet of Things devices in depth, and discovering and repairing them, is important for guaranteeing network security and protecting user privacy and data security. Traditional vulnerability mining approaches such as manual auditing are inefficient and labor-intensive, and often suffer from low accuracy and a high false-positive rate. Moreover, smart Internet of Things devices come in many varieties and their code is large and complex, so traditional methods cannot efficiently locate the key code to be analyzed. How to provide a deep learning-based method for automatically mining firmware vulnerabilities in Internet of Things devices is therefore a problem that those skilled in the art need to solve.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a method for automatically mining vulnerabilities in the firmware of Internet of Things devices. The technical scheme comprises keyword-based entry information positioning, firmware code slice analysis, symbolic-execution-based automatic vulnerability triggering, vulnerability-metadata-based fine-tuning of a large language model, and large-language-model-based automatic PoC generation.
For keyword-based entry information positioning: analysis of a large number of historically discovered vulnerabilities shows that the attack entry points of smart Internet of Things devices are all located at interfaces of the front-end files, and that the relevant functions in the binary files can be reached starting from those interfaces. The invention therefore proposes to locate the key entry functions of smart Internet of Things devices precisely by identifying interface strings in the front-end files. Meanwhile, different weights are assigned to different keywords based on historical vulnerability information, which improves the effectiveness of key entry function extraction. The front-end static files are extracted from the firmware, the constant strings in them are parsed and extracted, the functions related to front-end interface interaction are identified, and the actual names of the back-end handler methods are identified. These method names serve as keywords for the subsequent analysis of the binary files. Once the keywords are obtained, the specific handler functions are located in the binary files by keyword name. In addition, existing interface functions are found by locating the function table in the binary.
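The extraction of constant strings from front-end static files described above can be illustrated with a short Python sketch. It assumes that interface strings appear as quoted request paths and that the last path segment, without its extension, is the handler-name keyword; the regular expression, the function name `extract_keywords` and the example paths are hypothetical and are not taken from the invention.

```python
import re

# Assumed pattern: quoted constant strings that look like request paths,
# for example "/goform/SetWifi" or '/cgi-bin/login.cgi'.
PATH_RE = re.compile(r"""["'](/[A-Za-z0-9_./-]+)["']""")


def extract_keywords(text):
    """Return candidate handler-name keywords found in front-end source text."""
    keywords = set()
    for path in PATH_RE.findall(text):
        # Keep the last path segment and strip any file extension.
        name = path.rstrip("/").rsplit("/", 1)[-1].split(".", 1)[0]
        if name:
            keywords.add(name)
    return keywords
```

The resulting names would then be searched for among the strings of the firmware binary to locate the handler functions that reference them.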
To further improve the completeness of the interface function set, more interface functions are located by finding the functions in the firmware that obtain user input, thereby obtaining the interface function set. Extracting the entry functions of different smart Internet of Things devices by combining historical vulnerability PoCs with front-end interface names and parameter names effectively improves vulnerability mining efficiency.
For firmware code slice analysis: a slice analysis method is applied starting from the interface functions. A series of vulnerability risk functions is designated as the final end functions, and forward data flow analysis with taint tracking determines whether the input variables of an interface function flow into an end function; if they do, the interface function is a potential risk function. According to the vulnerability automatic triggering method based on the symbolic execution, constraint paths in the symbolic execution solving process are used by means of slice se