CN-121996292-A - JavaScript code deobfuscation method based on an abstract syntax tree (AST)
Abstract
The invention relates to a JavaScript code deobfuscation method based on an abstract syntax tree (AST), belonging to the technical fields of software security, program analysis and code reverse engineering. The method goes beyond simple text matching: by performing semantic analysis on the AST, it can understand the intent of the code and thereby accurately reverse complex obfuscation transformations.
Inventors
- Ma Pubo
- LIU DONGMEI
- YANG TIANZHI
- LI MAN
- GAO WEIXIANG
- MA LIGUO
- YANG HUAN
- LIANG HAN
- MA LI
Assignees
- 西南技术物理研究所 (Southwest Institute of Technical Physics)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-25
Claims (10)
- 1. A JavaScript code deobfuscation method based on an abstract syntax tree (AST), characterized by comprising the following steps: Step S01, AST construction: a JavaScript parser takes the obfuscated source code as input and generates a complete, structured AST according to the ECMAScript grammar specification, wherein the AST describes the syntactic structure of the code and each node represents a syntax unit; Step S02, AST traversal and semantic analysis: the AST is traversed using a depth-first or breadth-first strategy, each node is visited during the traversal, and a series of preset deobfuscation rules is applied to perform semantic analysis: Unicode-escaped string restoration, in which strings are restored by deleting the raw node of the AST; computable-expression evaluation, in which the result of a computable expression is calculated directly and replaces the expression node; control flow unflattening, in which the execution order of the case code blocks is analyzed and, once the order is obtained, the switch-case statement is removed and the case code blocks are reassembled in that order; dead code deletion, in which code branches that can never be executed are identified and deleted; and unified string decryption, for strings that are all decrypted through one decryption function, in which the decryption function is called by simulated execution at the AST level and the original node is replaced with the restored real value; and Step S03, code regeneration: after all deobfuscation rules have been applied, the AST is rebuilt, and the code generator of the AST parser finally converts the AST into the deobfuscated code text.
- 2. The method of claim 1, wherein the syntax unit represented by each node is a variable declaration, a function call, a ternary expression, an if statement, or a while statement.
- 3. The method of claim 1, wherein, in the control flow unflattening process, when the case code blocks take a multi-layer nested switch-case form in which a plurality of state variables exist and are related to one another by operations, constraint solving is used to reduce the plurality of state variables to a single state variable, the multi-layer switch-case nesting is thereby reduced to a single layer, and the result is then restored using the single-layer processing approach.
- 4. The method of claim 1, wherein the control flow unflattening step comprises: identification, namely locating the main loop that contains the switch statement and its state variable; tracking, namely following the changes of the state variable and constructing a mapping graph from the dispatcher to each basic block; and reconstruction, namely connecting the scattered basic blocks in logical order according to the analyzed execution flow, replacing the flattened structure with a standard control sequence, and retaining the while statement.
- 5. The method of claim 1, wherein the JavaScript parser is Acorn, Esprima, or Babel Parser.
- 6. The method of claim 1, wherein the code generator is Escodegen.
- 7. The method converts the problem of transforming code text into the problem of reconstructing and pruning a syntax tree structure, and uses the abstract syntax tree to perform code analysis and transformation at the semantic level.
- 8. A system for implementing the method of any one of claims 1 to 7.
- 9. Use of the method according to any of claims 1 to 7 in software security, program analysis and code reverse engineering.
- 10. Use of the system of claim 8 in software security, program analysis and code reverse engineering.
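As a non-normative illustration of two of the step S02 rules named in claim 1 (computable-expression evaluation and dead code deletion), the following minimal sketch rewrites a hand-built tree of ESTree-shaped nodes in a depth-first, post-order pass. It assumes plain object nodes rather than the output of a real parser, and the helper names (`applyBinary`, `simplify`, `lit`, `bin`, `call`, `block`) are hypothetical, not taken from the patent.

```javascript
// Sketch only: nodes are plain ESTree-shaped objects built by hand.
// All helper names below are hypothetical illustrations.

// Evaluate a binary operator on two primitive values.
// Returns undefined for operators this sketch does not handle.
function applyBinary(op, l, r) {
  switch (op) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    case ">>": return l >> r;
    case "&": return l & r;
    case ">=": return l >= r;
    case "<=": return l <= r;
    case "===": return l === r;
    default: return undefined;
  }
}

// Depth-first, post-order rewrite: children are simplified before parents.
function simplify(node) {
  if (Array.isArray(node)) {
    // Statements that were reduced to nothing are dropped from lists.
    return node
      .map((child) => simplify(child))
      .filter((n) => !(n && n.type === "EmptyStatement"));
  }
  if (node === null || typeof node !== "object") return node;

  const out = {};
  for (const [key, value] of Object.entries(node)) out[key] = simplify(value);

  // Rule: computable expression -> literal holding the computed result.
  if (out.type === "BinaryExpression" &&
      out.left.type === "Literal" && out.right.type === "Literal") {
    const value = applyBinary(out.operator, out.left.value, out.right.value);
    if (value !== undefined) return { type: "Literal", value };
  }

  // Rule: an if statement with a constant test keeps only the live branch.
  if (out.type === "IfStatement" && out.test.type === "Literal") {
    if (out.test.value) return out.consequent;
    return out.alternate || { type: "EmptyStatement" };
  }
  return out;
}

// Small node builders for the example tree.
const lit = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) =>
  ({ type: "BinaryExpression", operator, left, right });
const call = (name) => ({
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name },
    arguments: [],
  },
});
const block = (...body) => ({ type: "BlockStatement", body });

// if (23*23 + 12*12 >= 2*23*12) { realWork(); } else { decoy(); }
// if (1 === 2) { decoy(); }
const program = {
  type: "Program",
  body: [
    {
      type: "IfStatement",
      test: bin(">=",
        bin("+", bin("*", lit(23), lit(23)), bin("*", lit(12), lit(12))),
        bin("*", bin("*", lit(2), lit(23)), lit(12))),
      consequent: block(call("realWork")),
      alternate: block(call("decoy")),
    },
    {
      type: "IfStatement",
      test: bin("===", lit(1), lit(2)),
      consequent: block(call("decoy")),
      alternate: null,
    },
  ],
};

const cleaned = simplify(program);
console.log(JSON.stringify(cleaned.body, null, 2));
```

In this sketch the always-true test folds to a literal, so only the `realWork()` block survives, and the second statement disappears entirely. A real implementation would additionally have to confirm that a folded expression has no side effects, which this sketch does not attempt.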
Description
JavaScript code deobfuscation method based on an abstract syntax tree (AST)
Technical Field
The invention belongs to the technical fields of software security, program analysis and code reverse engineering, and particularly relates to a JavaScript code deobfuscation method based on an abstract syntax tree (AST).
Background
JavaScript, a dynamic, interpreted scripting language, is widely used for Web front-end and back-end development. To protect intellectual property or to hide malicious intent, developers commonly apply code obfuscation. Common JavaScript obfuscation techniques include:
1. Variable-name and function-name obfuscation: meaningful identifiers are replaced by meaningless short or long character sequences. First, this removes any literal meaning the names may carry; second, it inflates the code size, which makes visual inspection of the code structure harder and greatly increases debugging difficulty. For example, userid, userName, secret, etc. are replaced with a, _0xa23433, etc.
2. String encryption: string literals are converted into function calls or array references that are decrypted at run time. For example, the obfuscator tool extracts all strings into one large array, transforms the array into an obfuscated literal through operations such as shifting (shift, unshift), and supplies a decryption function that decrypts at run time.
3. Code structure obfuscation:
3.1 Inserting invalid code blocks: false if branches are set up using mathematical expressions that always hold. For example, based on a^2 + b^2 >= 2ab, a branch condition can be written as fe = (y = (oo = 23) * oo) + (P = (qe = 12 >> qe) * qe) >= (Y = 2 * oo * qe); similar identities include a^2 >= 0 and (x & 31) <= 31.
3.2 Splitting and recombining expressions: operations such as function calls, comparisons and bitwise operations are wrapped in functions; the operands are passed as arguments, the expression is reassembled inside the function, and the computed result is returned. For example, a << b is replaced by function m(t, n) { return t << n } and the call m(a, b), and func(p, q, d) is replaced by function r(n, s, z, l) { return n(s, z, l) } and the call r(func, p, q, d).
3.3 Control flow flattening: originally linear or structured control flow (e.g., if-else, for loops) is broken up into one huge switch-case or if-else dispatcher, greatly reducing the readability and debuggability of the code.
Existing tools and methods for JavaScript deobfuscation mainly include:
Regular expression replacement: simple string replacement or decoding based on string template matching. This approach has limited capability, cannot cope with complex, context-dependent obfuscation, and is prone to damaging normal code.
Dynamic execution analysis: the code is actually run in a browser or Node.js environment and the plaintext code that is finally executed is intercepted. This approach cannot be guaranteed to capture all plaintext code, applies only to obfuscation schemes that rely on the eval function, and is therefore of limited use. In addition, dynamically debugging obfuscated code and locating key functions is difficult.
Therefore, the prior art suffers from insufficient processing depth, dependence on a runtime environment, and high debugging difficulty. A scheme is urgently needed that can deeply analyze and restore obfuscated code at the syntactic and semantic levels in a static environment.
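To make items 3.2 and 3.3 of the Background concrete, the following minimal sketch places a readable function next to a hand-obfuscated equivalent. The proxy function `m` follows the example in item 3.2; the order-string dispatcher (`"2|0|1".split("|")`) is one common layout assumed here for illustration, and the names `original`, `flattened`, `caseBodies` and `recovered` are hypothetical.

```javascript
// Readable version of the logic.
function original(a, b) {
  const shifted = a << b;
  const total = shifted + 1;
  return total * 2;
}

// Item 3.2: the shift operator is hidden behind a proxy function.
function m(t, n) { return t << n; }

// Item 3.3: the same three statements are scattered into switch cases and
// replayed by a dispatcher in the order encoded in a string.
function flattened(a, b) {
  const order = "2|0|1".split("|");
  let i = 0;
  let shifted, total;
  while (true) {
    switch (order[i++]) {
      case "0": total = shifted + 1; continue;
      case "1": return total * 2;
      case "2": shifted = m(a, b); continue;
    }
    break;
  }
}

// What static recovery has to achieve, reduced here to source strings
// instead of AST nodes: list the case bodies in dispatch order.
const caseBodies = {
  "0": "total = shifted + 1;",
  "1": "return total * 2;",
  "2": "shifted = m(a, b);",
};
const recovered = "2|0|1".split("|").map((label) => caseBodies[label]);

console.log(original(3, 2), flattened(3, 2));
console.log(recovered.join("\n"));
```

Both functions compute the same value for the same inputs, but the execution order of the flattened version can only be read off by following the dispatcher. Recovering that order without running the code is the task that a static, AST-based approach takes on.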
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to provide an effective method for deeply analyzing and restoring obfuscated code at the syntactic and semantic levels in a static environment.
(II) Technical solution
To solve the above technical problem, the invention provides a JavaScript code deobfuscation method based on an abstract syntax tree (AST), comprising the following steps:
Step S01, AST construction: a JavaScript parser takes the obfuscated source code as input and generates a complete, structured AST according to the ECMAScript grammar specification, wherein the AST describes the syntactic structure of the code and each node represents a syntax unit;
Step S02, AST traversal and semantic analysis: the AST is traversed using a depth-first or breadth-first strategy, each node is visited during the traversal, and a series of preset deobfuscation rules is applied to perform semantic analysis: Unicode-escaped string restoration, in which strings are restored by deleting the raw node of the AST; computable-expression evaluation, in which the result of a computable expression is calculated directly and replaces the expression node; control flow unflattening, in which the execution order of the case code blocks is analyzed, and the switch-case statements are