CN-122020655-A - Robustness testing method for a retrieval-augmented generation software vulnerability assessment system

Abstract

The invention discloses a robustness testing method for a retrieval-augmented generation software vulnerability assessment system, belonging to the technical field of software engineering and information security. The method comprises the following steps: S1, for a vulnerability code to be evaluated, acquiring the related original example set using a retrieval module; S2, generating candidate variants that preserve semantic consistency based on abstract syntax tree analysis; S3, selecting an adversarial example set by means of a comprehensive scoring function and a beam search strategy; and S4, constructing a prompt context from the adversarial examples to interfere with the large language model's vulnerability severity assessment, in particular inducing high-risk or critical vulnerabilities to be downgraded and misjudged as medium-risk or low-risk. The testing method can effectively reveal potential security risks introduced by the retrieval-augmentation mechanism and can be used to quantitatively evaluate the robustness of the vulnerability assessment system.

Inventors

CHEN XIANG
XU LUYAO
TIAN DAN
CHEN ZHIJIE
JIANG ZHENGZHENG
GAO ZHAN
JU XIAOLIN

Assignees

Nantong University (南通大学)

Dates

Publication Date: 20260512
Application Date: 20251223

Claims (9)

1. A robustness testing method for a retrieval-augmented generation software vulnerability assessment system, characterized by comprising the following steps: step (1), example interception, namely acquiring the target vulnerability code requiring vulnerability severity assessment and an original example set formed by the top-k similar vulnerability codes returned by the retrieval module of the retrieval-augmented generation system; step (2), semantic-preserving variant generation based on abstract syntax tree analysis, namely parsing each vulnerability code in the original example set into an abstract syntax tree, identifying the identifiers therein, evaluating their importance based on the semantic roles of the identifiers and their correlation with vulnerability-critical operations, and, for the identifier with the highest score calculated by a formula, selecting a replacement word from a semantic family dictionary and renaming the identifier, thereby generating a plurality of semantic-preserving candidate variants; step (3), combinatorial optimization based on beam search, namely using a variant-first beam search algorithm, with maximization of a comprehensive scoring function as the optimization objective, to construct an adversarial example set for the original example set from all candidate variants, wherein the comprehensive scoring function jointly considers the semantic similarity between the candidate variants and the target vulnerability code and the diversity of the variant set; and step (4), prompt context construction based on the adversarial example set, namely constructing a prompt context based on the adversarial example set and sending it to the inference module of the target large language model to interfere with the inference module's vulnerability severity assessment, so that the inference module incorrectly assesses a vulnerability whose actual severity level is high-risk or critical as having a medium-risk or low-risk severity level.
2. The robustness testing method for a retrieval-augmented generation software vulnerability assessment system according to claim 1, wherein the semantic-preserving variant generation based on abstract syntax tree analysis in step (2) specifically comprises the following steps: step (2-1), abstract syntax tree parsing and identifier extraction, namely converting the vulnerability codes in the original example set into abstract syntax trees using the programming language parser Tree-sitter, traversing the abstract syntax trees, extracting all variable identifiers and function name identifiers, and filtering out identifiers or system API names that would affect the naturalness of the code; step (2-2), identifier importance assessment, namely analyzing the semantic role of each identifier based on its context in the abstract syntax tree and calculating its importance score [formula not reproduced in the source text], wherein the score depends on the frequency of occurrence of the identifier in the code segment and on the syntactic path distance in the abstract syntax tree from the identifier node to the nearest preset vulnerability-sensitive operation node, combined using two configurable weight coefficients, the first of which takes the value 1 [second value not reproduced in the source text]; step (2-3), identifier rewriting based on the semantic family dictionary, namely taking the identifier with the highest score as the high-importance identifier, selecting a replacement word for it from the semantic family dictionary, and renaming it to generate a semantic-preserving code variant, wherein the semantic family dictionary is formed from semantic-preserving synonyms obtained by analyzing attack scenarios and developers' programming habits.
3. The robustness testing method for a retrieval-augmented generation software vulnerability assessment system according to claim 1, wherein the combinatorial optimization based on beam search in step (3) specifically comprises the following steps: step (3-1), variant pool construction, namely, for each example in the original example set, generating its corresponding candidate variant set based on step (2), thereby forming the corresponding search space; step (3-2), beam search initialization and iterative optimization, namely adopting a variant-first beam search algorithm that maintains a beam of a given width, the algorithm performing variant selection for the 1st to k-th example positions in sequence and performing expansion, scoring, and pruning operations at each layer; step (3-3), path evaluation based on the comprehensive scoring function, namely basing the pruning operation on a comprehensive scoring function that evaluates the expected utility of adding a candidate variant to the current partial path [formula not reproduced in the source text], wherein the terms of the function represent the semantic similarity in the embedding space between the variant and the vulnerability code to be evaluated, the contribution of the variant to code style diversity relative to the variants already in the path, and the modification concealment of the variant relative to the original code, combined using configurable weight coefficients whose values are 0.7, 0.1, and 0.2; step (3-4), optimal adversarial set generation, namely, after all examples have been processed, selecting the path with the highest score, the variant sequence corresponding to that path being the adversarial example set.
4. The robustness testing method for a retrieval-augmented generation software vulnerability assessment system according to claim 2, wherein the semantic family dictionary in step (2-3) is built by constructing a hierarchical semantic role system: each identifier is first classified into a basic semantic role class according to its context in the abstract syntax tree, and then subdivided into a specific semantic subclass according to its specific function in the vulnerability context, with a corresponding synonym set predefined for each specific semantic subclass; the basic semantic roles comprise a data container class, a control variable class, and a resource handle class, wherein the data container class can be subdivided into buffer pointer and array variable subclasses, and the control variable class can be subdivided into loop index and status flag subclasses; and a semantic family is constructed for the buffer pointer subclass and for the loop index subclass [family members not reproduced in the source text].
5. The robustness testing method for a retrieval-augmented generation software vulnerability assessment system according to claim 3, wherein the modification concealment in the comprehensive scoring function is calculated as follows: the inverse of the normalized edit distance between the candidate variant and its corresponding original example code is taken as the measure [formula not reproduced in the source text], wherein the normalization function maps the Levenshtein edit distance to the interval [0, 1].
6. The robustness testing method for a retrieval-augmented generation software vulnerability assessment system according to claim 3, wherein the style diversity contribution in the comprehensive scoring function is calculated as follows: the token set of the candidate variant and the average token set of the existing variants in the current path are obtained, and a distance between the two is calculated as the measure of diversity [distance name and formula not reproduced in the source text].
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program is executed to implement the steps of the method according to any of claims 1 to 6.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program is configured to implement the steps of the method of any one of claims 1 to 6 when called by a processor.
9. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 6.
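
The variant selection described in claims 3, 5, and 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. All function names are hypothetical. Whole-word regex renaming stands in for the Tree-sitter based renaming of claim 2. Token-overlap similarity stands in for embedding-space similarity. The diversity term is assumed to be a mean Jaccard distance, since the distance name is missing from the source text. Concealment is assumed to be one minus the normalized Levenshtein distance. The weights 0.7, 0.1, and 0.2 are assumed to apply to similarity, diversity, and concealment in that order, and the score is assumed to be a weighted sum accumulated along the path.

```python
import re
from typing import List, Sequence, Set, Tuple


def rename_identifier(code: str, old: str, new: str) -> str:
    # Whole-word regex rename: an assumed stand-in for AST-based renaming.
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def concealment(variant: str, original: str) -> float:
    # Assumption: "inverse" is read as 1 minus the distance normalized to [0, 1].
    longest = max(len(variant), len(original)) or 1
    return 1.0 - levenshtein(variant, original) / longest


def tokens(code: str) -> Set[str]:
    # Crude lexer: identifiers as whole tokens, everything else per character.
    return set(re.findall(r"[A-Za-z_]\w*|\S", code))


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def similarity(variant: str, target: str) -> float:
    # Assumption: token overlap replaces embedding-space similarity.
    return 1.0 - jaccard_distance(tokens(variant), tokens(target))


def diversity(variant: str, path: Sequence[str]) -> float:
    # Assumption: mean Jaccard distance to the variants already on the path.
    if not path:
        return 0.0
    vt = tokens(variant)
    return sum(jaccard_distance(vt, tokens(p)) for p in path) / len(path)


def beam_search(target: str, pools: Sequence[Sequence[str]],
                originals: Sequence[str], beam_width: int = 2,
                weights: Tuple[float, float, float] = (0.7, 0.1, 0.2)) -> List[str]:
    w_sim, w_div, w_con = weights  # assumed order of the three weights
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for pool, original in zip(pools, originals):  # one layer per example position
        # Expand every kept path with every candidate variant and score it.
        expanded = [
            (total + w_sim * similarity(v, target) + w_div * diversity(v, path)
             + w_con * concealment(v, original), path + [v])
            for total, path in beams for v in pool
        ]
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]  # prune to the beam width
    return beams[0][1]  # variant sequence of the highest-scoring path


# Hypothetical example set and replacement words, for illustration only.
originals = ["memcpy(buf, src, len);", "for (i = 0; i <= n; i++) a[i] = 0;"]
pools = [
    [rename_identifier(originals[0], "buf", name) for name in ("dst", "buffer")],
    [rename_identifier(originals[1], "i", name) for name in ("idx", "j")],
]
adversarial_set = beam_search("strcpy(buf, input);", pools, originals)
```

The sketch keeps one variant per example position, mirroring the layer-by-layer expansion, scoring, and pruning of step (3-2). In a real test harness the stand-in similarity would be replaced by the embedding model used by the retrieval module under test.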

Description

Robustness testing method for a retrieval-augmented generation software vulnerability assessment system

Technical Field

The invention belongs to the technical field of software engineering and information security, and relates to a robustness testing method for a retrieval-augmented generation software vulnerability assessment system.

Background

As software supply chains grow and open-source code reuse becomes increasingly common, the efficiency of discovering and handling software vulnerabilities has become an important issue in information security. To improve the efficiency of vulnerability analysis and triage, the prior art includes automated vulnerability assessment schemes based on large language models. In one common implementation, a retrieval module first retrieves examples related to the object under assessment from a historical vulnerability library, a code sample library, or a knowledge base; these examples, the code fragment to be assessed, and natural language descriptions are then assembled into a prompt context, which is input to an inference module, and the large language model outputs a vulnerability severity assessment result (for example, a severity level or a corresponding score). Although this assessment flow of retrieval augmentation and in-context learning can, to some extent, improve the model's use of knowledge about vulnerability types and handling experience, the assessment result depends on the content and structure of the prompt context: when the retrieved examples are biased, perturbed, or organized into a context unfavorable to assessment, the inference module may produce a severity judgment inconsistent with the actual situation.

In practical applications in particular, if the assessment result feeds into risk handling priority, repair scheduling, or alert distribution, assessment errors may affect handling decisions, so such systems require systematic robustness testing and quantitative evaluation. However, existing adversarial testing and robustness assessment methods for large language models or text/code generation systems tend to focus on directly perturbing the model input text or the code under assessment, or rely on specific model feedback signals. They lack a controllable perturbation and combinatorial optimization test scheme that targets the example set at the key stage after the retrieval module outputs and before the inference module receives its input. In addition, an example set usually consists of several examples, and there is no unified, practical approach for selecting a representative adversarial example set from the candidate perturbation space and achieving repeatable quantitative evaluation. It is therefore necessary to provide a robustness testing method for a retrieval-augmented generation software vulnerability assessment system that applies controllable perturbation and optimized selection to the retrieved example set without changing the semantics of the vulnerability under assessment, thereby revealing and quantifying the assessment stability of the system in example-dependent scenarios.

Disclosure of Invention

The invention aims to provide a robustness testing method for a retrieval-augmented generation software vulnerability assessment system that, without changing the semantics of the target vulnerability code to be assessed, applies controllable perturbation and combinatorial optimization to the retrieved example set after the retrieval module outputs and before the inference module receives its input, thereby interfering with the large language model's vulnerability severity assessment and revealing and quantifying the robustness risks of the system in example-dependent scenarios.

To achieve the above purpose, the technical scheme adopted by the invention is as follows: the robustness testing method for a retrieval-augmented generation software vulnerability assessment system is executed at the orchestration layer of the system, and intercepts and perturbs the retrieved example set after the output of the retrieval module of the retrieval-augmented generation system and before the input of the inference module. The method comprises the following steps:

(1) Example interception: acquiring the target vulnerability code requiring vulnerability severity assessment and the original example set of top-k similar vulnerability codes returned by the retrieval module for that target vulnerability code.

(1-1) Obtaining the target vulnerability code: obtaining the vulnerability code requiring vulnerability severity assessment and designating it as the target vulnerability code.

(1-2) obtaining an exa