CN-122020673-A - Security vulnerability detection method and device based on intelligent agent and electronic equipment


Abstract

The application discloses an agent-based security vulnerability detection method and device and an electronic device, relating to the fields of artificial intelligence and information security. The method comprises: extracting a feature vector representing the service security state from operation data and a transaction log of a service system; inputting the feature vector into a reinforcement learning agent as state information of the service system, and outputting a vulnerability detection action for the service system through the reinforcement learning agent; determining, according to an execution result of the vulnerability detection action, whether the service system has a security vulnerability in the current service scenario, and generating a detection result containing a vulnerability risk level; and generating a reward value based on the detection result and feeding the reward value back to the reinforcement learning agent to drive an update of the reinforcement learning agent. The application addresses the technical problem of low security detection efficiency in financial business scenarios in the prior art.

Inventors

WU YANGE
LIU WEI
LEI ZHUOMIN

Assignees

Industrial and Commercial Bank of China Limited (中国工商银行股份有限公司)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-13

Claims (11)

1. An agent-based security vulnerability detection method, characterized by comprising: extracting a feature vector representing the service security state from operation data and a transaction log of a service system; inputting the feature vector, as state information of the service system, to a pre-trained reinforcement learning agent, and outputting a vulnerability detection action for the service system through the reinforcement learning agent, wherein the reinforcement learning agent is constructed based on a value function approximation network and is configured with a reward feedback mechanism corresponding to a service security target; determining, according to an execution result of the vulnerability detection action, whether the service system has a security vulnerability in the current service scenario, and generating a detection result containing a vulnerability risk level; and generating a reward value based on the detection result, and feeding the reward value back to the reinforcement learning agent to drive an update of the reinforcement learning agent.
2. The agent-based security vulnerability detection method of claim 1, wherein the reinforcement learning agent is trained by: initializing parameters of the value function approximation network and creating a buffer for storing training experience data; constructing a simulation environment according to historical service data of the service system; performing iterative training in the simulation environment, wherein each training iteration comprises determining an exploration probability according to an exploration rate parameter, selecting, according to the exploration probability, a target action corresponding to the value function approximation network from a plurality of preset vulnerability detection actions, executing the target action, collecting the reward value returned by the simulation environment and the latest state information of the service system after the target action is executed to form experience data, storing the experience data in the buffer, and randomly sampling a batch of experience data from the buffer to update the parameters of the value function approximation network; and when the fluctuation amplitude of the predicted values of the value function approximation network over a plurality of consecutive training periods is lower than a first preset threshold and the similarity of the selected target action sequences reaches a second preset threshold, determining that the value function approximation network has converged, and generating the reinforcement learning agent based on the converged value function approximation network.
3. The agent-based security vulnerability detection method of claim 2, wherein the exploration rate parameter is configured as a monotonically decreasing function from an initial value to a final value, and the value of the exploration rate parameter is calculated dynamically during training based on the number of iteration steps completed.
4. The agent-based security vulnerability detection method of claim 2, wherein determining the reward value comprises: generating a first positive reward value when a security vulnerability defined by a preset rule as a first risk level is successfully identified; generating a second positive reward value when a security vulnerability defined by a preset rule as a second risk level is successfully identified, wherein the first risk level is smaller than the second risk level, and the second positive reward value is smaller than the first positive reward value; and generating a negative reward value when a security vulnerability is misidentified.
5. The agent-based security vulnerability detection method of claim 1, wherein the feature vector comprises at least one of the following feature components: a user-dimension feature component comprising at least a risk level identifier and an account balance of the user account; a product-dimension feature component comprising at least a risk level identifier and transaction amount limit information of a financial product; and a transaction-dimension feature component comprising at least the transaction amount of the transaction and the accumulated number of transactions for the day.
6. The agent-based security vulnerability detection method of claim 1, wherein generating a reward value based on the detection result and feeding the reward value back to the reinforcement learning agent to drive an update of the reinforcement learning agent comprises: acquiring the latest state information of the service system after the vulnerability detection action is executed for the service system; combining the latest state information of the service system, the vulnerability detection action and the reward value into one item of experience data; and updating the reinforcement learning agent according to the experience data.
7. The agent-based security vulnerability detection method of claim 1, wherein after generating the detection result containing the vulnerability risk level, the method further comprises: sending information on target security vulnerabilities whose vulnerability risk level is greater than a preset level to a manual review interface; and calibrating the reward value corresponding to each target security vulnerability according to the confirmation result or correction result returned by the manual review interface.
8. An agent-based security vulnerability detection device, comprising: a feature extraction unit configured to extract a feature vector representing the service security state from operation data and a transaction log of a service system; an action determining unit configured to input the feature vector, as state information of the service system, to a pre-trained reinforcement learning agent, and to output a vulnerability detection action for the service system through the reinforcement learning agent, wherein the reinforcement learning agent is constructed based on a value function approximation network and is configured with a reward feedback mechanism corresponding to a service security target; an action execution unit configured to determine, according to an execution result of the vulnerability detection action, whether the service system has a security vulnerability in the current service scenario, and to generate a detection result containing a vulnerability risk level; and a feedback unit configured to generate a reward value based on the detection result and to feed the reward value back to the reinforcement learning agent to drive an update of the reinforcement learning agent.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed, causes a device in which the computer-readable storage medium is located to perform the agent-based security vulnerability detection method of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the agent-based security vulnerability detection method of any one of claims 1 to 7.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the agent-based security vulnerability detection method of any one of claims 1 to 7.
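The training ingredients recited in claims 2 to 4 (a monotonically decreasing exploration rate, exploration-probability action selection, an experience buffer with random sampling, a tiered reward and a two-part convergence test) can be illustrated with the following minimal Python sketch. It is not the applicant's implementation: the linear decay shape, the function names, the thresholds and every numeric value are assumptions chosen only for illustration, since the application does not specify them.

```python
import random
from collections import deque

# All names and numeric values below are illustrative assumptions.


def exploration_rate(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Decay linearly from eps_start to eps_end, then stay at eps_end (claim 3)."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def select_action(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def reward_for(identified_level, misidentified,
               first_reward=10.0, second_reward=5.0, penalty=-5.0):
    """Tiered reward (claim 4): first level > second level > 0 > misidentification."""
    if misidentified:
        return penalty
    return first_reward if identified_level == 1 else second_reward


def sample_batch(buffer, batch_size, rng):
    """Randomly sample a batch of experience tuples from the buffer."""
    return rng.sample(list(buffer), min(batch_size, len(buffer)))


def has_converged(predictions, action_runs, value_tol=0.01, similarity_min=0.9):
    """Convergence test (claim 2): stable predicted values AND similar action sequences.

    predictions: predicted values from consecutive training periods.
    action_runs: equal-length action sequences from the same periods.
    """
    if len(action_runs) < 2:
        return False
    fluctuation = max(predictions) - min(predictions)
    reference = action_runs[-1]
    similarities = [
        sum(a == b for a, b in zip(run, reference)) / len(reference)
        for run in action_runs[:-1]
    ]
    return fluctuation < value_tol and min(similarities) >= similarity_min


# Experience buffer holding (state, action, reward, next_state) tuples.
replay_buffer = deque(maxlen=10_000)
```

Position-wise agreement is used here as the similarity measure between action sequences; the claims leave the measure open, so any other sequence similarity could be substituted.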

Description

Security vulnerability detection method and device based on intelligent agent and electronic equipment

Technical Field

The application relates to the fields of artificial intelligence and information security, and in particular to an agent-based security vulnerability detection method and device and an electronic device.

Background

As digital transformation accelerates in the financial industry, business systems grow increasingly complex, and transaction modes and interaction logic keep multiplying, so that traditional security detection means struggle to cope. Identifying security vulnerabilities at the business logic level, such as unauthorized access and data tampering, efficiently and accurately is essential to the secure and stable operation of financial services.

In the related art, security detection mainly relies on two types of technical solutions. One is rule-matching tool scanning, which analyzes system logs, code or traffic against predefined security rules or keywords. The other is supervised-learning model detection, which trains classification models on labeled historical vulnerability samples for pattern recognition.

However, both approaches have significant drawbacks. Tool scanning depends heavily on expert experience and fixed rules, struggles to cover dynamic and complex business logic scenarios, and suffers from low detection efficiency and a high miss rate. Supervised learning is limited by the quantity and quality of labeled samples; it lacks the capacity for in-depth analysis of the continuously evolving attack patterns and complex scenarios in financial services, which leads to insufficient detection accuracy, and it adapts poorly to a continuously changing business environment.
Therefore, security detection for financial business scenarios in the prior art is inefficient, and cannot achieve rapid response and deep coverage of a large-scale, highly dynamic business environment while maintaining high accuracy. No effective solution to these problems has yet been proposed.

Disclosure of Invention

The embodiments of the application provide an agent-based security vulnerability detection method and device and an electronic device, which at least solve the technical problem of low security detection efficiency in financial business scenarios in the prior art.

According to one aspect of the embodiments of the application, an agent-based security vulnerability detection method is provided, comprising: extracting a feature vector representing the service security state from operation data and a transaction log of a service system; inputting the feature vector, as state information of the service system, to a pre-trained reinforcement learning agent, and outputting a vulnerability detection action for the service system through the reinforcement learning agent, wherein the reinforcement learning agent is constructed based on a value function approximation network and is configured with a reward feedback mechanism corresponding to a service security target; determining, according to an execution result of the vulnerability detection action, whether the service system has a security vulnerability in the current service scenario, and generating a detection result containing a vulnerability risk level; and generating a reward value based on the detection result, and feeding the reward value back to the reinforcement learning agent to drive an update of the reinforcement learning agent.
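The detection method summarized above (feature extraction, action selection, action execution, reward feedback) can be pictured as a single loop iteration. The Python sketch below is a hedged illustration only: the linear Q-function is a deliberately simple stand-in for the value function approximation network, and the action names, dictionary fields, the over-limit probe, the normalisation constant and the reward values are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Every name, field and number here is a hypothetical placeholder for illustration.
ACTIONS = ["probe_amount_limit", "probe_unauthorized_access"]
SCALE = 10_000.0  # assumed normalisation constant for monetary values


@dataclass
class DetectionResult:
    vulnerable: bool
    risk_level: int  # 0 means no vulnerability found


def extract_features(operation_data, transaction_log):
    """State vector with user, product and transaction components."""
    amounts = [t["amount"] for t in transaction_log]
    return [
        float(operation_data["user_risk_level"]),
        operation_data["account_balance"] / SCALE,
        operation_data["product_amount_limit"] / SCALE,
        (amounts[-1] if amounts else 0.0) / SCALE,
        float(len(amounts)),  # accumulated transactions for the day
    ]


class LinearQAgent:
    """Linear Q-function, a simple stand-in for the value function approximation network."""

    def __init__(self, n_features, n_actions, lr=0.01, gamma=0.9):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.lr, self.gamma = lr, gamma

    def q_values(self, state):
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]

    def act(self, state):
        q = self.q_values(state)
        return max(range(len(q)), key=lambda a: q[a])  # greedy; ties go to the first action

    def update(self, state, action, reward, next_state):
        # One semi-gradient Q-learning step on the chosen action's weights.
        target = reward + self.gamma * max(self.q_values(next_state))
        error = target - self.q_values(state)[action]
        self.weights[action] = [
            w + self.lr * error * x for w, x in zip(self.weights[action], state)
        ]


def execute_action(action, operation_data, transaction_log):
    """Toy probe: an accepted transaction above the product limit counts as a vulnerability."""
    if ACTIONS[action] == "probe_amount_limit":
        limit = operation_data["product_amount_limit"]
        if any(t["accepted"] and t["amount"] > limit for t in transaction_log):
            return DetectionResult(vulnerable=True, risk_level=1)
    return DetectionResult(vulnerable=False, risk_level=0)


def detection_step(agent, operation_data, transaction_log):
    """Run one extract, act, execute, reward, update cycle."""
    state = extract_features(operation_data, transaction_log)
    action = agent.act(state)
    result = execute_action(action, operation_data, transaction_log)
    reward = 10.0 if result.vulnerable else 0.0  # assumed reward values
    # A real system would re-read its state here; the toy state is unchanged.
    next_state = extract_features(operation_data, transaction_log)
    agent.update(state, action, reward, next_state)
    return action, result, reward
```

A call such as `detection_step(LinearQAgent(5, len(ACTIONS)), operation_data, transaction_log)` returns the chosen action index, the detection result and the reward, and leaves the agent's weights updated, which mirrors the feedback step of the method.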
The reinforcement learning agent is obtained by: initializing parameters of the value function approximation network and creating a buffer for storing training experience data; constructing a simulation environment according to historical service data of the service system; and performing iterative training in the simulation environment, wherein each training iteration comprises determining an exploration probability according to an exploration rate parameter, selecting, according to the exploration probability, a target action corresponding to the value function approximation network from a plurality of preset vulnerability detection actions, executing the target action, collecting the reward value returned by the simulation environment and the latest state information of the service system after the target action is executed to form experience data, storing the experience data in the buffer, and randomly sampling a batch of experience data from the buffer to update the parameters of the value function approximation network; and when the fluctuation amplitude of the predicted values of the value function approximation network over a plurality of consecutive trainin