KR-20260064335-A - OPERATIONAL SAFETY IN ARTIFICIAL INTELLIGENCE AGENT
Abstract
A method for operational safety in an Artificial Intelligence (AI) agent is disclosed, comprising: processing a command input by a user to classify a user-requested action represented by the command; calculating a risk score for the user-requested action based on context information regarding a current context associated with at least one of the user and the user-requested action, the context information including the result of the classification; making a decision, by performing reinforcement learning (RL) based on the context information and the risk score, on whether to allow, block, or request confirmation of execution of the user-requested action by the AI agent; and refining the decision based on the context information and the risk score. Refining the decision comprises calculating a first probability, a second probability, and a third probability based on the context information, the risk score, and the decision according to a probabilistic graphical model, the first probability being the probability of the allowance occurring, the second probability being the probability of the blocking occurring, and the third probability being the probability of the confirmation request occurring, and making a redecision on whether to perform the allowance, the blocking, or the confirmation request based on the first, second, and third probabilities.
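The refinement step described above can be illustrated with a minimal sketch. This is not the patent's implementation: the discrete risk buckets, the numbers in the table, and the function name `refine_decision` are all hypothetical, and a single conditional probability table stands in for the full probabilistic graphical model.

```python
# Illustrative sketch of the refinement step: a conditional probability
# table P(final_action | risk_bucket, initial_decision) stands in for
# the patent's probabilistic graphical model. All names and numbers
# here are hypothetical, not taken from the filing.

ACTIONS = ("allow", "block", "confirm")

# Hypothetical CPT indexed by (risk bucket, initial RL decision).
# Each entry is (P_allow, P_block, P_confirm) and sums to 1.
CPT = {
    ("low",  "allow"):   (0.90, 0.02, 0.08),
    ("low",  "block"):   (0.30, 0.50, 0.20),
    ("low",  "confirm"): (0.40, 0.10, 0.50),
    ("high", "allow"):   (0.20, 0.30, 0.50),
    ("high", "block"):   (0.02, 0.88, 0.10),
    ("high", "confirm"): (0.05, 0.45, 0.50),
}

def refine_decision(risk_score: float, initial_decision: str) -> str:
    """Redecide among allow/block/confirm from the three probabilities."""
    bucket = "high" if risk_score >= 0.5 else "low"
    p_allow, p_block, p_confirm = CPT[(bucket, initial_decision)]
    probs = {"allow": p_allow, "block": p_block, "confirm": p_confirm}
    return max(probs, key=probs.get)  # action with the highest probability

print(refine_decision(0.7, "allow"))  # -> "confirm" under this toy table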
Inventors
- 정철
- 아왈리 샤이쿨
Assignees
- 세종대학교산학협력단
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-10-31
Claims (11)
- A method for operational safety in an artificial intelligence (AI) agent, the method comprising: processing a command entered by a user to classify a user-requested action indicated by the command; calculating a risk score for the user-requested action based on context information regarding a current context associated with at least one of the user and the user-requested action, the context information including the result of the classification; making a decision, by performing reinforcement learning (RL) based on the context information and the risk score, on whether to allow execution of the user-requested action by the AI agent, block the execution, or request confirmation of the execution; and refining the decision based on the context information and the risk score, wherein refining the decision comprises: calculating a first probability, a second probability, and a third probability based on the context information, the risk score, and the decision according to a probabilistic graphical model, the first probability being the probability of the allowance occurring, the second probability being the probability of the blocking occurring, and the third probability being the probability of the request occurring; and making a redecision on which of the allowance, the blocking, and the request to perform based on the first probability, the second probability, and the third probability.
- The method of claim 1, wherein the processing of the command comprises running a Natural Language Understanding (NLU) model on the command.
- The method of claim 1, wherein the context information further includes at least one of: the user's role in the computing environment in which the AI agent operates; the user's behavioral pattern in the computing environment; the required execution time of the user-requested action; or the frequency with which the user requests the AI agent to execute actions of the type to which the user-requested action belongs.
- The method of claim 1, wherein performing the RL comprises using a Q-learning model that selects one of a plurality of actions, including the allowance, the blocking, and the request, based on a state representing the context information and the risk score (an illustrative sketch of such a model follows the claims).
- The method of claim 4, wherein the Q-learning model receives, as a reward for training the Q-learning model, feedback on the result of the allowance, the blocking, or the request being performed according to the redecision.
- A device for operational safety in an artificial intelligence (AI) agent, the device comprising: a natural language processing classifier that processes a command entered by a user and classifies a user-requested action indicated by the command; a dynamic risk scoring module that calculates a risk score for the user-requested action based on context information regarding a current context associated with at least one of the user and the user-requested action, the context information including the result of the classification; a behavior learning module that performs reinforcement learning (RL) based on the context information and the risk score to make a decision on whether to allow execution of the user-requested action by the AI agent, block the execution, or request confirmation of the execution; and a dynamic decision module that refines the decision based on the context information and the risk score, wherein refining the decision comprises calculating a first probability, a second probability, and a third probability based on the context information, the risk score, and the decision according to a probabilistic graphical model, the first probability being the probability of the allowance occurring, the second probability being the probability of the blocking occurring, and the third probability being the probability of the request occurring, and making a redecision on which of the allowance, the blocking, and the request to perform based on the first probability, the second probability, and the third probability.
- The device of claim 6, wherein the processing of the command comprises running a Natural Language Understanding (NLU) model on the command.
- The device of claim 6, wherein the context information further includes at least one of: the user's role in the computing environment in which the AI agent operates; the user's behavioral pattern in the computing environment; the required execution time of the user-requested action; or the frequency with which the user requests the AI agent to execute actions of the type to which the user-requested action belongs.
- The device of claim 6, wherein performing the RL comprises using a Q-learning model that selects one of a plurality of actions, including the allowance, the blocking, and the request, based on a state representing the context information and the risk score.
- The device of claim 9, wherein the Q-learning model receives, as a reward for training the Q-learning model, feedback on the result of the allowance, the blocking, or the request being performed according to the redecision.
- A computer-readable storage medium storing computer-executable instructions that, when executed by a computer processor, cause the computer processor to perform the method described in any one of claims 1 to 5.
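Claims 4, 5, 9, and 10 name a Q-learning model whose state represents the context information and the risk score, and which is rewarded with feedback on the outcome of the executed redecision. The sketch below shows that shape in Python; the state encoding, hyperparameters, reward values, and the class name `SafetyQAgent` are assumptions for illustration, not details from the filing.

```python
import random
from collections import defaultdict

ACTIONS = ("allow", "block", "confirm")

class SafetyQAgent:
    """Toy Q-learning policy over (context, risk) states, as in claims 4 and 9.
    The state encoding, hyperparameters, and reward scheme are assumptions."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # Q[(state, action)] -> value, default 0.0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state):
        """Epsilon-greedy choice among allow/block/confirm."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update from the observed feedback (claims 5 and 10)."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

# Example: state = (action type, bucketed risk score); the reward is feedback
# on the outcome of the executed redecision (e.g., -1.0 for a harmful allow).
# Moderation here is effectively one-step, so next_state is reused for brevity.
agent = SafetyQAgent()
state = ("file_delete", "high_risk")
action = agent.select(state)
agent.update(state, action, reward=-1.0, next_state=state)
```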
Description
Operational Safety in Artificial Intelligence Agents

The present disclosure relates to operational safety in artificial intelligence (AI) agents. An AI agent acts as a robot (i.e., a system with a certain degree of autonomy) within cyberspace to execute tasks on behalf of its user. To understand user commands, the agent sends input prompts as requests to underlying AI models, such as Large Language Models (LLMs). The response generated by such models may include the agent's final action or another command. To execute such user-requested actions, the agent invokes tools capable of performing local operations or making requests to a remote host (e.g., querying a search engine). Existing research and development on AI agents has failed to consider the agent's potential vulnerabilities.

In traditional computing systems, security is maintained by three attributes: confidentiality, integrity, and availability. Each of these attributes faces a unique challenge in the use of AI agents (He, Y., Wang, E., Rong, Y., Cheng, Z., & Chen, H. (2024). Security of AI Agents. arXiv preprint arXiv:2406.08689).

In conventional systems, confidentiality is maintained through access control policies, which manage who can view or use which information. However, AI agents, particularly those using LLMs, introduce new risks. LLMs can store and compress large amounts of training data, which can lead to privacy leaks when these agents interact with tools or users. As AI agents become able to read and use tools via commands, keeping information confidential becomes more complex. Consequently, how confidentiality is managed must be reconsidered, and new safeguards are required to address potential risks of privacy infringement, particularly when sensitive data is requested or processed by AI systems.

Integrity, another critical aspect of security, means ensuring that data remains accurate, complete, and unaltered by unauthorized individuals. In AI systems, maintaining integrity is particularly difficult because AI agents interact with both users and tools through prompts. Since AI agents perform actions on behalf of the user but are not the user themselves, traditional methods for guaranteeing integrity are difficult to apply in full. This discrepancy creates vulnerabilities in verifying the accuracy and reliability of data when AI agents execute tasks or provide information.

Since AI agents execute commands on behalf of the user to perform tasks, threats to availability (meaning that the system and data must be accessible when needed) must also be considered. While LLMs can only output text tokens, AI agents execute actions that can affect the computing system itself. Such executions could therefore harm the availability of both the AI agent's host system and its tools, for example by executing malicious commands generated by the LLM.

Zhang, C., Li, L., He, S., Zhang, X., Qiao, B., Qin, S., ... & Zhang, Q. (2024). UFO: A UI-focused agent for Windows OS interaction. arXiv preprint arXiv:2402.07939 addresses the protection of AI agents, but it takes a static approach that relies only on predefined prompts; such an approach may struggle to adapt to newly emerging behavioral patterns and may overlook subtle changes in normal user behavior or unusual activity over time.

Figure 1 shows an example of a computing environment in which an exemplary AI agent operates. Figure 2 shows an exemplary configuration of the dynamic protection system of Figure 1.
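As a concrete illustration of the dynamic risk scoring performed by the protection system of Figure 2, the following minimal sketch combines the context signals named in claims 3 and 8 (user role, behavioral pattern, execution time, request frequency) into a score in [0, 1]. The weights, the normalization, and the function name `risk_score` are hypothetical; the filing does not specify a scoring formula.

```python
def risk_score(role: str, anomaly: float, exec_time_s: float,
               request_freq: float) -> float:
    """Hypothetical weighted risk score in [0, 1] built from the context
    signals named in claims 3 and 8; all weights are assumptions."""
    role_risk = {"admin": 0.2, "user": 0.5, "guest": 0.8}.get(role, 0.5)
    time_risk = min(exec_time_s / 60.0, 1.0)   # long-running actions riskier
    freq_risk = 1.0 - min(request_freq, 1.0)   # rarely requested actions riskier
    score = (0.35 * role_risk + 0.30 * anomaly
             + 0.15 * time_risk + 0.20 * freq_risk)
    return max(0.0, min(score, 1.0))

# A guest issuing a rare, anomalous, long-running request scores high (~0.85).
print(risk_score("guest", anomaly=0.9, exec_time_s=45, request_freq=0.05))
```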
Figure 3 is a flowchart showing an example of a process for operational safety in an AI agent.

The terms used in this disclosure are selected from common terminology in consideration of their function within this document, and they may be understood differently depending on the intent or practices of those skilled in the art or the emergence of new technology. In specific instances, some terms may be given meanings as set forth in the detailed description. Accordingly, terms used in this document should be defined consistently with their meaning in the context of this disclosure, rather than merely by their names.

In this document, terms such as “include” and “have” are used to specify the presence of the elements listed thereafter, e.g., certain features, numbers, steps, actions, components, information, or combinations thereof. Unless otherwise indicated, these terms and variations thereof are not intended to exclude the presence or addition of other elements.

As used in this document, terms such as “first” and “second” are intended to identify several similar elements. Unless otherwise stated, such terms are not intended to impose limitations, such as a specific order of use of these elements, but are used merely to refer to the several elements separately. For example, while an element may be referred to by the term “first” in one example, the same element may be referred to by a different ordinal number, such as “second,” in another example.