
US-20260127268-A1 - SECURITY ACTION BASED ON ANOMALY DETECTION USING AI MODEL PROFILES AND USER PROFILES

US 20260127268 A1

Abstract

Techniques are described herein that are capable of performing a security action based on anomaly detection using AI model profiles and user profiles. AI model profiles (e.g., a model-session profile and/or a model-response profile) associated with AI model(s) are generated. User profiles (e.g., user-session profiles, user-prompt profiles, and/or user-response profiles) associated with users of the AI model(s) are generated. A security action is performed with regard to an incoming AI prompt as a result of a difference between the incoming AI prompt and one or more of the AI model profiles and/or one or more of the user profiles being greater than or equal to a difference threshold.
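A rough sketch of the comparison described in the abstract follows, assuming (as one possibility among those the disclosure covers) that each AI model profile and user profile is summarized by an embedding vector and that cosine distance serves as the difference measure. The code is illustrative only, not the patented implementation; the profile vectors and the threshold value are hypothetical inputs, and the embedding step is omitted.

    # Minimal sketch of the abstract's threshold check (illustrative only).
    # Assumes profiles are embedding vectors and that cosine distance is the
    # "difference" being compared against the difference threshold.
    import numpy as np

    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def security_action_needed(prompt_vec: np.ndarray,
                               profile_vecs: list[np.ndarray],
                               difference_threshold: float) -> bool:
        # Trigger the security action if the incoming AI prompt differs from
        # any AI model profile or user profile by at least the threshold.
        return any(cosine_distance(prompt_vec, v) >= difference_threshold
                   for v in profile_vecs)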

Inventors

  • Aviv Shitrit
  • Roee Oz
  • Idan Hen
  • Tamer Salman
  • Alon Danoch
  • Ron Keller
  • Asaf Harari

Assignees

  • Microsoft Technology Licensing, LLC

Dates

Publication Date
2026-05-07
Application Date
2024-11-06

Claims (20)

  1. A system comprising: a processor system; and a memory that stores computer-executable instructions that are executable by the processor system to at least: generate a model-session profile that represents semantic meanings of model sessions of an artificial intelligence (AI) model, a model session comprising a first subset of AI prompts received by the AI model during the model session and a first subset of AI responses generated by the AI model in response to the first subset of the AI prompts; generate a model-response profile that represents semantic meanings of the AI responses; generate user-session profiles that represent semantic meanings of user sessions of users with regard to the AI model, a user session comprising a second subset of the AI prompts that is received from a user at the AI model during the user session and a second subset of the AI responses that is generated by the AI model in response to the second subset of the AI prompts; generate user-prompt profiles for the users, a user-prompt profile representing a semantic meaning of a third subset of the AI prompts that is received from a user at the AI model; and trigger execution of an instruction, which causes a security action to be performed with regard to an incoming AI prompt, as a result of a difference between the incoming AI prompt and the model-session profile, the model-response profile, at least a subset of the user-session profiles, or at least a subset of the user-prompt profiles being greater than or equal to a difference threshold.
  2. The system of claim 1, wherein the computer-executable instructions are executable by the processor system to at least: assign anomaly scores to the incoming AI prompt, the anomaly scores representing differences between the incoming AI prompt and the model-session profile, the model-response profile, at least the subset of the user-session profiles, and at least the subset of the user-prompt profiles; and trigger the execution of the instruction, which causes the security action to be performed with regard to the incoming AI prompt, as a result of an anomaly score, which is comprised in the anomaly scores, being greater than or equal to an anomaly score threshold.
  3. The system of claim 2, wherein the anomaly scores correspond to the differences between the incoming AI prompt and a normal distribution of the model sessions represented by the model-session profile, a normal distribution of the AI responses represented by the model-response profile, a normal distribution of the user sessions represented by at least the subset of the user-session profiles, and a normal distribution of subsets of the AI prompts represented by at least the subset of the user-prompt profiles.
  4. The system of claim 3, wherein the computer-executable instructions are executable by the processor system further to at least: generate the normal distribution of the model sessions represented by the model-session profile, the normal distribution of the AI responses represented by the model-response profile, the normal distribution of the user sessions represented by at least the subset of the user-session profiles, and the normal distribution of subsets of the AI prompts represented by at least the subset of the user-prompt profiles using a one-class classifier.
  5. The system of claim 2, wherein the computer-executable instructions are executable by the processor system to at least: classify the incoming AI prompt as an anomalous AI prompt as the result of the anomaly score, which is comprised in the anomaly scores, being greater than or equal to the anomaly score threshold; and trigger the execution of the instruction, which causes the security action to be performed with regard to the incoming AI prompt, as a result of the incoming AI prompt being classified as the anomalous AI prompt.
  6. The system of claim 1, wherein the computer-executable instructions are executable by the processor system to at least: generate user-response profiles for the users, a user-response profile representing a semantic meaning of a third subset of the AI responses that is generated by the AI model in response to a subset of the AI prompts that is received from a user at the AI model; and wherein triggering the execution of the instruction causes the security action to be performed with regard to the incoming AI prompt further as a result of a difference between the incoming AI prompt and a user-response profile in the user-response profiles being greater than or equal to a second difference threshold.
  7. The system of claim 1, wherein the computer-executable instructions are executable by the processor system to at least: generate a model-session feature vector, which is comprised in the model-session profile, by embedding at least a subset of the model sessions; generate a model-response feature vector, which is comprised in the model-response profile, by embedding at least a subset of the AI responses; generate user-session feature vectors, which are comprised in the user-session profiles, by embedding respective second subsets of the AI prompts that are received from the users at the AI model during the user sessions and respective second subsets of the AI responses that are generated by the AI model in response to the respective second subsets of the AI prompts; generate user-prompt feature vectors, which are comprised in the user-prompt profiles, by embedding respective third subsets of the AI prompts that are received from the users at the AI model; and trigger the execution of the instruction, which causes the security action to be performed with regard to the incoming AI prompt, as a result of a difference between an embedding that represents the incoming AI prompt and the model-session feature vector, the model-response feature vector, a first identified feature vector that represents at least a subset of the user-session feature vectors, or a second identified feature vector that represents at least a subset of the user-prompt feature vectors being greater than or equal to the difference threshold.
  8. The system of claim 7, wherein the model-session feature vector is an embedding of a single model session from the model sessions; and wherein the model-response feature vector is an embedding of a single AI response from the AI responses.
  9. The system of claim 7, wherein the model-session feature vector is an embedding of an aggregation of the model sessions; and wherein the model-response feature vector is an embedding of an aggregation of the AI responses.
  10. The system of claim 7, wherein the first identified feature vector represents a single user-session feature vector in the user-session feature vectors; and wherein the second identified feature vector represents a single user-prompt feature vector in the user-prompt feature vectors.
  11. The system of claim 7, wherein the first identified feature vector represents an aggregation of the user-session feature vectors; and wherein the second identified feature vector represents an aggregation of the user-prompt feature vectors.
  12. The system of claim 7, wherein the computer-executable instructions are executable by the processor system to at least: generate the model-session feature vector, the model-response feature vector, the user-session feature vectors, and the user-prompt feature vectors using a cross-lingual language model.
  13. A method implemented by a computing system, the method comprising: generating a model-session feature vector in a model-session profile, which represents semantic meanings of model sessions of artificial intelligence (AI) models, by embedding at least a subset of the model sessions, a model session comprising a first subset of AI prompts that is received by an AI model during the model session and a first subset of AI responses that is generated by the AI model in response to the first subset of the AI prompts; generating a model-response feature vector in a model-response profile, which represents semantic meanings of the AI responses that are generated by the AI models, by embedding at least a subset of the AI responses; generating user-session feature vectors in user-session profiles, which represent semantic meanings of user sessions of users with regard to the AI models, by embedding the user sessions, which comprise respective second subsets of the AI prompts that are received among the AI models from the users during the user sessions and respective second subsets of the AI responses that are generated in response to the respective second subsets of the AI prompts; generating user-prompt feature vectors in user-prompt profiles for the users by embedding respective third subsets of the AI prompts that are received among the AI models from the users; and triggering execution of an instruction, which causes a security action to be performed with regard to an incoming AI prompt, as a result of a difference between an embedding that represents the incoming AI prompt and the model-session feature vector, the model-response feature vector, a first identified feature vector that represents at least a subset of the user-session feature vectors, or a second identified feature vector that represents at least a subset of the user-prompt feature vectors being greater than or equal to a difference threshold.
  14. The method of claim 13, wherein at least the subset of the user-session feature vectors comprises a single user-session feature vector from the user-session feature vectors; and wherein at least the subset of the user-prompt feature vectors comprises a single user-prompt feature vector from the user-prompt feature vectors.
  15. The method of claim 13, wherein at least the subset of the user-session feature vectors comprises an aggregation of the user-session feature vectors; and wherein at least the subset of the user-prompt feature vectors comprises an aggregation of the user-prompt feature vectors.
  16. The method of claim 13, wherein the model-session feature vector represents a normal distribution of the model sessions represented by the model-session profile; wherein the model-response feature vector represents a normal distribution of the AI responses represented by the model-response profile; wherein the user-session feature vectors represent normal distributions of respective subsets of the user sessions; and wherein the user-prompt feature vectors represent normal distributions of the respective third subsets of the AI prompts that are received among the AI models from the users.
  17. The method of claim 13, further comprising: generating user-response feature vectors in user-response profiles for the users by embedding respective fourth subsets of the AI responses that are generated by the AI models in response to the AI prompts that are received among the AI models from the users; wherein triggering the execution of the instruction causes the security action to be performed with regard to the incoming AI prompt further as a result of a difference between the embedding that represents the incoming AI prompt and a third identified feature vector that represents at least a subset of the user-response feature vectors being greater than or equal to a second difference threshold.
  18. The method of claim 13, wherein the model-session feature vector is an embedding of an aggregation of the model sessions of the AI models.
  19. The method of claim 13, wherein the model-response feature vector is an embedding of an aggregation of the AI responses that are generated by the AI models.
  20. The method of claim 13, wherein the model-session feature vector, the model-response feature vector, the user-session feature vectors, and the user-prompt feature vectors are generated using a cross-lingual language model.
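
For illustration only, the anomaly-scoring flow recited in claims 2 through 5, operating over embedding feature vectors of the kind recited in claim 7, might be sketched as follows. The sketch assumes scikit-learn's OneClassSVM as the one-class classifier of claim 4 and presumes the embedding vectors have already been produced (e.g., by the cross-lingual language model of claims 12 and 20); it is not the patented implementation.

    import numpy as np
    from sklearn.svm import OneClassSVM

    def build_profile(history_embeddings: np.ndarray) -> OneClassSVM:
        # Learn the "normal" distribution of one profile (model-session,
        # model-response, user-session, or user-prompt) from historical
        # embedding feature vectors, in the spirit of claims 3 and 4.
        return OneClassSVM(kernel="rbf", nu=0.05).fit(history_embeddings)

    def anomaly_score(profile: OneClassSVM, prompt_vec: np.ndarray) -> float:
        # decision_function is positive inside the learned distribution and
        # negative outside; negate it so larger scores mean more anomalous.
        return -float(profile.decision_function(prompt_vec.reshape(1, -1))[0])

    def is_anomalous(profiles: list[OneClassSVM],
                     prompt_vec: np.ndarray,
                     anomaly_score_threshold: float) -> bool:
        # Classify the incoming AI prompt as anomalous (claim 5), and thereby
        # trigger the security action, if any per-profile anomaly score meets
        # or exceeds the anomaly score threshold (claim 2).
        return any(anomaly_score(p, prompt_vec) >= anomaly_score_threshold
                   for p in profiles)

In practice, one profile would presumably be fitted per AI model and per user, with the thresholds tuned to the observed score distributions; the claims leave those implementation choices open.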

Description

BACKGROUND

Artificial intelligence (AI) models may encounter security threats, such as cyberattacks and data breaches. For example, a malicious entity may attempt to infect an AI model with malware or gain unauthorized access to the AI model. In another example, a malicious entity may cause an AI model to generate undesirable (e.g., offensive) outputs by injecting undesirable information into the data that the AI model uses to generate the outputs.

Anomaly detection techniques may be used to detect security threats to AI models. An anomaly detection technique is configured to identify an unusual occurrence (e.g., an unusual pattern or behavior) with regard to a system. For instance, an anomaly detection technique that is used on an AI model may be configured to identify an atypical utilization of the AI model or an atypical output of the AI model. However, developing an anomaly detection technique for use on an AI model may be challenging. For example, the anomaly detection technique may require a comprehensive understanding of the AI model and the behavior patterns of its users. The AI model's capability to process various types of data, and the relatively large quantity of data that the AI model generates, may further complicate development of the anomaly detection technique.

Conventional anomaly detection techniques for AI models focus on detecting known malicious behavior in the AI models (e.g., jailbreak attempts) and have relatively low coverage. Accordingly, the conventional techniques may miss important signals and may require a rebuild of the AI models for each new security threat.

SUMMARY

Artificial intelligence (AI) is intelligence of a machine (e.g., a computing system) and/or code (e.g., software and/or firmware), as opposed to intelligence of a living creature (e.g., a human). An AI prompt indicates (e.g., specifies) a task that is to be performed by an AI model. Examples of an AI prompt include but are not limited to a zero-shot prompt, a one-shot prompt, and a few-shot prompt. A zero-shot prompt is a prompt for which the prompt and/or its corresponding contextual information, which are to be processed by the AI model, is not included in the pre-trained knowledge of the AI model. A one-shot prompt is a prompt that includes a target prompt along with a single example prompt and a single example answer that is responsive to the example prompt; the example prompt and example answer provide guidance as to how the AI model is expected to respond to the target prompt. A few-shot prompt is a prompt that includes a target prompt along with multiple example prompts and multiple example answers that are responsive to the respective example prompts; the example prompts and example answers likewise provide guidance as to how the AI model is expected to respond to the target prompt.

An AI prompt may be a natural language prompt. A natural language prompt is a prompt that is written in a natural language, i.e., a human language that has developed through use and repetition without conscious planning or premeditation. Examples of a natural language include English, French, Spanish, and Mandarin. In one aspect, the natural language prompt is generated by a user (e.g., a human). In another aspect, the natural language prompt is generated by a computing system (e.g., an AI assistant that runs on the computing system).
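
As a concrete illustration of the prompt types defined above, a hypothetical few-shot natural language prompt might be assembled as follows; the translation task and the example prompt/answer pairs are invented for illustration.

    # A hypothetical few-shot prompt: two example prompt/answer pairs guide
    # how the AI model is expected to respond to the target prompt at the end.
    few_shot_prompt = (
        "Q: Translate 'cat' into French.\n"
        "A: chat\n"
        "Q: Translate 'dog' into French.\n"
        "A: chien\n"
        "Q: Translate 'bird' into French.\n"  # the target prompt
        "A:"
    )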
An AI prompt may not be written in a natural language. For instance, the AI prompt may include (e.g., be) computer code. The AI prompt may be any suitable sequence of characters that is capable of being interpreted by an AI model.

An AI model is a model that utilizes artificial intelligence to generate an answer that is responsive to an AI prompt (a.k.a. prompt) that the AI model receives. The AI model may be an artificial general intelligence model, i.e., an AI model (e.g., an autonomous AI model) that is configured to be capable of performing any task that an intelligent being (e.g., a human) is capable of performing. In an example implementation, the artificial general intelligence model is capable of performing a task that surpasses the capabilities of an animal.

A cyberattack is an attempt to cause harm to a system (e.g., an AI model). For instance, the harm may be unauthorized or illegal access to the system. Examples of a cyberattack include but are not limited to a denial of service (DoS) attack, a distributed DoS (DDoS) attack, a man-in-the-middle (MITM) attack, a malware attack, a phishing attack, a ransomware attack, and a cross-site scripting (XSS) attack. A DoS attack is an attack that renders a system unable to respond to a legitimate service request by overwhelming the system's resource(s). A DDoS attack is similar to a DoS attack but involves multiple (e.g., a vast array of) malware-infected hosts that are controlled by a threat actor to cause