US-12621317-B1 - Unsupervised anomalous access detection using sentence-based feature embeddings


Abstract

Techniques for unsupervised anomalous access detection using sentence-based feature embeddings are described. Entity data describing users having access to a particular computing resource is obtained from a computing system and utilized, according to a sentence template, to construct descriptive sentences in a natural language format. The sentences are used to create dense vector embeddings via a sentence transformer machine learning (ML) model. The dense vector embeddings are used as input to an unsupervised anomaly detection ML model to detect anomalous users, which can be presented to an administrator.
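The sentence-construction step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the template wording, the field names (`job_title`, `department`, `location`, `manager_title`), and the sample users are all assumptions chosen to mirror the kinds of attributes the claims mention.

```python
from string import Template

# Hypothetical sentence template. The placeholder names are illustrative
# assumptions, not fields defined by the patent.
SENTENCE_TEMPLATE = Template(
    "The user works as $job_title in the $department department, "
    "is located in $location, and reports to someone with the title "
    "$manager_title."
)


def build_sentence(entity: dict) -> str:
    """Render one user's entity data as a natural language sentence.

    Raises KeyError if a placeholder has no matching key in `entity`.
    """
    return SENTENCE_TEMPLATE.substitute(entity)


# Made-up entity data for two users with access to the same resource.
users = [
    {
        "job_title": "Software Engineer",
        "department": "Payments",
        "location": "Seattle",
        "manager_title": "Engineering Manager",
    },
    {
        "job_title": "Recruiter",
        "department": "Human Resources",
        "location": "Dublin",
        "manager_title": "Recruiting Lead",
    },
]

sentences = [build_sentence(user) for user in users]
```

In the described pipeline, each resulting sentence would then be passed to a sentence transformer model to obtain a dense vector embedding. That step is omitted here because this excerpt does not name a specific model.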

Inventors

ChenMing XU

Assignees

AMAZON TECHNOLOGIES, INC.

Dates

Publication Date: 20260505
Application Date: 20240916

Claims (20)

1. A computer-implemented method comprising: receiving a request, at an anomalous access service (AAS) of a cloud provider network, to determine whether a permission for a user to access a computing resource hosted in the cloud provider network is anomalous; generating, by the AAS, a string value based on a template and at least some metadata associated with the user, the string value comprising a natural language sentence, the natural language sentence including, from the metadata, at least a job title of the user and a geographic location of the user; transforming, by the AAS, the string value into a vector embedding based on use of a first machine learning (ML) model, wherein the first ML model is a sentence transformer model; determining, by the AAS, that the permission for the user to access the computing resource is anomalous, comprising providing the vector embedding as an input to a second ML model, wherein the second ML model comprises an unsupervised anomaly detection model that was trained based on a collection of embeddings corresponding to other users; causing, by the AAS, an indication to be presented, via a user interface, that the user having the permission to access the computing resource is anomalous; receiving, at the cloud provider network, a request to remove or disable the permission to access the computing resource for the user; and updating, by the cloud provider network, a permissions datastore to remove or disable the permission to access the computing resource for the user.
2. The computer-implemented method of claim 1, wherein determining that the permission to access the computing resource is anomalous comprises: obtaining an anomaly score as a result of providing the vector embedding as the input to the second ML model; and determining that the anomaly score meets or exceeds a threshold value.
3. The computer-implemented method of claim 2, wherein: the threshold value is a user-configured value provided by a different user; or the threshold value is determined as part of a training of the second ML model.
4. A computer-implemented method comprising: receiving a request, at an anomalous access service (AAS) of a cloud provider network, to determine whether an access or permission involving a user for a computing resource hosted in the cloud provider network is anomalous; generating, by the AAS, a string value based on a template and at least some metadata associated with the user, the string value comprising a natural language sentence; transforming, by the AAS, the string value into a vector embedding based on use of a first machine learning (ML) model; determining, by the AAS, that the access or permission is anomalous, comprising providing the vector embedding as an input to a second ML model, wherein the second ML model was trained using other vector embeddings generated based on sentences corresponding to other users; and causing, by the AAS, an indication of the determination to be provided.
5. The computer-implemented method of claim 4, wherein determining that the access or permission is anomalous comprises obtaining an anomaly score as a result of providing the vector embedding as the input to the second ML model.
6. The computer-implemented method of claim 5, wherein determining that the access or permission is anomalous further comprises determining that the anomaly score meets or exceeds a threshold value.
7. The computer-implemented method of claim 6, wherein the threshold value is a user-configured value provided by a different user.
8. The computer-implemented method of claim 6, wherein the threshold value is determined as part of the training of the second ML model.
9. The computer-implemented method of claim 4, wherein the metadata comprises employee information of an organization, and wherein the method further comprises obtaining, by the AAS, the metadata from the electronic directory.
10. The computer-implemented method of claim 9, wherein the string value includes a textual description of one or more of: a job title of the user; a work location of the user; a job title of a manager of the user; a department name or department identifier of the user; or an amount of time that the user has been with the organization or within a subgroup of the organization.
11. The computer-implemented method of claim 4, wherein the metadata comprises computing activity metadata collected from one or more host computing devices of the cloud provider network.
12. The computer-implemented method of claim 4, wherein the request pertains to a proposed addition of the user to a group of an organization or a proposed configuration of permissions for the user.
13. The computer-implemented method of claim 4, wherein: the first ML model is a sentence transformer model; and the second ML model is an unsupervised anomaly detection model.
14. The computer-implemented method of claim 4, further comprising: selecting, by the AAS based on the request, at least one of the first ML model or the second ML model for use.
15. A system comprising: a first one or more computing devices to implement a storage service in a multi-tenant cloud provider network, the storage service to store metadata associated with a user; and a second one or more computing devices to implement an anomalous access service (AAS) in the multi-tenant cloud provider network, the AAS including instructions that upon execution cause the AAS service to: receive a request to determine whether an access or permission involving the user for a computing resource hosted in the cloud provider network is anomalous; obtain the metadata from the storage service; generate a string value based on a template and at least some of the metadata associated with the user, the string value comprising a natural language sentence; transform the string value into a vector embedding based on use of a first machine learning (ML) model; determine that the access or permission is anomalous, comprising providing the vector embedding as an input to a second ML model, wherein the second ML model was trained using other vector embeddings generated based on sentences corresponding to other users; and cause an indication of the determination to be provided.
16. The system of claim 15, wherein to determine that the access or permission is anomalous the AAS is at least to obtain an anomaly score as a result of providing the vector embedding as the input to the second ML model.
17. The system of claim 16, wherein to determine that the access or permission is anomalous the AAS is at least further to determine that the anomaly score meets or exceeds a threshold value.
18. The system of claim 17, wherein the threshold value is a user-configured value provided by a different user.
19. The system of claim 17, wherein the threshold value is determined as part of a training of the second ML model.
20. The system of claim 15, wherein the string value includes a textual description of one or more of: a job title of the user; a work location of the user; a job title of a manager of the user; a department name or department identifier of the user; or an amount of time that the user has been with an organization or within a subgroup of the organization.
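The score-and-threshold determination recited in the claims (obtain an anomaly score, then check whether it meets or exceeds a threshold that is either user-configured or derived during training) can be illustrated with a small sketch. Everything below is an assumption for illustration: distance to the centroid of peer embeddings stands in for the unsupervised anomaly detection model, the quantile rule is one possible way to derive a threshold at training time, and the two-dimensional vectors are toy values rather than real sentence embeddings.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def anomaly_score(vector, center):
    """Euclidean distance from the centroid; larger means more unusual."""
    return math.dist(vector, center)


def fit_threshold(vectors, center, quantile=0.95):
    """Derive a threshold from training scores (an assumed quantile rule)."""
    scores = sorted(anomaly_score(v, center) for v in vectors)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def is_anomalous(vector, center, threshold):
    """True when the anomaly score meets or exceeds the threshold."""
    return anomaly_score(vector, center) >= threshold


# Toy stand-ins for embeddings of users who already have access.
peer_embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
center = centroid(peer_embeddings)

# Threshold determined as part of "training" on the peer embeddings.
# A user-configured value could be passed to is_anomalous instead.
threshold = fit_threshold(peer_embeddings, center)

typical_user = [0.05, 0.05]
unusual_user = [1.0, 1.0]
flags = [is_anomalous(u, center, threshold) for u in (typical_user, unusual_user)]
```

Keeping the threshold as an explicit argument lets the same check serve both threshold sources described in the claims.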

Description

BACKGROUND

Organizations may use computing resources provided by cloud service providers in a variety of ways. Cloud services offer scalable, flexible, and cost-effective solutions for computing needs. Organizations can leverage Infrastructure as a Service (IaaS) to rent hardware like servers, storage, and networking technology. Platform as a Service (PaaS) provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure. Software as a Service (SaaS) delivers software applications over the internet on a subscription basis. These services eliminate the need for organizations to invest in and maintain their own costly information technology infrastructure. Additionally, cloud services provide the ability to scale resources up or down as needed, pay only for what is used, and access data and applications from anywhere, making them an attractive option for many users and organizations.

In an organization, assigning different permissions to access cloud computing resources is a critical aspect of managing security and operational efficiency. This process is typically managed through a system known as Identity and Access Management (IAM). IAM systems allow administrators to assign specific roles to each user or group of users, where each role has a defined set of permissions. These permissions determine what resources a user can access and what actions they can perform. For instance, a database administrator might have full access to manage databases, while a software developer might only have permission to read data from certain databases. This role-based access control (RBAC) helps to ensure that users only have access to the resources they need to perform their job functions, reducing the risk of accidental or malicious misuse of sensitive data or critical systems. Additionally, many cloud service providers offer tools for managing these permissions at a granular level, allowing organizations to customize their security protocols to fit their specific needs.

BRIEF DESCRIPTION OF DRAWINGS

Various examples in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 is a diagram illustrating an environment for unsupervised anomalous access detection using sentence-based feature embeddings according to some examples.
FIG. 2 is a diagram illustrating training an anomalous access engine used in an environment for unsupervised anomalous access detection using sentence-based feature embeddings according to some examples.
FIG. 3 is a diagram illustrating unsupervised anomalous access detection for a new associated entity access request using sentence-based feature embeddings according to some examples.
FIG. 4 is a diagram illustrating exemplary user-based processing for unsupervised anomalous access detection using sentence-based feature embeddings according to some examples.
FIG. 5 is a diagram illustrating exemplary network-based processing for unsupervised anomalous access detection using sentence-based feature embeddings according to some examples.
FIG. 6 is a flow diagram illustrating operations of a method for unsupervised anomalous access detection using sentence-based feature embeddings according to some examples.
FIG. 7 illustrates an example cloud provider network environment according to some examples.
FIG. 8 is a block diagram of an example cloud provider network that provides a storage service and a hardware virtualization service to users according to some examples.
FIG. 9 is a block diagram illustrating an example computing device that can be used in some examples.

DETAILED DESCRIPTION

In organizations of all sizes, managing employee access to sensitive computing resources, such as production hosts and cloud accounts, is a difficult yet critical aspect of corporate security. However, identifying employees with unusual or unnecessary access privileges is challenging, especially when dealing with a vast number of employees, large numbers and/or types of computing resources, and complex access control systems.

The present disclosure relates to methods, apparatus, systems, and non-transitory computer-readable storage media for unsupervised anomalous access detection using sentence-based feature embeddings. According to some examples, an anomalous access service can serve as a detection tool that leverages sentence embedding machine learning (ML) models to extract the meaning of user-related features along with an unsupervised learning ML model to identify users having unusual, or atypical, access permissions to computing resources such as production hosts or cloud accounts. By converting user-related features into embeddings using sentence transformers and applying unsupervised learning techniques, the anomalous access service can effectively detect anomalous access patterns and flag users who may potentially pose a security risk. Accordingly, examples disclosed herein can eff