US-12621260-B2 - Data protection in cloud data platform

Abstract

A system is disclosed comprising a memory containing instructions and one or more computer processors. When the instructions are executed, the system performs an operation to configure a Domain Name System (DNS) proxy, executing in a node of a cloud data platform associated with a first account, to perform hostname resolution of an Account Host Identifier (AHID) of the first account. The DNS proxy receives a DNS request from a process executing in a pod of the node, and the system fails to resolve the DNS request if the name in the DNS request differs from the AHID of the first account. The system returns an Internet Protocol (IP) address if the name in the DNS request matches the AHID. The process executing in the pod of the node is configured to send data to data storage of the cloud data platform using the returned IP address.
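
The name check described in the abstract can be modelled as a small lookup function. The sketch below is a minimal illustration, not the patented implementation. The class name `AccountDnsProxy`, the sample AHID and IP address, and the case-insensitive comparison are all assumptions, and real DNS wire handling is left out. A lookup for any name other than the account's AHID returns `None`, which stands in for a failed resolution.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


def normalize(name: str) -> str:
    # DNS names compare case-insensitively; drop an optional trailing root dot.
    return name.rstrip(".").lower()


@dataclass(frozen=True)
class AccountDnsProxy:
    """Hypothetical per-node DNS proxy that answers only for one account's AHID."""

    ahid: str                # Account Host Identifier of the node's account
    storage_ip: IPv4Address  # address handed back for the AHID (assumed value)

    def resolve(self, qname: str) -> Optional[IPv4Address]:
        """Return the storage IP for the AHID; None models a failed lookup."""
        if normalize(qname) == normalize(self.ahid):
            return self.storage_ip
        return None  # any other name is refused (e.g., answered with NXDOMAIN)


# Usage with made-up names and addresses.
proxy = AccountDnsProxy("acct-a1b2.stage.example.net", IPv4Address("10.0.0.5"))
print(proxy.resolve("ACCT-A1B2.stage.example.net."))  # 10.0.0.5
print(proxy.resolve("producer-bucket.example.com"))   # None
```

A process that skips DNS and connects to a hard-coded address is not stopped by this check alone; the claims pair the DNS proxy with a handshake filter and an IP checker, which inspect the connection itself.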

Inventors

  • Derek Denny-Brown
  • Ajay Shridhar Joshi
  • Xuguang Yang
  • Haowei Yu
  • Thant Htoo Zaw

Assignees

  • SNOWFLAKE INC.

Dates

Publication Date
2026-05-05
Application Date
2024-09-11

Claims (14)

  1. A system comprising: a memory comprising instructions; and one or more computer processors, the instructions, when executed by the one or more computer processors, causing the system to perform operations comprising: configuring a Domain Name System (DNS) proxy, that is executing in a node of a cloud data platform associated with a first account, to perform hostname resolution of an Account Host Identifier (AHID) of the first account; receiving, by the DNS proxy, a DNS request from a process executing in a pod of the node; failing to resolve the DNS request in response to a name in the DNS request being different from the AHID of the first account; returning an Internet Protocol (IP) address in response to determining that the name in the DNS request matches the AHID, the process executing in the pod of the node being configured to send data to data storage of the cloud data platform using the IP address; configuring a filter in the node to check a hostname during a handshake to establish a connection by the process executing in the pod of the node, wherein the filter enables establishing the connection when the hostname in the handshake is the AHID of the first account; detecting, by the filter in the node, a connection request; and disabling, by the filter, the connection request in response to detecting that the connection request is for a hostname different from the AHID of the first account.
  2. The system as recited in claim 1, wherein each account of the cloud data platform is associated with a unique AHID for the account.
  3. The system as recited in claim 1, wherein the instructions further cause the one or more computer processors to perform operations comprising: configuring an IP checker, that is executing in the node, to perform network address translation (NAT) for the process executing in the pod of the node, wherein the IP checker resolves the IP address to an external IP address for storage associated with the first account in the cloud data platform.
  4. The system as recited in claim 3, wherein configuring the IP checker further comprises: configuring rules for filtering packets based on IP address.
  5. The system as recited in claim 3, wherein the instructions further cause the one or more computer processors to perform operations comprising: examining, by the IP checker, a packet sent by the process executing in the pod of the node; and dropping, by the IP checker, the packet in response to determining that an outgoing address in the packet is not associated with the first account.
  6. The system as recited in claim 5, wherein the instructions further cause the one or more computer processors to perform operations comprising: translating, by the IP checker, the outgoing address to an external IP address in response to determining that the outgoing address in the packet is associated with the first account; and forwarding the packet to the external IP address.
  7. The system as recited in claim 1, wherein the node of the cloud data platform comprises: the pod, the pod comprising the process and a first virtual network interface; a second virtual network interface coupled to the first virtual interface; a filter coupled to the second virtual interface; an IP checker coupled to the filter; and a network interface for network communications to devices outside the node.
  8. The system as recited in claim 7, wherein the cloud data platform comprises a gateway endpoint coupled to the network interface of the node.
  9. A computer-implemented method comprising: configuring a Domain Name System (DNS) proxy, that is executing in a node of a cloud data platform associated with a first account, to perform hostname resolution of an Account Host Identifier (AHID) of the first account; receiving, by the DNS proxy, a DNS request from a process executing in a pod of the node; failing to resolve the DNS request in response to a name in the DNS request being different from the AHID of the first account; returning an Internet Protocol (IP) address in response to determining that the name in the DNS request matches the AHID, the process executing in the pod of the node being configured to send data to data storage of the cloud data platform using the IP address; configuring a filter in the node to check a hostname during a handshake to establish a connection by the process executing in the pod of the node, wherein the filter enables establishing the connection when the hostname in the handshake is the AHID of the first account; detecting, by the filter in the node, a connection request; and disabling, by the filter, the connection request in response to detecting that the connection request is for a hostname different from the AHID of the first account.
  10. The method as recited in claim 9, wherein each account of the cloud data platform is associated with a unique AHID for the account.
  11. The method as recited in claim 9, further comprising: configuring an IP checker, that is executing in the node, to perform network address translation (NAT) for the process executing in the pod of the node, wherein the IP checker resolves the IP address to an external IP address for storage associated with the first account in the cloud data platform.
  12. A machine-storage medium including instructions that, when executed by a machine, cause the machine to perform operations comprising: configuring a Domain Name System (DNS) proxy, that is executing in a node of a cloud data platform associated with a first account, to perform hostname resolution of an Account Host Identifier (AHID) of the first account; receiving, by the DNS proxy, a DNS request from a process executing in a pod of the node; failing to resolve the DNS request in response to a name in the DNS request being different from the AHID of the first account; returning an Internet Protocol (IP) address in response to determining that the name in the DNS request matches the AHID, the process executing in the pod of the node being configured to send data to data storage of the cloud data platform using the IP address; configuring a filter in the node to check a hostname during a handshake to establish a connection by the process executing in the pod of the node, wherein the filter enables establishing the connection when the hostname in the handshake is the AHID of the first account; detecting, by the filter in the node, a connection request; and disabling, by the filter, the connection request in response to detecting that the connection request is for a hostname different from the AHID of the first account.
  13. The machine-storage medium as recited in claim 12, wherein each account of the cloud data platform is associated with a unique AHID for the account.
  14. The machine-storage medium as recited in claim 12, wherein the machine further performs operations comprising: configuring an IP checker, that is executing in the node, to perform network address translation (NAT) for the process executing in the pod of the node, wherein the IP checker resolves the IP address to an external IP address for storage associated with the first account in the cloud data platform.
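
Claims 1 and 3 through 6 add two checks on the node beyond the DNS proxy: a filter that inspects the hostname presented during the connection handshake, and an IP checker that drops packets addressed outside the account and translates the rest to the account's external storage address. The sketch below models both as plain Python objects. It assumes the handshake hostname is something like a TLS Server Name Indication value and represents the NAT rule as a lookup table; `Packet`, `HandshakeFilter`, `IpChecker`, and every name and address are hypothetical.

```python
from collections.abc import Mapping
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: IPv4Address
    dst: IPv4Address
    payload: bytes = b""


class HandshakeFilter:
    """Permits a connection only when the handshake hostname equals the AHID."""

    def __init__(self, ahid: str) -> None:
        self._ahid = ahid.rstrip(".").lower()

    def allow(self, hostname: Optional[str]) -> bool:
        # Treating a missing hostname as a mismatch is an assumption.
        if hostname is None:
            return False
        return hostname.rstrip(".").lower() == self._ahid


class IpChecker:
    """Drops packets to non-account addresses and NATs the remaining ones."""

    def __init__(self, nat_table: Mapping[IPv4Address, IPv4Address]) -> None:
        # Hypothetical table: in-node address returned by the DNS proxy ->
        # external storage address for the same account.
        self._nat = dict(nat_table)

    def process(self, packet: Packet) -> Optional[Packet]:
        external = self._nat.get(packet.dst)
        if external is None:
            return None  # destination not associated with the account: drop
        return replace(packet, dst=external)  # translate, then forward


# Usage with made-up values (203.0.113.0/24 and 198.51.100.0/24 are
# documentation ranges).
sni_filter = HandshakeFilter("acct-a1b2.stage.example.net")
checker = IpChecker({IPv4Address("10.0.0.5"): IPv4Address("203.0.113.10")})
pod_ip = IPv4Address("192.168.1.7")

print(sni_filter.allow("acct-a1b2.stage.example.net"))  # True
print(sni_filter.allow("producer.stage.example.net"))   # False
print(checker.process(Packet(pod_ip, IPv4Address("10.0.0.5"))).dst)  # 203.0.113.10
print(checker.process(Packet(pod_ip, IPv4Address("198.51.100.9"))))  # None
```

Keying the NAT table by the in-node address is a design choice of this sketch: it ties the IP checker to the DNS layer, so a packet is forwarded only if its destination is an address issued for the account's AHID.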

Description

TECHNICAL FIELD

The subject matter disclosed herein generally relates to methods, systems, and machine-readable storage media for protecting user data in a Virtual Private Cloud (VPC) deployment.

BACKGROUND

Data platforms are widely used for data storage and data access in computing and communication contexts. With respect to architecture, a data platform could be an on-premises data platform, a network-based data platform (e.g., a cloud-based data platform), a combination of the two, or another type of architecture. Processes that are associated with a user account may, via one or more types of clients, be able to cause data to be ingested into a database and may also be able to manipulate the data, add additional data, remove data, run queries against the data, generate views of the data, and so forth.

However, a malicious app (application) producer can develop malicious software that is distributed through the data platform to the accounts of app consumers. The malicious app aims to copy the user's data to the account of the app producer, for example by saving the user's data in the app producer's cloud storage. Thus, executing third-party malicious code within a client's environment poses a risk, as it may facilitate the unauthorized transfer of data through internal stages, which are abstractions built on blob storage systems.

BRIEF DESCRIPTION OF THE DRAWINGS

Various appended drawings illustrate examples of the present disclosure and should not be considered as limiting its scope.

FIG. 1 illustrates a computing environment that includes a cloud data platform, according to some examples.
FIG. 2 is a block diagram illustrating components of a compute service manager of the cloud data platform, according to some examples.
FIG. 3 is a block diagram illustrating components of the cloud data platform for securing data stages, according to some examples.
FIG. 4 is a flowchart of a method for setting up worker nodes for secure access, according to some examples.
FIG. 5 is a flowchart of a method for secure stage access, according to some examples.
FIG. 6 is a flowchart of a method for protecting data in a VPC implementation, according to some examples.
FIG. 7 is a block diagram illustrating an example of a machine upon or by which one or more of the example processes described herein may be implemented or controlled.

DETAILED DESCRIPTION

Reference will now be made in detail to specific examples for carrying out the inventive subject matter. Examples are illustrated in the accompanying drawings, and specific details are set forth in the following description to provide a thorough understanding of the subject matter. It will be understood that these examples are not intended to limit the scope of the claims to the illustrated techniques. On the contrary, they are intended to cover such alternatives, modifications, and equivalents as may be included within the scope of the disclosure.

A stage is a location where data files are stored. In some implementations, three types of stages are implemented: internal, external, and temporary. Internal stages are managed by the cloud data platform; external stages are kept in cloud storage services (e.g., Amazon S3™, Azure Blob Storage, Google Cloud Storage); and temporary stages are internal stages that exist only for the duration of a session and are automatically removed when the session ends.

Although some examples are presented below with reference to a particular cloud provider, the same principles may be applied with any cloud provider. Therefore, the solutions described for a given cloud provider should be read as illustrative rather than exclusive or limiting. In some implementations, the internal stages of different user accounts are located in the same cloud storage (e.g., the same S3 bucket).
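
Because the internal stages of many accounts share one bucket, the storage service itself can only separate them by path. The sketch below is a minimal, hypothetical model of a path-scoped session token; `ScopedToken`, `authorize`, the bucket name, and the key layout are assumptions, and real cloud credentials carry far more detail. It shows that such a check looks only at the token, not at which account's code presents it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical temporary session token narrowed to one stage's path."""

    bucket: str
    stage_path: str          # e.g., "stages/<account>/<stage>/", ending in "/"
    actions: frozenset[str]  # e.g., frozenset({"read", "write"})


def authorize(token: ScopedToken, bucket: str, key: str, action: str) -> bool:
    # The trailing "/" in stage_path keeps "s1/" from also matching "s10/".
    return (
        token.bucket == bucket
        and key.startswith(token.stage_path)
        and action in token.actions
    )


SHARED_BUCKET = "internal-stages"  # hypothetical shared bucket name
consumer = ScopedToken(SHARED_BUCKET, "stages/consumer/s1/", frozenset({"read", "write"}))
producer = ScopedToken(SHARED_BUCKET, "stages/producer/s9/", frozenset({"write"}))

# The consumer's own token cannot write into another account's stage ...
print(authorize(consumer, SHARED_BUCKET, "stages/producer/s9/x.csv", "write"))  # False
# ... but a producer token smuggled into the consumer's environment still
# passes, because the check only inspects the token itself.
print(authorize(producer, SHARED_BUCKET, "stages/producer/s9/x.csv", "write"))  # True
```

Path scoping therefore limits what a given token can reach, but not which tokens code running in a consumer's environment can use; that gap has to be closed outside the storage service.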

When accessing their internal stage, a user is given a temporary session token scoped down to their specific stage's path suffix. However, once apps can be shared within the data platform, several potential security vulnerabilities arise when external apps execute in the data platform.

In one attack paradigm, an app built by a malicious producer executes in the consumer's user space (e.g., a user working pod). The app producer is the user who creates the app, and the app consumer is the user who executes the app in their own user space (e.g., user account). In the attack, the executing app embeds a “Troy” session token, obtained elsewhere, that grants write access to storage controlled by the producer (e.g., the producer's internal stage). When the consumer runs the app in their legitimate cluster, the malicious code can use the illicit token to exfiltrate the consumer's data to the producer's internal stage.

Techniques are presented below to stop malicious apps from stealing user data. The solution is a multi-line defense system with three layers of defense:

1. A unique block-storage hostname per account is used for the block-storage service used (e.g.,