US-12621298-B2 - Secure hybrid data transfer through connection and request direction decoupling
Abstract
Systems and methods are directed to secure hybrid data transfer through connection and request direction decoupling. Initially, a controller server in a source on-premises cluster establishes a persistent connection to a controller server in a destination cloud cluster. The connection is then reversed, which enables destination servers to request additional connections between any pair of source and destination servers dynamically from the source cluster. Each of these connections is established by a source server, which authenticates to a cloud (destination) server and then requests to reverse the connection. The reversed connections can be used to transfer data securely between the on-premises and cloud servers. The source server can be a broker in a source cluster located on-premises and the destination server can be a broker in a destination cluster in the cloud.
Inventors
- Rajini Sivaram
- Nikhil Bhatia
Assignees
- Confluent, Inc.
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-11-02
Claims (20)
- 1. A method comprising: receiving, from a source controller by a source server at a source of a distributed streaming platform, a request to initiate a reverse connection to a destination server at a destination of the distributed streaming platform; in response to receiving the request, establishing, by the source server, a connection to the destination server, the connection being used to transfer data from the source to the destination; and reversing the connection, the reversing comprising: transmitting, by the source server, a reverse connection request to the destination server after establishing the connection; responsive to the reverse connection request, causing the destination server to remove the connection from a network server at the destination that accepts connections on one or more listeners and allocates each connection to a processor from its pool of processors, with a state of each connection stored in a channel managed by a selector associated with an assigned processor of the network server and to add the connection to a network client at the destination that establishes connections and processes traffic to and from brokers, with a state of each connection stored in a channel managed by a selector of the network client; receiving, by the source server, a response to the reverse connection request from the destination server; and responsive to receiving the response to the reverse connection request, removing, by the source server, the connection from a network client at the source that establishes connections and processes traffic to and from brokers, with a state of each connection stored in a channel managed by a selector of the network client at the source and adding the connection to a network server at the source that accepts connections on one or more listeners and allocates each connection to a processor from its pool of processors, with a state of each connection stored in a channel managed by a selector associated with an assigned processor of the network server at the source, the reversing enabling a client in the destination to send requests to the source server for the data on the reversed connection.
- 2. The method of claim 1, wherein: the source server comprises a source broker in a source cluster on-premises; the destination server comprises a destination broker in a destination cluster in a cloud; and the establishing the connection and triggering the reversing is performed by the source broker.
- 3. The method of claim 2, further comprising: authenticating the source broker to the destination broker using credentials of security mechanisms supported in the destination cluster, wherein the reverse connection request to reverse the established connection is authorized by the destination server based on an authenticated destination service identity associated with the connection or the reverse connection request from the source server.
- 4. The method of claim 1, further comprising: after establishing the connection and before reversing the connection, negotiating an application programming interface (API) version to be used in subsequent requests.
- 5. The method of claim 1, wherein: the causing the destination server to remove the connection from the network server at the destination and to add the connection to the network client at the destination comprises causing the destination server to remove a server-side channel on which the reverse connection request was received from the network server at the destination and to add a client-side channel to the network client at the destination; and the removing the connection from the network client at the source and adding the connection to the network server at the source comprises removing a client-side channel that was used to send the reverse connection request from a network client at the source and adding a server-side channel to a network server at the source.
- 6. The method of claim 1, further comprising: establishing, by the source controller at the source, a persistent reverse connection to a destination controller at the destination on which a destination client can request a connection from any source broker to any destination broker.
- 7. The method of claim 1, further comprising: establishing a source service identity associated with the request based on connection or request credentials being authenticated by the source server using security mechanisms supported in a source cluster; and authorizing, by the source server, the request to initiate the reverse connection based on the source service identity associated with the request.
- 8. The method of claim 7, further comprising: after reversing the connection, associating, by the source server, the source service identity to a server-side of the reversed connection, wherein further requests from the destination are securely authorized using the source service identity.
- 9. The method of claim 1, further comprising: after reversing the connection, receiving, by the source server, a request for data from the client at the destination; and in response to receiving the request for data, transmitting the data from a topic at the source to a topic at the destination.
- 10. A system comprising: one or more hardware processors; and a memory storing instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising: receiving, from a source controller by a source server at a source of a distributed streaming platform, a request to initiate a reverse connection to a destination server at a destination of the distributed streaming platform; in response to receiving the request, establishing, by the source server, a connection to the destination server, the connection being used to transfer data from the source to the destination; and reversing the connection, the reversing comprising: transmitting, by the source server, a reverse connection request to the destination server after establishing the connection; responsive to the reverse connection request, causing the destination server to remove the connection from a network server at the destination that accepts connections on one or more listeners and allocates each connection to a processor from its pool of processors, with a state of each connection stored in a channel managed by a selector associated with an assigned processor of the network server and to add the connection to a network client at the destination that establishes connections and processes traffic to and from brokers, with a state of each connection stored in a channel managed by a selector of the network client; receiving, by the source server, a response to the reverse connection request from the destination server; and responsive to receiving the response to the reverse connection request, removing, by the source server, the connection from a network client at the source that establishes connections and processes traffic to and from brokers, with a state of each connection stored in a channel managed by a selector of the network client at the source and adding the connection to a network server at the source that accepts connections on one or more listeners and allocates each connection to a processor from its pool of processors, with a state of each connection stored in a channel managed by a selector associated with an assigned processor of the network server at the source, the reversing enabling a client in the destination to send requests to the source server for the data on the reversed connection.
- 11. The system of claim 10, wherein: the source server comprises a source broker in a source cluster on-premises; the destination server comprises a destination broker in a destination cluster in a cloud; and the establishing the connection and triggering the reversing is performed by the source broker.
- 12. The system of claim 10, wherein the operations further comprise: after establishing the connection and before reversing the connection, negotiating an application programming interface (API) version to be used in subsequent requests.
- 13. The system of claim 10, wherein: the causing the destination server to remove the connection from the network server at the destination and to add the connection to the network client at the destination in response to receiving the reverse connection request comprises causing the destination server to remove a server-side channel on which the reverse connection request was received from the network server at the destination and to add a client-side channel to the network client at the destination; and the removing the connection from the network client at the source and adding the connection to the network server in the source comprises removing a client-side channel that was used to send the reverse connection request from a network client at the source and adding a server-side channel to a network server at the source.
- 14. The system of claim 10, wherein the operations further comprise: after reversing the connection, receiving, by the source server, a request for data from the client at the destination; and in response to receiving the request for data, transmitting the data from a topic at the source to a topic at the destination.
- 15. A machine-storage medium comprising instructions which, when executed by one or more hardware processors of a machine, cause the machine to perform operations comprising: receiving, from a source controller by a source server at a source of a distributed streaming platform, a request to initiate a reverse connection to a destination server at a destination of the distributed streaming platform; in response to receiving the request, establishing, by the source server, a connection to the destination server, the connection being used to transfer data from the source to the destination; and reversing the connection, the reversing comprising: transmitting, by the source server, a reverse connection request to the destination server after establishing the connection; responsive to the reverse connection request, causing the destination server to remove the connection from a network server at the destination that accepts connections on one or more listeners and allocates each connection to a processor from its pool of processors, with a state of each connection stored in a channel managed by a selector associated with an assigned processor of the network server and to add the connection to a network client at the destination that establishes connections and processes traffic to and from brokers, with a state of each connection stored in a channel managed by a selector of the network client; receiving, by the source server, the response to the reverse connection request from the destination server; and responsive to receiving the response to the reverse connection request, removing, by the source server, the connection from a network client at the source that establishes connections and processes traffic to and from brokers, with a state of each connection stored in a channel managed by a selector of the network client at the source and adding the connection to a network server at the source that accepts connections on one or more listeners and allocates each connection to a processor from its pool of processors, with a state of each connection stored in a channel managed by a selector associated with an assigned processor of the network server at the source, the reversing enabling a client in the destination to send requests to the source server for the data on the reversed connection.
- 16. The machine-storage medium of claim 15, wherein: the source server comprises a source broker in a source cluster on-premises; the destination server comprises a destination broker in a destination cluster in the cloud; and the establishing the connection and triggering the reversing is performed by the source broker.
- 17. The machine-storage medium of claim 15, wherein the operations further comprise: after establishing the connection and before reversing the connection, negotiating an application programming interface (API) version to be used in subsequent requests.
- 18. The machine-storage medium of claim 15, wherein: the causing the destination server to remove the connection from the network server at the destination and to add the connection to the network client at the destination comprises causing the destination server to remove a server-side channel on which the reverse connection request was received from the network server at the destination and to add a client-side channel to the network client at the destination; and the removing the connection from the network client at the source and adding the connection to the network server at the source comprises removing a client-side channel that was used to send the reverse connection request from a network client at the source and adding a server-side channel to a network server at the source.
- 19. The machine-storage medium of claim 15, wherein the operations further comprise: establishing, by the source controller at the source, a persistent reverse connection to a destination controller at the destination on which a destination client can request a connection from any source broker to any destination broker.
- 20. The machine-storage medium of claim 15, wherein the operations further comprise: after reversing the connection, receiving, by the source server, a request for data from the client at the destination; and in response to receiving the request for data, transmitting the data from a topic at the source to a topic at the destination.
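Illustrative sketch (not part of the claims): the channel handoff recited in claims 1, 4, and 5 can be pictured as moving one connection between two registries on each side. The Python below is a minimal model under stated assumptions. The names Channel, Selector, NetworkClient, NetworkServer, Broker, establish, and reverse are hypothetical; the round-robin processor assignment and the two-processor pool are assumptions made only for this sketch; and no real socket, authentication, or wire protocol is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Channel:
    """State of one connection, held by the selector that currently owns it."""
    connection_id: str
    side: str  # "client-side" or "server-side"
    api_version: Optional[int] = None


@dataclass
class Selector:
    """Hypothetical selector: a registry of the channels it manages."""
    channels: Dict[str, Channel] = field(default_factory=dict)


@dataclass
class NetworkClient:
    """Establishes connections; a single selector holds all of its channels."""
    selector: Selector = field(default_factory=Selector)

    def add(self, channel: Channel) -> None:
        self.selector.channels[channel.connection_id] = channel

    def remove(self, connection_id: str) -> Channel:
        return self.selector.channels.pop(connection_id)

    def has(self, connection_id: str) -> bool:
        return connection_id in self.selector.channels


@dataclass
class NetworkServer:
    """Accepts connections; each one is assigned to one processor's selector."""
    processors: List[Selector] = field(
        default_factory=lambda: [Selector(), Selector()])
    next_processor: int = 0

    def add(self, channel: Channel) -> None:
        # Round-robin assignment is an assumption made for this sketch.
        selector = self.processors[self.next_processor % len(self.processors)]
        self.next_processor += 1
        selector.channels[channel.connection_id] = channel

    def remove(self, connection_id: str) -> Channel:
        for selector in self.processors:
            if connection_id in selector.channels:
                return selector.channels.pop(connection_id)
        raise KeyError(connection_id)

    def has(self, connection_id: str) -> bool:
        return any(connection_id in s.channels for s in self.processors)


@dataclass
class Broker:
    name: str
    client: NetworkClient = field(default_factory=NetworkClient)
    server: NetworkServer = field(default_factory=NetworkServer)


def establish(source: Broker, destination: Broker,
              connection_id: str, api_version: int) -> None:
    """Source opens the connection; the API version is fixed before reversal."""
    source.client.add(Channel(connection_id, "client-side", api_version))
    destination.server.add(Channel(connection_id, "server-side", api_version))


def reverse(source: Broker, destination: Broker, connection_id: str) -> None:
    """Swap the roles of both ends of an already established connection."""
    # Destination, on receiving the reverse connection request:
    # drop the server-side channel and add a client-side channel.
    old = destination.server.remove(connection_id)
    destination.client.add(
        Channel(connection_id, "client-side", old.api_version))
    # Source, on receiving the response:
    # drop the client-side channel and add a server-side channel.
    old = source.client.remove(connection_id)
    source.server.add(Channel(connection_id, "server-side", old.api_version))
```

In this model, calling establish(src, dst, "conn-1", api_version=1) followed by reverse(src, dst, "conn-1") leaves the connection registered with the destination's network client and the source's network server, which is the state in which a client at the destination can send requests to the source on the same connection.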
Description
TECHNICAL FIELD
The subject matter disclosed herein generally relates to data migration. Specifically, the present disclosure addresses systems and methods for secure hybrid data transfer through connection and request direction decoupling and reversal.
BACKGROUND
Organizations usually have clusters running both in the cloud and on-premises, and users want the ability to access data from both locations. As such, data needs to be moved from on-premises to the cloud. In order to flow data from on-premises clusters to clusters in the cloud, applications or brokers running in the cloud need to be able to fetch data from the on-premises clusters. This requires applications running in the cloud to establish connections to the on-premises clusters, which are typically behind a firewall that prevents any arbitrary application from connecting to internal systems.
Many organizations also use corporate authentication servers (e.g., Active Directory or OAuth servers) to authenticate connections for their on-premises servers. These on-premises servers need to be accessible to the cloud for applications from the cloud to authenticate to the on-premises servers. However, security-conscious organizations are very unlikely to grant network access or access to corporate authentication servers from the cloud. This makes it difficult when data needs to be migrated from on-premises to the cloud or made available both on-premises and in the cloud (e.g., to set up a disaster recovery cluster).
In some existing systems, a third cluster in the middle reaches out to both the on-premises cluster and the cloud cluster to consume from on-premises and then produce to the cloud. However, this process poses operational challenges and extra maintenance overhead for the third cluster, which cannot be moved to a managed cloud cluster due to the connection direction limitation. Another option is to use proxies, which also come with a significant operational burden.
BRIEF DESCRIPTION OF THE DRAWINGS
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.
FIG. 1 is a diagram illustrating a high-level distributed streaming architecture for transferring data from an on-premises source to the cloud, according to some example embodiments.
FIG. 2 is a diagram illustrating a high-level distributed streaming architecture that decouples a connection from a request, according to some example embodiments.
FIG. 3 is a diagram illustrating a reverse connection process flow, according to some example embodiments.
FIG. 4 is a diagram illustrating a detailed reverse connection process flow, according to some example embodiments.
FIG. 5 is a flowchart illustrating operations of a method for secure data transfer through connection and request direction decoupling, according to some example embodiments.
FIG. 6 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-storage medium and perform any one or more of the methodologies discussed herein.
DETAILED DESCRIPTION
The description that follows describes systems, methods, techniques, instruction sequences, and computing machine program products that illustrate example embodiments of the present subject matter. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the present subject matter. It will be evident, however, to those skilled in the art, that embodiments of the present subject matter may be practiced without some or other of these specific details. Examples merely typify possible variations.
Unless explicitly stated otherwise, structures (e.g., structural components, such as modules) are optional and may be combined or subdivided, and operations (e.g., in a procedure, algorithm, or other function) may vary in sequence or be combined or subdivided.
In order to flow data from an on-premises cluster to the cloud, applications or brokers running in the cloud need to establish a connection to the on-premises cluster to be able to fetch the data from the on-premises cluster. The cloud comprises a network of servers that are accessible over the Internet, and the software and databases that run on those servers. Because most organizations do not want to open up their on-premises clusters to enable clients from the cloud to establish connections, example embodiments allow a source (e.g., source cluster or broker) that is on-premises to establish a connection with the cloud (e.g., cloud cluster or broker) and initiate reversal of the connection (e.g., changing a direction of request flow on the connection). Once the connection is reversed, a client on the cloud broker can send requests to the on-premises broker. This is in contrast to conventional systems whereby the entity establishing the connection is the one sending the requests. There are several advantages to r