US-12619742-B2 - Reflected distributed data access
Abstract
The present disclosure is directed to systems and methods for securely reflecting data from a local datastore. In one example embodiment, a user may connect local datastores to edge nodes that may facilitate communication between a third-party data management hub and the local datastores. A user operating the third-party data management hub may request to read/write certain data at the local datastores (that the user owns). Data from the local datastore may be generated in response to the request. The responsive data may be transmitted over an encrypted data session to the data management hub, where the responsive data is reflected to the user on a user interface. The responsive data is stored in a volatile memory store and not moved from the persistent memory at the local datastore. Upon termination of the encrypted data session, the volatile memory store is erased to preserve the privacy/security of the underlying data.
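As an illustration only, the volatile-store behavior described above can be sketched in Python. Every name here (`ReflectionSession`, `receive`, `display`, `close`) is hypothetical and does not come from the disclosure. The sketch does not model encryption or transport; it shows only that reflected data is held in in-memory buffers tied to a session and is cleared when the session ends. Overwriting a `bytearray` in place is a best-effort measure, since the Python runtime may keep other copies of the data.

```python
from __future__ import annotations


class ReflectionSession:
    """Hypothetical hub-side session that keeps reflected data in memory only."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.active = False
        # In-memory buffers only; this sketch never writes reflected data to disk.
        self._buffers: dict[str, bytearray] = {}

    def __enter__(self) -> "ReflectionSession":
        self.active = True
        return self

    def receive(self, key: str, payload: bytes) -> None:
        """Store responsive data received while the session is in progress."""
        if not self.active:
            raise RuntimeError("session is not active")
        self._buffers[key] = bytearray(payload)

    def display(self, key: str) -> str:
        """Return the reflected data for rendering on a user interface."""
        if not self.active:
            raise RuntimeError("session is not active")
        return self._buffers[key].decode("utf-8")

    def close(self) -> None:
        """Erase the volatile store when the session terminates."""
        for buf in self._buffers.values():
            buf[:] = bytes(len(buf))  # overwrite contents with zeros (best effort)
        self._buffers.clear()
        self.active = False

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()
```

Used as a context manager (`with ReflectionSession("s1") as s: ...`), the buffers are erased on exit whether the block finishes normally or raises, which mirrors the erase-on-termination behavior in the abstract.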
Inventors
- Vadim Vaks
- Christopher Channing
Assignees
- COLLIBRA BELGIUM BV
Dates
- Publication Date
- 20260505
- Application Date
- 20220610
Claims (19)
- 1. A system for reflecting data from at least one datastore, comprising: a memory configured to store non-transitory computer readable instructions; and a processor communicatively coupled to the memory, wherein the processor, when executing the non-transitory computer readable instructions, is configured to: receive at least one user instruction, wherein the at least one user instruction targets the at least one datastore; transmit the at least one user instruction to a queue at an edge gateway, wherein the queue is associated with an edge node, wherein the edge node lacks a network path to the edge gateway, wherein the edge node is associated with the at least one datastore, and wherein the edge node polls the queue to receive the at least one user instruction; initiate an encrypted data session with the edge node; receive reflected data retrieved from the at least one datastore over the encrypted data session while the encrypted data session is uninterrupted, wherein the reflected data is stored only in at least one volatile memory location associated with the encrypted data session; and while the encrypted data session is in progress, display the reflected data.
- 2. The system of claim 1, the processor further configured to: when the encrypted data session is disconnected, erase the reflected data from the at least one volatile memory location.
- 3. The system of claim 1, the processor further configured to: receive at least one data-quality result from a metastore.
- 4. The system of claim 3, the processor further configured to: based on the at least one data-quality result from the metastore, transmit a second user instruction to the edge node.
- 5. The system of claim 3, wherein the at least one data-quality result is an error notice associated with the at least one datastore.
- 6. The system of claim 1, wherein the at least one user instruction is deserialized at the edge node prior to execution at the at least one datastore.
- 7. The system of claim 1, wherein the reflected data is encrypted prior to transmission over the encrypted data session.
- 8. The system of claim 1, wherein the at least one user instruction is at least one of: a read command and a write command.
- 9. The system of claim 1, the processor further configured to: generate a metastore.
- 10. The system of claim 9, wherein the metastore comprises at least one scoping requirement associated with underlying data stored in the at least one datastore.
- 11. The system of claim 9, wherein the metastore is associated with a second datastore.
- 12. The system of claim 9, wherein the metastore comprises at least one of: schema, user instructions, DQ analysis results, and platform data.
- 13. The system of claim 1, wherein the at least one user instruction is a view command to view information associated with a plurality of datastores in a unified user interface.
- 14. The system of claim 13, the processor further configured to: based on the view command, display platform data associated with the plurality of datastores in the unified user interface.
- 15. A method for reflecting data from a datastore, comprising: connecting at least one edge node to the datastore; transmitting a first user instruction to process the data stored in the datastore for data quality, wherein the first user instruction is transmitted to a queue at an edge gateway, wherein the queue is associated with the at least one edge node, wherein the at least one edge node lacks a network path to the edge gateway, and wherein the at least one edge node polls the queue to receive the at least one user instruction; receiving at least one data quality result from the at least one edge node; based on the at least one data quality result received from the at least one edge node, transmitting a second user instruction to read a portion of the data stored in the datastore; initiating an encrypted data session; receiving reflected data from the at least one edge node over the encrypted data session while the encrypted data session is uninterrupted, wherein the reflected data is stored only in at least one volatile memory location associated with the encrypted data session; and displaying the reflected data on a user interface.
- 16. The method of claim 15, further comprising erasing the reflected data from the at least one volatile memory location upon disconnection of the encrypted data session.
- 17. The method of claim 15, further comprising erasing the reflected data from the at least one volatile memory location upon disconnection of the user interface.
- 18. A non-transitory computer-readable media storing computer executable instructions that when executed cause a computing system to perform the steps for reflecting data from a datastore, comprising: connecting at least one edge node to the datastore; transmitting a first user instruction to process the data stored in the datastore for data quality, wherein the first user instruction is transmitted to a queue at an edge gateway, wherein the queue is associated with the at least one edge node, wherein the at least one edge node lacks a network path to the edge gateway, and wherein the at least one edge node polls the queue to receive the at least one user instruction; receiving at least one data quality result from the at least one edge node; based on the at least one data quality result received from the at least one edge node, transmitting a second user instruction to read a portion of the data stored in the datastore; initiating an encrypted data session; receiving reflected data from the at least one edge node over the encrypted data session while the encrypted data session is uninterrupted, wherein the reflected data is stored only in at least one volatile memory location associated with the encrypted data session; and displaying the reflected data on a user interface.
- 19. The non-transitory computer-readable medium of claim 18, wherein the at least one data quality result from the at least one edge node is an error notice.
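For illustration, the queue-and-poll arrangement recited in the claims above (a queue at an edge gateway that an edge node polls, with the instruction deserialized at the edge node before execution) might be sketched as follows. This is a minimal in-process sketch under stated assumptions, not the claimed implementation: `EdgeGateway`, `EdgeNode`, `enqueue`, `poll`, and `poll_once` are hypothetical names, JSON is an assumed serialization format, a Python `dict` stands in for the local datastore, and the encrypted data session is not modeled.

```python
from __future__ import annotations

import json
import queue


class EdgeGateway:
    """Hypothetical gateway holding one instruction queue per edge node."""

    def __init__(self) -> None:
        self._queues: dict[str, queue.Queue] = {}

    def _queue_for(self, node_id: str) -> queue.Queue:
        return self._queues.setdefault(node_id, queue.Queue())

    def enqueue(self, node_id: str, instruction: dict) -> None:
        # Instructions are serialized (JSON is an assumption) before queuing.
        self._queue_for(node_id).put(json.dumps(instruction))

    def poll(self, node_id: str) -> str | None:
        # Called by the edge node; the gateway never calls into the node.
        try:
            return self._queue_for(node_id).get_nowait()
        except queue.Empty:
            return None


class EdgeNode:
    """Hypothetical edge node attached to a local datastore (a dict here)."""

    def __init__(self, node_id: str, gateway: EdgeGateway, datastore: dict) -> None:
        self.node_id = node_id
        self.gateway = gateway
        self.datastore = datastore

    def poll_once(self) -> dict | None:
        """Pull at most one instruction, deserialize it, and execute it locally."""
        raw = self.gateway.poll(self.node_id)
        if raw is None:
            return None  # nothing waiting in this node's queue
        instruction = json.loads(raw)  # deserialized at the edge node
        op, key = instruction["op"], instruction["key"]
        if op == "read":
            value = self.datastore.get(key)
        elif op == "write":
            value = instruction["value"]
            self.datastore[key] = value
        else:
            raise ValueError(f"unsupported instruction: {op!r}")
        return {"op": op, "key": key, "value": value}
```

In this sketch the only call direction is node to gateway (`poll`), so the node never needs to accept an inbound connection; a real deployment would presumably run `poll_once` on a timer and return results over the encrypted session.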
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
U.S. patent application Ser. No. 16/776,293, entitled “SYSTEMS AND METHOD OF CONTEXTUAL DATA MASKING FOR PRIVATE AND SECURE DATA LINKAGE”; U.S. patent application Ser. No. 17/103,751, entitled “SYSTEMS AND METHODS FOR UNIVERSAL REFERENCE SOURCE CREATION AND ACCURATE SECURE MATCHING”; U.S. patent application Ser. No. 17/103,720, entitled “SYSTEMS AND METHODS FOR DATA ENRICHMENT”; and U.S. patent application Ser. No. 17/219,340, entitled “SYSTEMS AND METHODS FOR AN ON-DEMAND, SECURE, AND PREDICTIVE VALUE-ADDED DATA MARKETPLACE,” are hereby incorporated by reference in their entirety.
TECHNICAL FIELD
The present disclosure relates to distributed data and privacy by design.
BACKGROUND
Data profiling and data quality (DQ) are essential for entities with numerous large datasets and data sources. However, at scale, data profiling and ensuring data quality across such large amounts of data can be cost-prohibitive and consume scarce computing resources. Data profiling and quality analyses can also take extended periods of time when carried out by external or non-native applications. Data profiling and quality checks are often serialized and run asynchronously from the process of data creation. In one example, an entity wishing to profile or clean up certain data may discover that certain sensitive data is stored locally rather than on a server accessible via a network. To profile and assess the quality of this local data, the entity would be required to set up an on-premises program to analyze the data (which takes a long time to set up and is expensive from a hardware and maintenance perspective) or send the sensitive local data up to a server to be accessible on the network (which then increases the risk of a privacy breach, as the data is no longer on a local and isolated computer).
In some examples, certain sensitive data, such as personally identifiable information, may be required to be stored only on local storage devices and have no storage presence (i.e., persistence at the disk level) on a third-party/remote server. To get around these requirements, a client would need to create a bypass in a firewall or rely on heavy process/approval methods like establishing a virtual private connection (e.g., virtual private link), which may trigger certain regulatory and compliance issues, depending on the nature of the sensitive data. As such, a need exists to access local data (e.g., for profiling, quality assurance, analysis, etc.) in a quick, inexpensive, and secure manner without compromising the underlying privacy of sensitive data.
In another example, an entity wishing to perform DQ analyses on certain sets of data may discover that the entity's data sources are fragmented across numerous different datastores. To perform DQ analysis on each of these different datastores, the entity would need to perform an individual analysis on each of those datastores, since they are disconnected and siloed off from the other datastores. As such, a need also exists to aggregate data from different datastores in a secure manner so that DQ analyses may be performed efficiently across multiple different datastores.
It is with respect to these and other general considerations that the aspects disclosed herein have been made. Also, although relatively specific problems may be discussed, it should be understood that the examples should not be limited to solving the specific problems identified in the background or elsewhere in the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
Non-limiting and non-exhaustive examples are described with reference to the following figures.
FIG. 1 illustrates an example of a distributed system for securely reflecting data from a datastore, as described herein.
FIG. 2 illustrates an example input processor for implementing systems and methods for securely reflecting data from a datastore.
FIG. 3 illustrates an example method for securely reflecting data from a datastore, as described herein.
FIG. 4 illustrates an example distributed environment for securely reflecting data from a datastore.
FIG. 5 illustrates one example of a suitable operating environment in which one or more of the present embodiments may be implemented.
DETAILED DESCRIPTION
Various aspects of the disclosure are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary aspects. However, different aspects of the disclosure may be implemented in many different forms and should not be construed as limited to the aspects set forth herein; rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the aspects to those skilled in the art. Aspects may be practiced as methods, systems, or devices. Accordingly, aspects may take the form of a hardware implementation, an entirely software implementation or an implementation combining software and hardware aspects. The following detailed