JP-7857348-B2 - Change-aware snapshot replication


Inventors

ユ・ガオ
ジュンペン・リウ
ジフェン・シュ
ヒョン・ソグ・キム
ウォン・ウク・ホン
ジ・フン・ジャン

Assignees

エスアーペーエスエー

Dates

Publication Date: 20260512
Application Date: 20240702
Priority Date: 20230727

Claims (17)

A computing system comprising: at least one memory; one or more hardware processor units coupled to the at least one memory; and one or more computer-readable storage media storing computer-executable instructions that, when executed, cause the computing system to perform operations comprising: creating, in a local data source, a first instance of a first data object type; receiving, from a remote data source, a first set of data for the first instance of the first data object type, wherein the first set of data is extracted from a remote data object of the remote data source; storing the first set of data in the first instance of the first data object type; receiving a query; determining that the query accesses the first instance of the first data object type; determining that the first instance of the first data object type is outdated if it differs from the remote data object, wherein the determining that the first instance of the first data object type is outdated is initiated at least in part based on the determining that the query accesses the first instance of the first data object type; and, when the first instance of the first data object type is determined to be outdated: receiving a second set of data for the first instance of the first data object type, wherein the second set of data is updated data of the remote data object; and replacing at least a portion of the first set of data within the first instance of the first data object type with data from the second set of data.
The computing system according to claim 1, wherein the operations further comprise postponing execution of the query until the replacing is completed.
The computing system according to claim 2, wherein the postponing is performed in response to a command in the query.
The computing system according to claim 1, wherein the operations further comprise executing the query before the replacing.
The computing system according to claim 4, wherein executing the query before the replacing is performed in response to a command in the query.
The computing system according to claim 1, wherein the operations further comprise executing, against the remote data object, at least a portion of the query that accesses the first instance of the first data object type.
The computing system according to claim 6, wherein executing the at least a portion of the query against the remote data object is performed in response to a command in the query.
The computing system according to claim 1, wherein determining that the first instance of the first data object type is outdated includes comparing a digest value generated from the content of the first set of data with a digest value generated from the content of the remote data object and obtained from the current metadata of the remote data object.
The computing system according to claim 1, wherein determining that the first instance of the first data object type is outdated includes comparing statistics maintained with respect to the first instance of the first data object type with statistics obtained from the current metadata of the remote data object.
The computing system according to claim 1, wherein determining that the first instance of the first data object type is outdated includes comparing the last modified date maintained with respect to the first instance of the first data object type with the last modified date obtained from the current metadata of the remote data object.
The computing system according to claim 1, wherein determining that the first instance of the first data object type is outdated includes comparing the size maintained with respect to the first instance of the first data object type with the size obtained from the current metadata of the remote data object.
The computing system according to claim 1, wherein the first data object type includes an identifier indicating whether a query accessing an instance of the first data object type should be executed against the local data source using the instance of the first data object type, or against the remote data object.
The computing system according to claim 1, wherein the remote data object and the first data object type share a common set of semantic attributes.
The computing system according to claim 1, wherein the remote data object and the first data object type are relational database tables.
The computing system according to claim 1, wherein the determination is performed on the remote data source.
A method implemented in a computing system comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, the method comprising the steps of: creating a first instance of a first data object type in a local data source; receiving, from a remote data source, a first set of data for the first instance of the first data object type, wherein the first set of data is extracted from a remote data object of the remote data source; storing the first set of data in the first instance of the first data object type; receiving a query; determining that the query accesses the first instance of the first data object type; determining that the first instance of the first data object type is outdated if it differs from the remote data object, wherein the step of determining that the first instance of the first data object type is outdated is initiated at least in part based on the determination that the query accesses the first instance of the first data object type; and, when the first instance of the first data object type is determined to be outdated: receiving a second set of data for the first instance of the first data object type, wherein the second set of data is updated data of the remote data object; and replacing at least a portion of the first set of data within the first instance of the first data object type with data from the second set of data.
One or more non-transitory computer-readable storage media comprising: computer-executable instructions that, when executed by a computing system comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, cause the computing system to create a first instance of a first data object type in a local data source; computer-executable instructions that, when executed by the computing system, cause the computing system to receive, from a remote data source, a first set of data for the first instance of the first data object type, wherein the first set of data is extracted from a remote data object of the remote data source; computer-executable instructions that, when executed by the computing system, cause the computing system to store the first set of data in the first instance of the first data object type; computer-executable instructions that, when executed by the computing system, cause the computing system to receive a query; computer-executable instructions that, when executed by the computing system, cause the computing system to determine that the query accesses the first instance of the first data object type; computer-executable instructions that, when executed by the computing system, cause the computing system to determine that the first instance of the first data object type is outdated if it differs from the remote data object, wherein the determination that the first instance of the first data object type is outdated is initiated at least in part based on the determination that the query accesses the first instance of the first data object type; and, for when the first instance of the first data object type is determined to be outdated: computer-executable instructions that, when executed by the computing system, cause the computing system to receive a second set of data for the first instance of the first data object type, wherein the second set of data is updated data of the remote data object; and computer-executable instructions that, when executed by the computing system, cause the computing system to replace at least a portion of the first set of data within the first instance of the first data object type with data from the second set of data.
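
The flow recited in the claims above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `RemoteTable`, `LocalReplica`, `OnStale`, and `run_query` are hypothetical, SHA-256 is an assumed digest, an integer version counter stands in for a last-modified date, and in-memory lists stand in for relational tables. The sketch only shows the ordering of steps: the staleness check runs because a query touches the replica, it compares locally kept metadata (digest, last-modified value, size) with the remote object's current metadata, and a per-query option selects whether to wait for the refresh, answer from the outdated replica first, or answer from the remote object.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


def digest(rows):
    """Content digest of a list of rows (SHA-256 is an assumption)."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


@dataclass
class RemoteTable:
    """Hypothetical stand-in for the remote data object and its metadata."""
    rows: list
    version: int = 0  # stands in for a last-modified date

    def update(self, rows):
        self.rows = list(rows)
        self.version += 1

    def metadata(self):
        return {"digest": digest(self.rows),
                "last_modified": self.version,
                "size": len(self.rows)}


@dataclass
class LocalReplica:
    """Hypothetical stand-in for the local instance holding replicated data."""
    rows: list = field(default_factory=list)
    meta: dict = field(default_factory=dict)
    refresh_count: int = 0

    def load_from(self, remote):
        # Replace the stored data and remember the metadata it corresponds to.
        self.rows = list(remote.rows)
        self.meta = remote.metadata()
        self.refresh_count += 1

    def is_outdated(self, remote):
        # Compare locally kept metadata with the remote's current metadata.
        return self.meta != remote.metadata()


class OnStale(Enum):
    WAIT_FOR_REFRESH = "wait"  # postpone the query until the refresh is done
    USE_LOCAL = "local"        # answer from the outdated replica, then refresh
    USE_REMOTE = "remote"      # answer from the remote object, then refresh


def run_query(predicate, replica, remote, on_stale=OnStale.WAIT_FOR_REFRESH):
    """Run a filter query; the staleness check is triggered by the access."""
    if replica.is_outdated(remote):
        if on_stale is OnStale.USE_LOCAL:
            result = [r for r in replica.rows if predicate(r)]
            replica.load_from(remote)
            return result
        if on_stale is OnStale.USE_REMOTE:
            result = [r for r in remote.rows if predicate(r)]
            replica.load_from(remote)
            return result
        replica.load_from(remote)
    return [r for r in replica.rows if predicate(r)]
```

In this sketch a query against an unchanged replica causes no data transfer at all; only the metadata comparison runs. Whether the comparison runs on the local or the remote side is left open here.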

Description

This disclosure generally relates to data replication. A specific implementation relates to determining whether a source data object has changed compared to a target data object before replicating data from the source data object to the target data object.

It is becoming increasingly common for companies to store data across various systems, including one or more local systems and one or more cloud-based systems. These systems can be of different types, such as storing data in different formats (e.g., relational databases versus databases storing JSON documents) or using different database management systems (e.g., using software and/or hardware from different vendors). Even if the data is stored in the same format using software from the same vendor, there can still be differences in where the data is stored and the schema used to store that data.

To help address these issues, various techniques have been used, including data replication and database federation. In a federated database environment, requests for database operations, such as queries, can specify a source in either a local database system or a "remote" database accessed using data federation. In some cases, both local and remote data sources may be specified in the same query, such as a query retrieving data from both a local database table and a data source in a remote federated database system.

Data replication provides strategic business advantages by enabling the target system to access and utilize data originally present in the source system, supplementing information already natively stored in the target system itself. In this architecture, the source system essentially acts as a more trusted, authoritative, or primary source for the replicated data. This also means that the target system can leverage the additional information replicated from the source system to enhance its data processing or analysis capabilities, while maintaining its own unique data.
However, any discrepancies or conflicts involving the replicated data are typically resolved in favor of the source system, given its status as the more trusted, authoritative source. For various reasons, such as query execution, data replication can be a better solution than data federation in some scenarios.

However, in many cases, data in the source system is not "static" because data can be added to, deleted from, or modified within the source system. Therefore, instead of replicating data from the source system to the target system once, a process is implemented to update the data in the target system based on changes made in the source system. One way to perform these updates is to enable real-time replication, which often uses log-based techniques to ensure that changes made in the source system are propagated to the target system as soon as they occur in the source system. While this can help ensure that the data in the target system is identical to the data in the source system, the real-time replication process can be complex and computationally expensive on both the source and target systems.

A particular drawback of log-based techniques is that the source and target systems typically need to be tightly integrated. For example, the target system is usually designed based on the source system's implementation so that appropriate logs can be obtained and parsed. This often makes it difficult to implement real-time replication across systems from different vendors.

Therefore, another type of replication that can be used is snapshot replication, in which all or selected data sources, such as specific tables identified for replication, periodically send their contents to the target system so that the target system can have a more up-to-date version of the data in the source system.
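
Conventional snapshot replication as described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the disclosed implementation: `snapshot_replicate` is a hypothetical name, dictionaries of row lists stand in for the source and target systems, and the returned row count is only a rough proxy for transfer cost.

```python
def snapshot_replicate(source, target, selected=None):
    """Copy the full contents of the selected source tables to the target.

    Returns the number of rows sent, as a rough stand-in for transfer cost.
    """
    names = list(source) if selected is None else list(selected)
    rows_sent = 0
    for name in names:
        target[name] = list(source[name])  # full copy, changed or not
        rows_sent += len(source[name])
    return rows_sent


# Hypothetical data: only "orders" is identified for replication.
source = {"orders": [("o1", 10), ("o2", 25)], "customers": [("c1", "Ada")]}
target = {}

first = snapshot_replicate(source, target, selected=["orders"])

# A change in the source is not visible in the target until the next run.
source["orders"].append(("o3", 40))
stale = target["orders"] != source["orders"]

second = snapshot_replicate(source, target, selected=["orders"])
# Nothing changed since the previous run, yet the same rows are sent again.
third = snapshot_replicate(source, target, selected=["orders"])
```

The third run sends as many rows as the second even though the source did not change in between, which is the cost that checking for changes before replicating is meant to avoid.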
However, since snapshot replication is not real-time replication, the data in the target system can become outdated compared to the data in the source system over time. Furthermore, since all the content identified for replication in the source system is usually sent to the target system during a snapshot update, performing a snapshot update can be computationally expensive. Therefore, there is room for improvement.

This figure shows an exemplary database system that may be used to implement aspects of the disclosed technology.
This diagram illustrates a computing environment in which a database system can access data on a remote computing system using data federation, including virtual tables, or using local tables that have replicated data from the remote computing system.
This diagram shows conventional data replication technology.
These are diagrams and flowcharts of the snapshot replication process as disclosed herein.
This is a flowchart of the disclosed process that executes queries containing hints about preferred behavior when a snapshot replica is determined to be outdated.
This diagram shows remote tables and replicated tables, as well as data elements that can be used to determine