US-20260127224-A1 - CONSTRUCTION OF POINT-IN-TIME GRAPH FOR ANALYTICS QUERIES

US20260127224A1US 20260127224 A1US20260127224 A1US 20260127224A1US-20260127224-A1

Abstract

A method for constructing a point-in-time global consistent graph is described. The method includes accessing a stream of data from a transactional graph database, receiving, at a scalable persistent computational platform that is separate from the transactional graph database, a graph analytics query that indicates a time attribute value, in response to receiving the graph analytics query, constructing, at the scalable persistent computational platform, a point-in-time graph snapshot based on the stream of data and the time attribute value, and processing, at the scalable persistent computational platform, the graph analytics query with the point-in-time graph snapshot.

Inventors

HongJiang Zhang
Xubin Chen
Jun Li

Assignees

EBAY INC.

Dates

Publication Date: 20260507
Application Date: 20241104

Claims (20)

1 . A computer-implemented method comprising: accessing a stream of data from a transactional graph database; receiving, at a scalable persistent computational platform that is separate from the transactional graph database, a graph analytics query that indicates a time attribute value; in response to receiving the graph analytics query, constructing, at the scalable persistent computational platform, a point-in-time graph snapshot based on the stream of data and the time attribute value by reconstructing the point-in-time graph snapshot exclusively from database snapshots and database mutation logs already stored at the scalable persistent computational platform, without making any further requests to the transactional graph database after receiving the graph analytics query; and processing, at the scalable persistent computational platform, the graph analytics query with the reconstructed point-in-time graph snapshot.
2 . The computer-implemented method of claim 1 , wherein the stream of data comprises a plurality of database snapshots of the transactional graph database and a stream of database mutation logs of the transactional graph database.
3 . The computer-implemented method of claim 2 , further comprising: storing the plurality of database snapshots and the stream of database mutation logs at the scalable persistent computational platform.
4 . The computer-implemented method of claim 2 , wherein constructing the point-in-time graph snapshot further comprises: constructing the point-in-time graph snapshot corresponding to the time attribute value based on one or more database snapshots of the plurality of database snapshots, the one or more database snapshots corresponding to the time attribute value, and one or more database mutation logs from the stream of database mutation logs, the one or more database mutation logs corresponding to the time attribute value.
5 . The computer-implemented method of claim 4 , wherein the one or more database mutation logs comprise a combination of unmerged database mutation logs and merged database mutation logs corresponding to the time attribute value.
6 . The computer-implemented method of claim 2 , wherein constructing the point-in-time graph snapshot further comprises: constructing, at the scalable persistent computational platform, the point-in-time graph snapshot corresponding to the time attribute value based on the constructed point-in-time database snapshot corresponding to the time attribute value.
7 . The computer-implemented method of claim 2 , wherein constructing the point-in-time graph snapshot further comprises: identifying a time window comprising a last database snapshot prior to the time attribute value and one or more database mutation logs between the last database snapshot and an unmerged database mutation log prior to the time attribute value.
8 . The computer-implemented method of claim 7 , further comprising: merging the one or more database mutation logs identified in the time window.
9 . The computer-implemented method of claim 1 , wherein the graph analytics query is processed at the scalable persistent computational platform, and a graph transactional query is processed at the transactional graph database.
10 . The computer-implemented method of claim 1 , wherein the graph analytics query and a graph transactional query are processed on separate systems.
11 . A computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: access a stream of data from a transactional graph database; receive, at a scalable persistent computational platform that is separate from the transactional graph database, a graph analytics query that indicates a time attribute value; in response to receiving the graph analytics query, constructing, at the scalable persistent computational platform, a point-in-time graph snapshot based on the stream of data and the time attribute value by reconstructing the point-in-time graph snapshot exclusively from database snapshots and database mutation logs already stored at the scalable persistent computational platform, without making any further requests to the transactional graph database after receiving the graph analytics query; and process, at the scalable persistent computational platform, the graph analytics query with the reconstructed point-in-time graph snapshot.
12 . The computing apparatus of claim 11 , wherein the stream of data comprises a plurality of database snapshots of the transactional graph database and a stream of database mutation logs of the transactional graph database.
13 . The computing apparatus of claim 12 , wherein the instructions further configure the apparatus to: store the plurality of database snapshots and the stream of database mutation logs at the scalable persistent computational platform.
14 . The computing apparatus of claim 12 , wherein constructing the point-in-time graph snapshot further comprises: construct the point-in-time graph snapshot corresponding to the time attribute value based on one or more database snapshots of the plurality of database snapshots, the one or more database snapshots corresponding to the time attribute value, and one or more database mutation logs from the stream of database mutation logs, the one or more database mutation logs corresponding to the time attribute value.
15 . The computing apparatus of claim 14 , wherein the one or more database mutation logs comprise a combination of unmerged database mutation logs and merged database mutation logs corresponding to the time attribute value.
16 . The computing apparatus of claim 12 , wherein constructing the point-in-time graph snapshot further comprises: construct, at the scalable persistent computational platform, the point-in-time graph snapshot corresponding to the time attribute value based on the constructed point-in-time database snapshot corresponding to the time attribute value.
17 . The computing apparatus of claim 12 , wherein constructing the point-in-time graph snapshot further comprises: identify a time window comprising a last database snapshot prior to the time attribute value and one or more database mutation logs between the last database snapshot and an unmerged database mutation log prior to the time attribute value.
18 . The computing apparatus of claim 17 , wherein the instructions further configure the apparatus to: merge the one or more database mutation logs identified in the time window.
19 . The computing apparatus of claim 11 , wherein the graph analytics query is processed at the scalable persistent computational platform, and a graph transactional query is processed at the transactional graph database.
20 . A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to: access a stream of data from a transactional graph database; receive, at a scalable persistent computational platform that is separate from the transactional graph database, a graph analytics query that indicates a time attribute value; in response to receiving the graph analytics query, constructing, at the scalable persistent computational platform, a point-in-time graph snapshot based on the stream of data and the time attribute value by reconstructing the point-in-time graph snapshot exclusively from database snapshots and database mutation logs already stored at the scalable persistent computational platform, without making any further requests to the transactional graph database after receiving the graph analytics query; and process, at the scalable persistent computational platform, the graph analytics query with the reconstructed point-in-time graph snapshot.

Description

TECHNICAL FIELD The subject matter disclosed herein generally relates to the field of graph database management systems. Specifically, it addresses techniques for constructing and querying point-in-time graph snapshots for scalable graph analytics while maintaining high-performance transactional operations on graph databases. BACKGROUND Graph databases have become increasingly important for managing complex, interconnected data across various domains such as fraud detection, social networks, and recommendation systems. These databases excel at representing and querying relationships between entities, making them valuable for applications that require deep analysis of interconnected data. As the volume and complexity of data continue to grow, graph databases have evolved to handle both transactional and analytical workloads. One type of graph analytics query includes a point-in-time graph query that allows users to retrieve and analyze the state of a graph database at a specific historical moment. This type of query enables users to examine the graph structure, relationships, and properties as they existed at a particular point in the past, rather than just the current state of the graph. Point-in-time graph queries are useful for applications such as fraud detection, root cause analysis, and understanding the evolution of complex systems over time. However, managing historical data and providing efficient point-in-time querying capabilities while maintaining high-performance real-time operations has emerged as a significant challenge in the field of graph database management systems. BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced. FIG. 1 is a diagrammatic representation of a networked environment in which the present disclosure may be deployed, in accordance with some example embodiments. FIG. 2 is a block diagram illustrating a point-in-time global graph application that, in one example embodiment, is provided as part of a networked system. FIG. 3 is a block diagram illustrating an architecture of the subject matter in accordance with one example embodiment. FIG. 4 is a diagram illustrating a timeline in accordance with one example embodiment. FIG. 5 is a block diagram illustrating a sequence in accordance with one example embodiment. FIG. 6 illustrates a routine 600 in accordance with one example embodiment. FIG. 7 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein, according to an example embodiment. DETAILED DESCRIPTION The description that follows describes systems, methods, techniques, instruction sequences, and computing machine program products that illustrate example embodiments of the present subject matter. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the present subject matter. It will be evident, however, to those skilled in the art, that embodiments of the present subject matter may be practiced without some or other of these specific details. Examples merely typify possible variations. Unless explicitly stated otherwise, structures (e.g., structural components, such as modules) are optional and may be combined or subdivided, and operations (e.g., in a procedure, algorithm, or other function) may vary in sequence or be combined or subdivided. The terms “merged mutation logs” and “unmerged mutation logs” refer to two types of logs used in a point-in-time graph query system. In one example, the term “merged mutation logs” refers to mutation logs that have been periodically combined or consolidated. For example, they are created at regular intervals (e.g., every 10 minutes) to optimize storage and processing efficiency. Merged logs represent accumulated changes in the database over a specific time window. In one example, the term “unmerged mutation logs” refers to raw, individual mutation logs that have not yet been consolidated. They represent the most recent changes to the database that have occurred since the last merging process. Unmerged logs provide the most up-to-date information for constructing point-in-time snapshots. Graph databases have become increasingly important for managing complex, interconnected data in various applications such as fraud detection, social networks, and recommendation systems. As data stored in graph databases continues to change over time, there is a growing need for querying historical states of the graph for analytics purposes. Traditional graph databases primarily focus on serving real-time transactional queries, which require fast response times, typically less than 100 millise