
CN-121996691-A - Graph database query acceleration method and system

CN121996691A

Abstract

The invention discloses a graph database query acceleration method and system. The method comprises: constructing panoramic query observation data through non-invasive log interception and structuring; parsing query statements into abstract syntax trees and applying semantic normalization to generate unique hash values; predicting hotspot query templates and their high-frequency parameters with a heat-score model and asynchronously preloading the results into a cache; dynamically computing an adaptive time-to-live for each cache entry from its data-change frequency and access heat; and constructing a multi-level cache architecture comprising a local cache, a distributed cache, and the database's built-in cache, with intelligent routing and collaborative backfilling based on the hash values. The invention achieves intelligent, transparent acceleration of graph database queries, markedly improves query performance, reduces database load, and addresses the problems of traditional caches: redundancy caused by syntactic variation, high cold-start latency, and the difficulty of balancing cache consistency.

Inventors

  • Song Yao
  • Wei Chuanqiang
  • Bai Enyin
  • Liu Peng
  • Chen Qiao

Assignees

  • 山东齐鲁壹点传媒有限公司 (Shandong Qilu Yidian Media Co., Ltd.)
  • 山东数字文化集团有限公司 (Shandong Digital Culture Group Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2026-01-19

Claims (10)

  1. A graph database query acceleration method, comprising: S1, collecting log data of graph database queries and processing it into a structured query log dataset; S2, parsing and semantically normalizing the query statements in the structured dataset to generate parameterized query patterns, computing a unique semantic identifier for each pattern, and constructing and maintaining a knowledge base; S3, identifying query-template and parameter combinations that meet preset conditions based on historical execution information in the knowledge base, and proactively loading their query results into a cache system; S4, dynamically adjusting the retention policy of each cache entry according to updates to the graph data associated with it and to its access pattern; S5, configuring a cache system with at least two layers, and defining the routing rules for query requests within the cache system and the data synchronization mechanism between cache layers.
  2. The method according to claim 1, wherein step S2 comprises: parsing the query statement into an abstract syntax tree; performing at least one semantics-preserving transformation on the abstract syntax tree, selected from the group consisting of identifier format normalization, condition reordering based on the commutative law of logic, normalization of user variable names to system variable names, and substitution of constants with parameter placeholders; generating a parameterized query template from the transformed abstract syntax tree; and hashing the parameterized query template to generate its unique identifier.
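The normalization of claim 2 can be illustrated with a toy regex-based sketch (a real implementation would operate on a proper nGQL/Cypher abstract syntax tree; the regexes, placeholder scheme, and SHA-256 truncation here are all illustrative assumptions). Two syntactically different but semantically identical queries collapse to the same identifier:

```python
import hashlib
import re

def normalize_query(query: str) -> tuple[str, list[str]]:
    """Toy semantic normalizer (regex sketch, not a real AST pass):
    lowercases, collapses whitespace, normalizes operator spacing,
    swaps literals for '?' placeholders, and sorts AND-joined WHERE
    conditions so commutative reorderings hash identically."""
    text = re.sub(r"\s+", " ", query.strip().lower())
    text = re.sub(r"\s*(<=|>=|<>|=|<|>)\s*", r" \1 ", text)
    params: list[str] = []
    def take(m: re.Match) -> str:
        params.append(m.group(0))
        return "?"
    text = re.sub(r"'[^']*'|\b\d+\b", take, text)
    def sort_where(m: re.Match) -> str:
        conds = sorted(c.strip() for c in m.group(1).split(" and "))
        return "where " + " and ".join(conds)
    text = re.sub(r"where (.+?)(?= return|$)", sort_where, text)
    return hashlib.sha256(text.encode()).hexdigest()[:16], params

id_a, _ = normalize_query(
    "MATCH (n:Person) WHERE n.age > 30 AND n.city = 'Jinan' RETURN n")
id_b, _ = normalize_query(
    "match (n:Person) where n.city='Jinan'  AND n.age>30 return n")
print(id_a == id_b)  # the two spellings share one semantic identifier
```

The extracted literals (`params`) are what claim 3 later records as the template's historical parameter values.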
  3. The method of claim 1, wherein constructing and maintaining a knowledge base in step S2 comprises: recording statistical information for each query template in the knowledge base with its unique identifier as the primary key, the statistical information comprising the accumulated execution count, average execution time, historical execution timestamp sequence, parameter value sequences used, and result set sizes; and, whenever the query template is executed again, updating the statistical information of the corresponding record in the knowledge base.
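The knowledge-base record of claim 3 can be sketched as a small Python structure. This is a minimal illustration: the field names and the running-average update rule are assumptions, since the patent only lists which statistics are kept:

```python
import time
from dataclasses import dataclass, field

@dataclass
class TemplateStats:
    """One knowledge-base record, keyed by the template's unique identifier."""
    template_id: str
    executions: int = 0
    avg_latency_ms: float = 0.0
    timestamps: list[float] = field(default_factory=list)
    param_history: list[tuple] = field(default_factory=list)
    avg_result_size: float = 0.0

    def record(self, latency_ms: float, params: tuple, result_size: int) -> None:
        # Update running averages each time the template is executed again.
        self.executions += 1
        n = self.executions
        self.avg_latency_ms += (latency_ms - self.avg_latency_ms) / n
        self.avg_result_size += (result_size - self.avg_result_size) / n
        self.timestamps.append(time.time())
        self.param_history.append(params)

# The knowledge base itself: semantic identifier -> statistics record.
kb: dict[str, TemplateStats] = {}
```

In practice such records would live in a persistent store rather than an in-process dict, but the primary-key structure is the same.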
  4. The method according to claim 1, wherein step S3 comprises: computing a heat score for each query template from its statistical information; for each query template, analyzing its historical parameter value sequence to determine high-frequency parameter values; combining the heat score with the occurrence frequency of each parameter value to compute a composite priority score for each combination of query template and specific parameter value; and designating the combinations whose composite priority score exceeds a threshold as targets for proactive loading.
  5. The method of claim 4, wherein the heat score S of a query template is computed as S = F × W(t) × T × (1/R), where F is the number of executions within a preset time window, W(t) is a time-decay weight based on the most recent access time, T is the average execution time, and R is the average result set size.
  6. The method of claim 4 or 5, wherein the composite priority score P is computed as P = S × R_penalty × P_freq, where R_penalty is a ranking penalty coefficient derived from the query template's position in the heat-ordered list, and P_freq is the proportion of the template's historical parameters occupied by the high-frequency parameter value.
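The claim-6 score can be sketched in one line. The patent only says R_penalty is derived from the template's rank in the heat-ordered list, so a simple reciprocal-rank penalty is assumed here:

```python
def priority_score(heat: float, rank: int, param_freq: float) -> float:
    """P = S * R_penalty * P_freq per claim 6.
    rank is the template's 1-based position in the heat-ordered list
    (rank 1 = hottest); R_penalty is assumed to be 1/rank, which is one
    plausible ranking penalty, not the patent's stated formula.
    param_freq is the share of the template's parameter history that
    the candidate parameter value occupies (0..1)."""
    return heat * (1.0 / rank) * param_freq
```

Combinations whose P exceeds a configured threshold would then be handed to the asynchronous preloading engine.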
  7. The method according to claim 1, wherein dynamically adjusting the retention policy of a cache entry in step S4 comprises: monitoring data change events in the graph database and counting the changes per unit time to the data associated with the cache entry, to obtain a data-change frequency factor Vf; counting the accesses to the cache entry per unit time, to obtain an access-heat factor Hf; and computing a dynamic time-to-live for the cache entry as TTL = T_base × (1 − Vf) × (1 + Hf), where T_base is the base time-to-live.
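The claim-7 TTL formula is direct to implement. For the product to behave, Vf and Hf must be normalized factors rather than raw counts; the [0, 1] clamping below is an assumption the patent leaves implicit:

```python
def dynamic_ttl(t_base_s: float, change_rate: float, access_heat: float) -> float:
    """TTL = T_base * (1 - Vf) * (1 + Hf) per claim 7.
    Vf and Hf are assumed normalized to [0, 1]: frequently changing
    data shortens the TTL toward zero (avoiding dirty reads), while
    frequently accessed data lengthens it up to 2 * T_base."""
    vf = min(max(change_rate, 0.0), 1.0)
    hf = min(max(access_heat, 0.0), 1.0)
    return t_base_s * (1.0 - vf) * (1.0 + hf)
```

With a 300 s base, a hot and stable entry (Vf=0, Hf=1) lives 600 s, while an entry over maximally volatile data (Vf=1) expires immediately regardless of heat.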
  8. The method according to claim 1, wherein the cache system in step S5 comprises at least: a first cache layer, which is an in-memory cache deployed locally to the application; and a second cache layer, which is a distributed shared cache independent of the application; the routing rule being that a query request first accesses the first cache layer, accesses the second cache layer on a miss, and queries the graph database if it misses again; and the data synchronization mechanism being that, on a hit in the second cache layer, the result is asynchronously written back to the first cache layer, and, when a result is obtained from the graph database, it is written synchronously to the second cache layer and to the first cache layer of the application that initiated the query.
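The claim-8 routing and backfill rules can be sketched as follows. Plain dicts stand in for a real local cache and a distributed store (e.g. Redis), and the L2-to-L1 backfill is done synchronously here for brevity, whereas the claim specifies it as asynchronous:

```python
from typing import Any, Callable

class TwoLevelCache:
    """Sketch of the claim-8 routing: L1 (local memory) -> L2 (shared) -> DB.
    An L2 hit is backfilled into L1; a DB result is written to both layers."""

    def __init__(self, db_query: Callable[[str], Any]):
        self.l1: dict[str, Any] = {}   # local in-memory cache
        self.l2: dict[str, Any] = {}   # distributed shared cache (stand-in)
        self.db_query = db_query       # fallback to the graph database

    def get(self, semantic_id: str) -> Any:
        if semantic_id in self.l1:             # first layer
            return self.l1[semantic_id]
        if semantic_id in self.l2:             # second layer
            value = self.l2[semantic_id]
            self.l1[semantic_id] = value       # backfill (async in the claim)
            return value
        value = self.db_query(semantic_id)     # final fallback: graph DB
        self.l2[semantic_id] = value           # write both layers on a miss
        self.l1[semantic_id] = value
        return value
```

Keying every layer on the same semantic identifier from claim 2 is what lets syntactically different spellings of one query share a single cached result across layers.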
  9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the graph database query acceleration method of any one of claims 1 to 8 when executing the program.
  10. A graph database query acceleration system for implementing the method of any one of claims 1 to 8, the system comprising: a log acquisition and processing module, for collecting and structuring graph database query logs; a semantic analysis and knowledge base management module, for parsing and normalizing query statements, generating semantic identifiers, and constructing and maintaining a knowledge base recording query pattern characteristics; a hotspot prediction and preloading engine, for performing heat modeling and priority computation based on the knowledge base and asynchronously executing preloading tasks; a dynamic cache lifecycle management module, for monitoring data changes and access heat and dynamically computing and adjusting the time-to-live of cached data; and a multi-level cache collaboration manager, for managing an architecture of at least two cache levels and executing the corresponding routing and data synchronization operations.

Description

Graph database query acceleration method and system

Technical Field

The invention relates to the technical field of database performance optimization, and in particular to a graph database query acceleration method and system.

Background

Graph databases are widely applied in social networks, knowledge graphs, recommendation systems, financial risk control, and similar fields, owing to their natural advantages in expressing and processing complex relationships. As data volume and query complexity grow, graph databases face significant performance challenges, especially the low-latency requirements of high-concurrency workloads. Traditional query optimization techniques, such as indexing and query-plan optimization, are already applied at the graph database level. At the application layer, however, common caching strategies (such as caching keyed on the raw query string) have obvious defects. First, graph query languages (such as nGQL and Cypher) are flexible: semantically identical queries can differ syntactically in case, whitespace, aliases, and condition order, so a traditional cache cannot recognize their semantic identity, causing cache redundancy and a low hit rate. Second, caches suffer from cold start: the first query request must still reach the underlying database, which fails to satisfy extremely low-latency scenarios. Third, static cache expiration policies struggle to balance data freshness against cache utilization: frequently changing data risks dirty reads, while stable data expires prematurely. Finally, a single cache level can hardly meet the requirements of extreme speed, large capacity, and global consistency at the same time.
Therefore, a systematic solution is needed that can understand query semantics, proactively predict hotspots, intelligently manage lifecycles, and coordinate multi-level cache resources, so as to achieve deep, transparent acceleration of graph database queries.

Disclosure of Invention

To solve these problems, the invention provides a graph database query acceleration method and system. The method resolves cache redundancy through semantic normalization, eliminates cold start through predictive preloading, balances performance against consistency through adaptive lifecycle management, and maximizes system efficiency through multi-level cache collaboration. To this end, a graph database query acceleration method based on semantic awareness and multi-level cache collaboration is provided, with the following technical scheme: S1, collecting log data of graph database queries and processing it into a structured query log dataset; S2, parsing and semantically normalizing the query statements in the structured dataset to generate parameterized query patterns, computing a unique semantic identifier for each pattern, and constructing and maintaining a knowledge base; S3, identifying query-template and parameter combinations that meet preset conditions based on historical execution information in the knowledge base, and proactively loading their query results into a cache system; S4, dynamically adjusting the retention policy of each cache entry according to updates to the graph data associated with it and to its access pattern; S5, configuring a cache system with at least two layers, and defining the routing rules for query requests within the cache system and the data synchronization mechanism between cache layers.
Preferably, step S2 includes: parsing the query statement into an abstract syntax tree; performing at least one semantics-preserving transformation on the abstract syntax tree, selected from the group consisting of identifier format normalization, condition reordering based on the commutative law of logic, normalization of user variable names to system variable names, and substitution of constants with parameter placeholders; generating a parameterized query template from the transformed abstract syntax tree; and hashing the parameterized query template to generate its unique identifier. Preferably, constructing and maintaining a knowledge base in step S2 includes: recording statistical information for each query template in the knowledge base with its unique identifier as the primary key, the statistical information comprising the accumulated execution count, average execution time, historical execution timestamp sequence, parameter value sequences used, and result set sizes; and, whenever the query template is executed again, updating the statistical information of the corresponding record in the knowledge base.