
CN-121986333-A - Data query method and device for adjusting data loading to reduce query time delay


Abstract

A data query method is provided. The data query method comprises: receiving a first query request comprising a first data identifier from a client; sending first data indicated by the first data identifier back to the client; determining a retrieval strategy according to a query time cost of the first data, a loading time cost of the first data, a data freshness target and a query latency target; retrieving second data and a data relationship associated with the first data from a data destination according to the retrieval strategy, wherein the second data and the data relationship meet the data freshness target and the data relationship comprises a relationship between the second data and the first data; receiving a second query request comprising a second data identifier from the client; and sending the second data indicated by the second data identifier to the client.
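
The two-request flow summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the claimed implementation: the `QueryService` class, its in-memory dictionaries, and the choice to prefetch only directly related data are assumptions, and the retrieval strategy and freshness checks are left out for brevity.

```python
from typing import Any, Dict, List


class QueryService:
    """Hypothetical service: answers a query, then preloads related data."""

    def __init__(self, destination: Dict[str, Any], relations: Dict[str, List[str]]):
        self.destination = destination  # data identifier -> data (stands in for the data destination)
        self.relations = relations      # data identifier -> identifiers of related data
        self.cache: Dict[str, Any] = {}  # preloaded second data

    def query(self, data_id: str) -> Any:
        # Serve preloaded data from the cache when available,
        # otherwise read it from the data destination.
        if data_id in self.cache:
            data = self.cache[data_id]
        else:
            data = self.destination[data_id]
        # Preload data directly related to the queried data so that
        # a later request for it can be answered from the cache.
        for related_id in self.relations.get(data_id, []):
            self.cache[related_id] = self.destination[related_id]
        return data


service = QueryService(
    destination={"a": "first data", "b": "second data"},
    relations={"a": ["b"]},
)
first = service.query("a")   # first query request
second = service.query("b")  # second query request, answered from the cache
```

In this sketch the preloading happens synchronously after the lookup; the source allows the strategy to be determined before, after or simultaneously with sending the first data back.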

Inventors

  • KHALID AHMED
  • PENG YUANQIU
  • GUO LEI

Assignees

  • 华为云计算技术有限公司

Dates

Publication Date
20260505
Application Date
20240621
Priority Date
20231003

Claims (19)

  1. A method of querying data, comprising: receiving a first query request from a client, the first query request including a first data identifier; sending first data indicated by the first data identifier back to the client; acquiring a query time cost of the first data, a loading time cost of the first data, a data freshness target and a query latency target; determining a retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target; retrieving second data and a data relationship associated with the first data from a data destination according to the retrieval strategy, wherein the second data and the data relationship satisfy the data freshness target, the data relationship comprising a relationship between the second data and the first data; receiving a second query request from the client, the second query request including a second data identifier; and sending the second data indicated by the second data identifier back to the client.
  2. The method of claim 1, wherein retrieving the second data and the data relationship associated with the first data from the data destination according to the retrieval strategy comprises: returning the second data and the data relationship from the data destination in a case where the data destination contains the second data and the data relationship.
  3. The method of claim 2, wherein returning the second data and the data relationship from the data destination in the case where the data destination contains the second data and the data relationship comprises: in a case where the second data and the data relationship contained in the data destination satisfy a data source freshness, returning the second data and the data relationship satisfying the data source freshness from the data destination.
  4. The method of claim 2, wherein returning the second data and the data relationship from the data destination in the case where the data destination contains the second data and the data relationship comprises: in a case where the second data and the data relationship contained in the data destination do not satisfy a data source freshness, loading the second data and the data relationship satisfying the data source freshness from a data source to the data destination; and returning the second data and the data relationship satisfying the data source freshness from the data destination.
  5. The method of claim 1, wherein retrieving the second data and the data relationship associated with the first data from the data destination according to the retrieval strategy comprises: in a case where the data destination does not contain the second data and the data relationship, loading the second data and the data relationship from a data source to the data destination and returning the second data and the data relationship from the data destination.
  6. The method according to any one of claims 1 to 5, further comprising: in a case where a query history does not include the second data and the data relationship, updating the query history, wherein the updated query history includes the second data and the data relationship.
  7. The method of claim 6, further comprising: updating a first data query timestamp and a second data query timestamp in the query history, wherein, in the updated query history, the first data query timestamp is the time when the first data was retrieved from the data destination, and the second data query timestamp is NULL.
  8. The method of any one of claims 1 to 7, wherein determining the retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target comprises: in a case where the query time cost of the first data and the loading time cost of the first data meet the query latency target, setting a sampling interval to the data freshness target, wherein the sampling interval is a time interval at which the second data and the data relationship are retrieved from a data source.
  9. The method of any one of claims 1 to 7, wherein determining the retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target comprises: expanding a storage capacity of the data destination in a case where the query time cost of the first data does not meet the query latency target.
  10. The method of any one of claims 1 to 7 and 9, wherein determining the retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target comprises: in a case where the query time cost of the first data does not meet the query latency target, setting a sampling interval to the data freshness target minus the query time cost of the first data, wherein the sampling interval is a time interval at which the second data and the data relationship are retrieved from a data source.
  11. The method of any one of claims 1 to 7, wherein determining the retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target comprises: expanding the number of loaders in a case where the query time cost of the first data and the loading time cost of the first data do not meet the query latency target and the query time cost of the first data meets the query latency target.
  12. The method of claim 11, wherein expanding the number of loaders comprises: deploying at least one additional loader on the same server as the original loader or on a different server.
  13. The method of any one of claims 1 to 7 and 11, wherein determining the retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target comprises: in a case where the query time cost of the first data and the loading time cost of the first data do not meet the query latency target and the query time cost of the first data meets the query latency target, setting a sampling interval to the data freshness target minus the loading time cost of the first data, wherein the sampling interval is a time interval at which the second data and the data relationship are retrieved from a data source.
  14. The method according to any one of claims 1 to 13, further comprising: storing the second data and the data relationship in a cache; wherein sending the second data indicated by the second data identifier back to the client comprises sending the second data indicated by the second data identifier from the cache back to the client.
  15. The method of any one of claims 1 to 14, wherein the first data and the second data are nodes in a graph structure of data, and the data relationship between the first data and the second data is an edge in the graph structure of data.
  16. The method according to any one of claims 1 to 15, wherein parameters of the first data or the second data comprise at least one of: the first data identifier, for indicating the first data; the second data identifier, for indicating the second data; a query time cost, which is the time it takes to query the first data or the second data from the data destination; a loading time cost, which is the time it takes to load the first data or the second data from a data source to the data destination; a query timestamp, which is the time when the first data or the second data was retrieved from the data destination; a load timestamp, which is the time when the first data or the second data was loaded from a data source to the data destination; or a data source freshness, which is the time interval at which data is updated at a data source.
  17. The method of any one of claims 1 to 16, wherein the second data comprises at least one piece of data directly or indirectly associated with the first data.
  18. A computer device comprising a processor and a memory element for storing a set of computer instructions, wherein, when the set of computer instructions is executed by the processor, the processor performs steps comprising: receiving a first query request from a client, the first query request including a first data identifier; sending first data indicated by the first data identifier back to the client; acquiring a query time cost of the first data, a loading time cost of the first data, a data freshness target and a query latency target; determining a retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target; retrieving second data and a data relationship associated with the first data from a data destination according to the retrieval strategy, wherein the second data and the data relationship satisfy the data freshness target, the data relationship comprising a relationship between the second data and the first data; receiving a second query request from the client, the second query request including a second data identifier; and sending the second data indicated by the second data identifier back to the client.
  19. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform steps comprising: receiving a first query request from a client, the first query request including a first data identifier; sending first data indicated by the first data identifier back to the client; acquiring a query time cost of the first data, a loading time cost of the first data, a data freshness target and a query latency target; determining a retrieval strategy according to the query time cost of the first data, the loading time cost of the first data, the data freshness target and the query latency target; retrieving second data and a data relationship associated with the first data from a data destination according to the retrieval strategy, wherein the second data and the data relationship satisfy the data freshness target, the data relationship comprising a relationship between the second data and the first data; receiving a second query request from the client, the second query request including a second data identifier; and sending the second data indicated by the second data identifier back to the client.
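
The strategy cases recited in claims 8 to 13 can be read as a small decision function. The sketch below is one possible reading, not the claimed implementation: the names `RetrievalStrategy` and `determine_retrieval_strategy` are hypothetical, and it assumes that the query and loading time costs "meet" the latency target when their sum does not exceed it, which the claims do not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalStrategy:
    sampling_interval: float      # time between retrievals from the data source
    expand_storage: bool = False  # grow the data destination (claim 9)
    expand_loaders: bool = False  # add loaders (claim 11)


def determine_retrieval_strategy(
    query_cost: float,
    load_cost: float,
    freshness_target: float,
    latency_target: float,
) -> RetrievalStrategy:
    """All arguments are durations in the same unit, e.g. seconds."""
    # Assumption: "query and loading cost meet the target" means sum <= target.
    if query_cost + load_cost <= latency_target:
        # Claim 8: nothing to scale; sample at the freshness target.
        return RetrievalStrategy(sampling_interval=freshness_target)
    if query_cost > latency_target:
        # Claims 9 and 10: the query alone is too slow; expand storage
        # and shorten the interval by the query time cost.
        return RetrievalStrategy(
            sampling_interval=freshness_target - query_cost,
            expand_storage=True,
        )
    # Claims 11 and 13: the query meets the target but loading pushes the
    # total over it; add loaders and shorten the interval by the loading cost.
    return RetrievalStrategy(
        sampling_interval=freshness_target - load_cost,
        expand_loaders=True,
    )


strategy = determine_retrieval_strategy(
    query_cost=0.5, load_cost=1.0, freshness_target=60.0, latency_target=1.0
)
```

With the example values, the query time cost alone is within the latency target but the combined cost is not, so the sketch takes the loader branch of claims 11 and 13.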

Description

Data query method and device for adjusting data loading to reduce query time delay

Cross Reference to Related Applications

The present application claims priority from U.S. patent application Ser. No. 18/480,348, filed on October 3, 2023, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to the field of computers, and more particularly to a data query method, a computer device, a non-transitory computer-readable storage medium and a communication system for adjusting data loading to reduce query latency.

Background

Currently, data from multiple data sources is typically put through an extraction, transformation and loading (ETL) process, and the processed data is stored in a data destination as base data for subsequent data mining and data analysis. For example, data from multiple data sources is processed through a high-performance ETL framework. However, if the target data does not exist in the data destination, the client may be unable to retrieve it, which may result in a query failure. On the other hand, loading the target data online in real time from a data source to the data destination may result in significant query latency.

Disclosure of Invention

Embodiments of the invention provide a data query method and a related device that help reduce query latency and improve query efficiency.

According to a first aspect, a data query method is provided.
The method comprises: determining a retrieval strategy according to a query time cost of first data, a loading time cost of the first data, a data freshness target and a query latency target, where the determining is performed before, after or simultaneously with sending the first data back to a client in response to a first query request received from the client; and retrieving second data and a data relationship associated with the first data from a data destination according to the retrieval strategy. The second data and the data relationship satisfy the data freshness target, and the data relationship comprises a relationship between the second data and the first data. Further, after a second query request is received, the preloaded second data is sent back.

In some embodiments of the invention, loading is adjusted based on analysis of queries. For example, by analyzing parameters associated with previously queried data and preloading the data associated with the previously queried data to a data destination, the associated data can be retrieved quickly when another query request is received. In schemes where querying and loading are unrelated, arbitrary data is loaded to the data destination. Thus, the data to be queried may not be stored in the data destination and may need to be loaded from the data source. Compared with schemes in which querying and loading are unrelated, the scheme provided by some embodiments of the invention meets the data freshness target and the query latency target, thereby reducing query latency and improving query efficiency.

In a possible implementation, retrieving the second data and the data relationship associated with the first data from the data destination according to the retrieval strategy includes returning the second data and the data relationship from the data destination in a case where the data destination contains the second data and the data relationship.
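
The retrieval step, including the contained, stale and missing cases of claims 2 to 5, might be sketched as below. This is a minimal illustration under stated assumptions: `Entry`, `retrieve_related` and the `load_from_source` callback are hypothetical names, the data destination is modeled as a plain dictionary, and an entry is taken to satisfy the data source freshness when its age does not exceed that interval.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    second_data: List[str]             # identifiers of data related to the first data
    relations: List[Tuple[str, str]]   # (first identifier, second identifier) pairs
    load_timestamp: float              # when the entry was loaded into the destination


def retrieve_related(
    first_id: str,
    destination: Dict[str, Entry],
    load_from_source: Callable[[str], Tuple[List[str], List[Tuple[str, str]]]],
    data_source_freshness: float,
    now: float,
) -> Entry:
    entry = destination.get(first_id)
    # Assumption: "satisfies data source freshness" means age <= interval.
    if entry is not None and now - entry.load_timestamp <= data_source_freshness:
        # Contained and fresh: return directly from the destination.
        return entry
    # Missing or stale: load from the data source into the destination,
    # then return the freshly loaded entry from the destination.
    second_data, relations = load_from_source(first_id)
    destination[first_id] = Entry(second_data, relations, load_timestamp=now)
    return destination[first_id]
```

Passing `now` explicitly rather than reading a clock keeps the sketch deterministic and easy to test; a real system would likely take the time from the loader or the data destination.
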
Since the data destination contains the second data and the data relationship, the second data can be retrieved as soon as possible. For example, returning the second data and the data relationship from the data destination in the case where the data destination contains them includes: in a case where the second data and the data relationship contained in the data destination satisfy a data source freshness, returning the second data and the data relationship that satisfy the data source freshness from the data destination. For another example, returning the second data and the data relationship from the data destination in the case where the data destination contains them includes: in a case where the second data and the data relationship contained in the data destination do not satisfy the data source freshness, loading the second data and the data relationship that satisfy the data source freshness from a data source to the data destination, and returning the second data and the data relationship that satisfy the data source freshness from the data destination. In another possible implementation, retrieving the second data and the data relationship associated with the first data from the data destination according to the retrieval strategy includes: in a case where the data destination does not contain the second data and the data relationship, loading the second data and the data relationship from a data source to the data destination and returning the second data and the data relationship from the data destination. In the case where the data destinatio