US-12625891-B2 - Data storage method and apparatus, and data reading method and apparatus

Abstract

Embodiments of this specification provide a data storage method and apparatus, and a data reading method and apparatus. The data storage method is applied to a knowledge graph platform, and includes: splitting target knowledge graph data, to determine at least two pieces of to-be-stored target subgraph data, where the target knowledge graph data include a target entity node and at least one edge associated with the target entity node, and each piece of to-be-stored target subgraph data includes the target entity node and an edge with at least one target attribute; and storing the at least two pieces of to-be-stored target subgraph data in at least two consecutive data blocks based on an edge attribute, where an end entity identifier of to-be-stored target subgraph data stored in a previous data block is the same as a start entity identifier of to-be-stored target subgraph data stored in a current data block.

Inventors

Da Zhang

Assignees

Alipay (Hangzhou) Information Technology Co., Ltd.

Dates

Publication Date: 2026-05-12
Application Date: 2023-01-06
Priority Date: 2022-03-02

Claims (18)

1. A data storage method, applied to a knowledge graph platform, and comprising: splitting target knowledge graph data, to determine at least two pieces of to-be-stored target subgraph data, wherein the target knowledge graph data comprise a target entity node and at least one edge associated with the target entity node, and each piece of to-be-stored target subgraph data comprises the target entity node and an edge with at least one target attribute; and storing the at least two pieces of to-be-stored target subgraph data in at least two consecutive data blocks based on an edge attribute, wherein an end entity identifier of to-be-stored target subgraph data stored in a previous data block is the same as a start entity identifier of to-be-stored target subgraph data stored in a current data block.
2. The data storage method according to claim 1, wherein after storing the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute, the method further comprises: recording index information in each data block, and determining index array information based on the index information in each data block, wherein the index information comprises a start entity identifier, an end entity identifier, and an edge attribute condition associated with the end entity identifier that are of to-be-stored target subgraph data stored in each data block.
3. The data storage method according to claim 2, wherein determining the index array information based on the index information in each data block comprises: determining the start entity identifier and the end entity identifier of the to-be-stored target subgraph data in each data block; and in a case where it is determined that an end entity identifier and a start entity identifier that are adjacent in two consecutive data blocks are the same, processing index information in the two consecutive data blocks, to determine index array information, wherein entity identifiers in the index array information are arranged based on a storage sequence.
4. The data storage method according to claim 1, wherein splitting the target knowledge graph data, to determine the at least two pieces of to-be-stored target subgraph data comprises: determining a splitting parameter, and splitting the target knowledge graph data based on the splitting parameter, to determine the at least two pieces of to-be-stored target subgraph data; and wherein, correspondingly, before splitting the target knowledge graph data, the method further comprises: processing received to-be-processed data, to determine target entity data in the to-be-processed data and relational data associated with the target entity data; and determining a data structure of the to-be-processed data based on the target entity data and the relational data, and constructing the target knowledge graph data based on the data structure.
5. The data storage method according to claim 4, wherein storing the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute comprises: determining an edge direction in the to-be-stored target subgraph data, and classifying edges in the to-be-stored target subgraph data based on the edge direction, to determine at least one edge type, wherein the edge direction comprises an out-edge direction and an in-edge direction, the out-edge direction is a direction pointing from the target entity node to another entity node, and the in-edge direction is a direction pointing from another entity node to the target entity node; and storing the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the at least one edge type.
6. The data storage method according to claim 1, wherein the data block further comprises a buffer; and wherein, correspondingly, storing the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute comprises: determining, from the to-be-stored target subgraph data, remaining data that cannot be stored in the at least two consecutive data blocks, and storing the remaining data in a buffer in an end data block in the at least two consecutive data blocks.
7. A computing device, comprising: a memory and a processor, wherein the memory is configured to store computer-executable instructions, the processor is configured to execute the computer-executable instructions, and when the computer-executable instructions are executed by the processor the processor is caused to: split target knowledge graph data, to determine at least two pieces of to-be-stored target subgraph data, wherein the target knowledge graph data comprise a target entity node and at least one edge associated with the target entity node, and each piece of to-be-stored target subgraph data comprises the target entity node and an edge with at least one target attribute; and store the at least two pieces of to-be-stored target subgraph data in at least two consecutive data blocks based on an edge attribute, wherein an end entity identifier of to-be-stored target subgraph data stored in a previous data block is the same as a start entity identifier of to-be-stored target subgraph data stored in a current data block.
8. The computing device according to claim 7, wherein after the at least two pieces of to-be-stored target subgraph data is stored in the at least two consecutive data blocks based on the edge attribute, the processor is further caused to: record index information in each data block, and determine index array information based on the index information in each data block, wherein the index information comprises a start entity identifier, an end entity identifier, and an edge attribute condition associated with the end entity identifier that are of to-be-stored target subgraph data stored in each data block.
9. The computing device according to claim 8, wherein the processor is caused to determine the index array information based on the index information in each data block by being caused to: determine the start entity identifier and the end entity identifier of the to-be-stored target subgraph data in each data block; and in a case where it is determined that an end entity identifier and a start entity identifier that are adjacent in two consecutive data blocks are the same, process index information in the two consecutive data blocks, to determine index array information, wherein entity identifiers in the index array information are arranged based on a storage sequence.
10. The computing device according to claim 7, wherein the processor is caused to split the target knowledge graph data, to determine the at least two pieces of to-be-stored target subgraph data by being caused to: determine a splitting parameter, and split the target knowledge graph data based on the splitting parameter, to determine the at least two pieces of to-be-stored target subgraph data; and wherein, correspondingly, before the target knowledge graph data is split, the processor is further caused to: process received to-be-processed data, to determine target entity data in the to-be-processed data and relational data associated with the target entity data; and determine a data structure of the to-be-processed data based on the target entity data and the relational data, and construct the target knowledge graph data based on the data structure.
11. The computing device according to claim 10, wherein the processor is caused to store the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute by being caused to: determine an edge direction in the to-be-stored target subgraph data, and classify edges in the to-be-stored target subgraph data based on the edge direction, to determine at least one edge type, wherein the edge direction comprises an out-edge direction and an in-edge direction, the out-edge direction is a direction pointing from the target entity node to another entity node, and the in-edge direction is a direction pointing from another entity node to the target entity node; and store the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the at least one edge type.
12. The computing device according to claim 7, wherein the data block further comprises a buffer; and wherein, correspondingly, the processor is caused to store the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute by being caused to: determine, from the to-be-stored target subgraph data, remaining data that cannot be stored in the at least two consecutive data blocks, and store the remaining data in a buffer in an end data block in the at least two consecutive data blocks.
13. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the processor is caused to: split target knowledge graph data, to determine at least two pieces of to-be-stored target subgraph data, wherein the target knowledge graph data comprise a target entity node and at least one edge associated with the target entity node, and each piece of to-be-stored target subgraph data comprises the target entity node and an edge with at least one target attribute; and store the at least two pieces of to-be-stored target subgraph data in at least two consecutive data blocks based on an edge attribute, wherein an end entity identifier of to-be-stored target subgraph data stored in a previous data block is the same as a start entity identifier of to-be-stored target subgraph data stored in a current data block.
14. The non-transitory computer-readable storage medium according to claim 13, wherein after the at least two pieces of to-be-stored target subgraph data is stored in the at least two consecutive data blocks based on the edge attribute, the processor is further caused to: record index information in each data block, and determine index array information based on the index information in each data block, wherein the index information comprises a start entity identifier, an end entity identifier, and an edge attribute condition associated with the end entity identifier that are of to-be-stored target subgraph data stored in each data block.
15. The non-transitory computer-readable storage medium according to claim 14, wherein the processor is caused to determine the index array information based on the index information in each data block by being caused to: determine the start entity identifier and the end entity identifier of the to-be-stored target subgraph data in each data block; and in a case where it is determined that an end entity identifier and a start entity identifier that are adjacent in two consecutive data blocks are the same, process index information in the two consecutive data blocks, to determine index array information, wherein entity identifiers in the index array information are arranged based on a storage sequence.
16. The non-transitory computer-readable storage medium according to claim 13, wherein the processor is caused to split the target knowledge graph data, to determine the at least two pieces of to-be-stored target subgraph data by being caused to: determine a splitting parameter, and split the target knowledge graph data based on the splitting parameter, to determine the at least two pieces of to-be-stored target subgraph data; and wherein, correspondingly, before the target knowledge graph data is split, the processor is further caused to: process received to-be-processed data, to determine target entity data in the to-be-processed data and relational data associated with the target entity data; and determine a data structure of the to-be-processed data based on the target entity data and the relational data, and construct the target knowledge graph data based on the data structure.
17. The non-transitory computer-readable storage medium according to claim 16, wherein the processor is caused to store the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute by being caused to: determine an edge direction in the to-be-stored target subgraph data, and classify edges in the to-be-stored target subgraph data based on the edge direction, to determine at least one edge type, wherein the edge direction comprises an out-edge direction and an in-edge direction, the out-edge direction is a direction pointing from the target entity node to another entity node, and the in-edge direction is a direction pointing from another entity node to the target entity node; and store the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the at least one edge type.
18. The non-transitory computer-readable storage medium according to claim 13, wherein the data block further comprises a buffer; and wherein, correspondingly, the processor is caused to store the at least two pieces of to-be-stored target subgraph data in the at least two consecutive data blocks based on the edge attribute by being caused to: determine, from the to-be-stored target subgraph data, remaining data that cannot be stored in the at least two consecutive data blocks, and store the remaining data in a buffer in an end data block in the at least two consecutive data blocks.
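
The per-block index information and the index array recited in claims 2 and 3 can be pictured with a minimal Python sketch. It is one possible reading, not the claimed implementation: the names `BlockIndex`, `build_index_array`, and `locate_block` are hypothetical, the "edge attribute condition" is assumed to be the attribute of the last edge stored for the end entity, and entity identifiers are assumed to sort in storage order.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BlockIndex:
    """Hypothetical index information recorded in one data block."""
    start_entity: str   # identifier of the first entity stored in the block
    end_entity: str     # identifier of the last entity stored in the block
    end_edge_attr: str  # assumed form of the edge attribute condition


def build_index_array(per_block: List[BlockIndex]) -> List[Tuple[str, str]]:
    """Flatten per-block index information into one boundary array.

    Because the end entity of block i equals the start entity of block i+1,
    each shared boundary is kept once, so N blocks yield N+1 entries that
    follow the storage sequence.
    """
    if not per_block:
        return []
    for prev, cur in zip(per_block, per_block[1:]):
        if prev.end_entity != cur.start_entity:
            raise ValueError("adjacent blocks do not share a boundary entity")
    array = [(per_block[0].start_entity, "")]  # lower bound of the first block
    for idx in per_block:
        array.append((idx.end_entity, idx.end_edge_attr))
    return array


def locate_block(index_array: List[Tuple[str, str]],
                 entity_id: str, edge_attr: str) -> int:
    """Return the position of the first block that may hold the key.

    Assumes (entity, attribute) pairs are stored in ascending order. When a
    key equals a block's upper bound, a reader may also need to continue
    into the following block.
    """
    upper_bounds = index_array[1:]
    pos = bisect_left(upper_bounds, (entity_id, edge_attr))
    if pos == len(upper_bounds):
        raise KeyError((entity_id, edge_attr))
    return pos
```

Under these assumptions, a lookup is a binary search over the boundary array rather than a scan of every data block.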

Description

This application is a national stage entry of International Application No. PCT/CN2023/071077, filed Jan. 6, 2023, which claims priority to Chinese Patent Application No. 202210197317.3, filed with the China National Intellectual Property Administration on Mar. 2, 2022, and entitled “DATA STORAGE METHOD AND APPARATUS, DATA READING METHOD AND APPARATUS”, which is incorporated here by reference in its entirety.

TECHNICAL FIELD

Embodiments of this specification relate to the field of computer technologies, and in particular, to a data storage method and a data reading method.

BACKGROUND

A knowledge graph, referred to in the library and information industry as knowledge domain visualization or a knowledge domain map, is a series of graphs that show the relationship between the development process and the structure of knowledge. It describes knowledge resources and their carriers by using visualization technology, and mines, analyzes, constructs, draws, and displays knowledge and the relationships within it.

Currently, there is no uniform standard for the storage structure design of a knowledge graph. A graph whose data amount is not very large and whose structure is fixed is usually stored in a conventional database with relational tables. However, when the data amount is large, an entity usually includes many attributes. If these attributes are all computed and stored in a storage medium at the same time, the efficiency of data computing, storage, and retrieval is greatly reduced.

SUMMARY

In view of this, this specification provides a data storage method and a data reading method. One or more embodiments of this specification also relate to a data storage apparatus, a data reading apparatus, a computing device, a computer-readable storage medium, and a computer program, to overcome the technical disadvantages in the conventional technology.
According to a first aspect of the embodiments of this specification, a data storage method is provided, applied to a knowledge graph platform, and including: splitting target knowledge graph data, to determine at least two pieces of to-be-stored target subgraph data, where the target knowledge graph data include a target entity node and at least one edge associated with the target entity node, and each piece of to-be-stored target subgraph data includes the target entity node and an edge with at least one target attribute; and storing the at least two pieces of to-be-stored target subgraph data in at least two consecutive data blocks based on an edge attribute, where an end entity identifier of to-be-stored target subgraph data stored in a previous data block is the same as a start entity identifier of to-be-stored target subgraph data stored in a current data block.

According to a second aspect of the embodiments of this specification, a data storage apparatus is provided, applied to a knowledge graph platform, and including: a graph splitting module, configured to split target knowledge graph data, to determine at least two pieces of to-be-stored target subgraph data, where the target knowledge graph data include a target entity node and at least one edge associated with the target entity node, and each piece of to-be-stored target subgraph data includes the target entity node and an edge with at least one target attribute; and a data storage module, configured to store the at least two pieces of to-be-stored target subgraph data in at least two consecutive data blocks based on an edge attribute, where an end entity identifier of to-be-stored target subgraph data stored in a previous data block is the same as a start entity identifier of to-be-stored target subgraph data stored in a current data block.
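
As a rough illustration of the splitting and storing steps described in the first and second aspects, the following Python sketch splits the edges of one target entity node into fixed-size subgraphs and places them in consecutive blocks. It is a sketch under stated assumptions rather than the specification's method: `DataBlock`, `split_and_store`, and `block_capacity` are hypothetical names, a fixed edge count stands in for the splitting parameter, and edges are ordered by sorting on the edge attribute.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (entity_id, edge_attribute, neighbor_id); this layout is an assumption
Edge = Tuple[str, str, str]


@dataclass
class DataBlock:
    """Hypothetical data block holding one piece of subgraph data."""
    edges: List[Edge] = field(default_factory=list)

    @property
    def start_entity(self) -> str:
        return self.edges[0][0]

    @property
    def end_entity(self) -> str:
        return self.edges[-1][0]


def split_and_store(entity_id: str,
                    edges: List[Tuple[str, str]],
                    block_capacity: int) -> List[DataBlock]:
    """Split one entity's (attribute, neighbor) edges across consecutive blocks.

    Every piece keeps the target entity node, so the end entity identifier of
    one block equals the start entity identifier of the next block.
    """
    if block_capacity < 1:
        raise ValueError("block_capacity must be at least 1")
    ordered = sorted(edges)  # order by edge attribute, then by neighbor
    blocks: List[DataBlock] = []
    for i in range(0, len(ordered), block_capacity):
        piece = ordered[i:i + block_capacity]
        blocks.append(DataBlock([(entity_id, attr, nbr) for attr, nbr in piece]))
    return blocks
```

For example, calling `split_and_store("A", edges, 2)` on five edges of entity `A` yields three consecutive blocks, each of which starts and ends with entity `A`.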
According to a third aspect of the embodiments of this specification, a data reading method is provided, applied to a knowledge graph platform, and including: receiving a data reading request for target data, and determining a target storage location in index array information based on the data reading request, where the index array information is determined based on index information in each data block, and includes a start entity identifier, an end entity identifier, and an edge attribute condition associated with the end entity identifier that are of graph data stored in each data block; and reading the target data from a target data block based on the target storage location.

According to a fourth aspect of the embodiments of this specification, a data reading apparatus is provided, applied to a knowledge graph platform, and including: a storage location determining module, configured to: receive a data reading request for target data, and determine a target storage location in index array information based on the data reading request, where the index array information is determined based on index information in each data block, and includes a start entity identifier, an end entity identifier, and an edge attribute condition associated with the e