CN-121979878-A - Index establishment method, data query method, system and related equipment

Abstract

Embodiments of the present application provide an index establishment method, a data query method, a system, and related equipment, which are applied to the technical field of data processing. In the index establishment method, a first data set is acquired in the TEE of the master node, and the first data included in the first data set is divided according to the key attribute values of the first data to obtain first sub-data sets. A global index is constructed based on the first sub-data sets. In the data query method, in the TEE of the master node, a third target slave node is determined from the slave nodes by using the global index and the key attribute value query range obtained after decryption. In the TEE of the third target slave node, query data is determined according to the key attribute value query range and fed back to the TEE of the master node. The master node feeds the query result back to the client. In this way, a global index capable of locating the slave node that stores the query data is established, data storage and data query in a TEE-based distributed database system are realized, and the security requirements of data processing are met.

Inventors

LIU WEN
GUO LIANG
PENG YANGUO

Assignees

Huawei Technologies Co., Ltd.

Dates

Publication Date: 20260505
Application Date: 20241030

Claims (20)

1. An index establishment method, wherein the method is applied to a distributed database system, the distributed database system comprising a master node and a plurality of slave nodes, the master node and the slave nodes running a trusted execution environment (TEE), the method comprising: in the TEE of the master node, performing the following steps: acquiring a first data set, wherein the first data set comprises a plurality of pieces of first data; sorting the first data included in the first data set in order of the key attribute values of the first data; dividing the first data set according to the number of the slave nodes to obtain first sub-data sets equal in number to the slave nodes, wherein each first sub-data set comprises at least one piece of first data, and the first sub-data sets are in one-to-one correspondence with the slave nodes; and constructing a global index, wherein the global index comprises a correspondence between each first sub-data set and the key attribute value range of the first data included in that first sub-data set.
2. The method according to claim 1, wherein the method further comprises: constructing, in the TEE of the slave node, a local index of the slave node, wherein the local index comprises a correspondence between the first data included in the first sub-data set and the key attribute values of the first data.
3. The method of claim 2, wherein said constructing a local index of the slave node comprises: dividing the first sub-data set into at least one data group, each data group comprising at least one piece of first data; determining a first group identifier for each data group; and constructing the local index of the slave node according to the first group identifiers, wherein the local index comprises a correspondence between data identifiers and data information of the first data included in the first sub-data set, the data identifier of a piece of first data comprises the first group identifier of the data group to which the first data belongs, and the data information of the first data comprises the key attribute value of the first data.
4. The method according to claim 3, wherein said dividing the first sub-data set into at least one data group comprises: dividing the key attribute value range of the first data included in the first sub-data set into at least one key attribute value sub-range; and clustering the first data included in the first sub-data set according to the at least one key attribute value sub-range and the key attribute values of the first data to obtain the at least one data group, wherein the first group identifier of each data group is determined according to the number of data groups and the order of the key attribute value sub-ranges.
5. The method of any of claims 1-4, wherein said constructing a global index comprises: establishing leaf nodes equal in number to the slave nodes, wherein the leaf nodes are in one-to-one correspondence with the first sub-data sets, and each leaf node comprises the set identifier of the corresponding first sub-data set and the key attribute value range of the first data included in the corresponding first sub-data set; and constructing a binary tree from the leaf nodes in a bottom-up manner to obtain the global index.
6. The method of any of claims 1-5, wherein the slave node runs a regular execution environment (REE), the method further comprising: in the TEE of the slave node, encrypting the local index and the first sub-data set to obtain an encrypted local index and an encrypted first sub-data set, and transmitting the encrypted local index and the encrypted first sub-data set to the REE of the slave node; and in the REE of the slave node, obtaining and storing the encrypted local index and the encrypted first sub-data set.
7. The method according to any one of claims 2-4, further comprising: in the TEE of the slave node, performing the following operations: obtaining a dividing parameter; determining the number of data groups and a second group identifier of each data group according to the dividing parameter, wherein the number of data groups is 2 raised to the power of the dividing parameter, the second group identifier is a binary number whose number of bits equals the dividing parameter, and the data information further comprises the second group identifier of the data group to which the first data belongs; and storing the dividing parameter.
8. The method according to any one of claims 1-7, further comprising: in the TEE of the master node, performing the following operations: acquiring a second data set, wherein the second data set comprises a plurality of pieces of second data, and the second data is used for updating stored data; determining a first target slave node according to the key attribute values of the second data and the global index; and transmitting the second data to the first target slave node; and in the TEE of the first target slave node, acquiring the second data and updating the data stored by the first target slave node with the second data based on the local index of the first target slave node.
9. The method according to any one of claims 1-8, further comprising: in the TEE of the master node, performing the following operations: acquiring a third data set, wherein the third data set comprises a plurality of pieces of third data, and the third data is used for updating stored data; determining a second target slave node according to the key attribute values of the third data and the global index; if it is determined that the second target slave node will meet a rebalancing condition after its stored data is updated based on the third data, determining rebalancing slave nodes, wherein the rebalancing slave nodes comprise at least the second target slave node; acquiring stored data from the rebalancing slave nodes and constructing a fourth data set, wherein the fourth data set comprises the data to be stored that is obtained after the stored data is updated based on the third data; dividing the fourth data set according to the key attribute values of the data included in the fourth data set to obtain fourth sub-data sets equal in number to the rebalancing slave nodes, wherein each fourth sub-data set comprises at least one piece of data, and the fourth sub-data sets are in one-to-one correspondence with the rebalancing slave nodes; and updating the global index according to the key attribute value ranges of the data included in the fourth sub-data sets.
10. The method according to claim 9, wherein the method further comprises: constructing, in the TEE of the rebalancing slave node, a local index of the rebalancing slave node according to the key attribute value range of the data included in the fourth sub-data set.
11. The method of any of claims 1-10, wherein the global index is stored in a TEE of the master node.
12. A data query method, wherein the method is applied to a distributed database system comprising a master node and a plurality of slave nodes, the master node and the slave nodes running a trusted execution environment (TEE) and a regular execution environment (REE), the master node storing a global index, the global index being established using the index establishment method of any one of claims 1-11, the method comprising: in the REE of the master node, acquiring an encrypted query statement sent by a client, parsing the encrypted query statement to obtain an encrypted key attribute value query range, and sending the encrypted key attribute value query range to the TEE of the master node; in the TEE of the master node, performing the following operations: decrypting the encrypted key attribute value query range to obtain a key attribute value query range; determining a third target slave node corresponding to the key attribute value query range according to the global index; and sending the key attribute value query range to the third target slave node; in the TEE of the third target slave node, acquiring the key attribute value query range, acquiring query data according to the key attribute value query range, and sending the query data to the master node; and in the TEE of the master node, acquiring the query data, generating a query result based on the query data, and sending the encrypted query result to the client.
13. The method of claim 12, wherein the slave node stores a local index, the local index being established using the index establishment method of any of claims 2-11, and said acquiring query data according to the key attribute value query range comprises: acquiring the query data according to the local index and the key attribute value query range.
14. The method of claim 12, wherein the global index is a binary tree, the binary tree comprises leaf nodes in one-to-one correspondence with the slave nodes, each leaf node comprises the node identifier of the corresponding slave node and the key attribute value range of the data stored by that slave node, and said determining a third target slave node corresponding to the key attribute value query range according to the global index comprises: traversing the binary tree, determining each leaf node whose key attribute value range at least partially overlaps the key attribute value query range as a target leaf node, and taking the part of that key attribute value range that overlaps the key attribute value query range as a key attribute value query sub-range; and taking the slave node corresponding to the target leaf node as the third target slave node, wherein the third target slave node corresponds to the key attribute value query sub-range; and said sending the key attribute value query range to the third target slave node comprises: sending, to the third target slave node, the key attribute value query sub-range corresponding to the third target slave node.
15. The method of claim 13, wherein the third target slave node stores its encrypted local index and encrypted data in the REE of the third target slave node, and said acquiring query data according to the local index and the key attribute value query range comprises: determining a target first group identifier according to the key attribute value range of the data stored by the third target slave node, a dividing parameter, and the key attribute value query range, wherein the dividing parameter is stored in advance in the TEE of the third target slave node and is used for determining the number of data groups included in the third target slave node; determining a target data identifier according to the target first group identifier; encrypting the target data identifier to obtain an encrypted target data identifier, and sending the encrypted target data identifier to the REE of the third target slave node, so that the REE of the third target slave node is queried by using the encrypted target data identifier and the encrypted local index to obtain target encrypted data; acquiring the target encrypted data from the REE of the third target slave node, and decrypting the target encrypted data to obtain target data; and if the key attribute value of the target data falls within the key attribute value query range, taking the target data as query data.
16. A distributed database system, comprising a master node and a plurality of slave nodes, the master node and the slave nodes running a trusted execution environment (TEE); wherein the master node is configured to, in the TEE of the master node: acquire a first data set; sort the first data included in the first data set in order of the key attribute values of the first data; divide the first data set according to the number of the slave nodes to obtain first sub-data sets equal in number to the slave nodes, wherein each first sub-data set comprises at least one piece of the first data, and the first sub-data sets are in one-to-one correspondence with the slave nodes; and construct a global index, wherein the global index comprises a correspondence between each first sub-data set and the key attribute value range of the first data included in that first sub-data set.
17. The distributed database system of claim 16, wherein the slave node is configured to construct, in a TEE of the slave node, a local index of the slave node, the local index including a correspondence between first data included in the first sub-data set and key attribute values of the first data.
18. The distributed database system of claim 17, wherein the slave node being configured to construct a local index of the slave node comprises: the slave node is configured to divide the first sub-data set into at least one data group, each data group comprising at least one piece of first data, determine a first group identifier of each data group, and construct the local index of the slave node according to the first group identifiers, wherein the local index comprises a correspondence between data identifiers and data information of the first data included in the first sub-data set, the data identifier of a piece of first data comprises the first group identifier of the data group to which the first data belongs, and the data information of the first data comprises the key attribute value of the first data.
19. The distributed database system of claim 18, wherein the slave node being configured to divide the first sub-data set into at least one data group comprises: the slave node is configured to divide the key attribute value range of the first data included in the first sub-data set into at least one key attribute value sub-range, and cluster the first data included in the first sub-data set according to the at least one key attribute value sub-range and the key attribute values of the first data to obtain the at least one data group, wherein the first group identifier of each data group is determined according to the number of data groups and the order of the key attribute value sub-ranges.
20. The distributed database system according to any of claims 16-19, wherein the master node being configured to construct a global index comprises: the master node is configured to establish leaf nodes equal in number to the slave nodes, wherein the leaf nodes are in one-to-one correspondence with the first sub-data sets, and each leaf node comprises the set identifier of the corresponding first sub-data set and the key attribute value range of the first data included in the corresponding first sub-data set, and to construct a binary tree from the leaf nodes in a bottom-up manner to obtain the global index.
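
Illustrative sketch (not part of the claims): the binary-tree global index of claims 5, 14 and 20 might be pictured roughly as follows. The claims do not specify how leaves are paired, how an odd leaf is handled, or what internal nodes store; the pairwise merge, the carried-up odd node, and the names `Node`, `build_tree` and `find_targets` are assumptions made only for illustration, and no TEE or encryption is modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    lo: int                           # smallest key attribute value covered
    hi: int                           # largest key attribute value covered
    slave_id: Optional[int] = None    # set only on leaf nodes (assumed layout)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(leaf_ranges: List[Tuple[int, int, int]]) -> Node:
    """Build the tree bottom-up from (slave_id, lo, hi) tuples sorted by lo."""
    if not leaf_ranges:
        raise ValueError("at least one leaf is required")
    level = [Node(lo, hi, slave_id=sid) for sid, lo, hi in leaf_ranges]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                l, r = level[i], level[i + 1]
                # Parent covers the union of its children's ranges.
                parents.append(Node(l.lo, r.hi, left=l, right=r))
            else:
                # Assumption: an unpaired node is carried up unchanged.
                parents.append(level[i])
        level = parents
    return level[0]


def find_targets(node: Optional[Node], q_lo: int, q_hi: int) -> List[Tuple[int, int, int]]:
    """Return (slave_id, sub_lo, sub_hi) for every leaf overlapping [q_lo, q_hi]."""
    if node is None or q_hi < node.lo or q_lo > node.hi:
        return []                     # no overlap: prune this subtree
    if node.slave_id is not None:
        # Leaf: the overlap is the query sub-range sent to this slave node.
        return [(node.slave_id, max(q_lo, node.lo), min(q_hi, node.hi))]
    return find_targets(node.left, q_lo, q_hi) + find_targets(node.right, q_lo, q_hi)
```

With three leaves covering 10 to 30, 40 to 50 and 60 to 70, a query for 25 to 45 would be routed to the first two slave nodes, each receiving only the part of the range that overlaps its own keys.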

Description

Index establishment method, data query method, system and related equipment

Technical Field

The present application relates to the field of data processing technologies, and in particular, to an index establishment method, a data query method, a system, a device, a storage medium, and a computer program product.

Background

A cloud storage system uses software, by way of cluster applications, grid technology, distributed file systems, distributed databases, and the like, to make a large number of networked storage devices work together and jointly provide data storage and service access functions. Users can obtain the storage resources provided by the cloud storage system on a paid basis and store their data in the cloud, which reduces their local data storage cost.

The data that a user needs to upload to the cloud storage system may include critical data, that is, data that must not be revealed to other users or organizations. When critical data is stored, its security needs to be ensured and its leakage avoided.

At present, data encryption is generally used to improve the security of critical data. The user encrypts the data before uploading it to the cloud. The cloud storage system stores the encrypted data and uses a fully encrypted processing technology to query it, which reduces the possibility of data leakage. The fully encrypted processing technology may combine software with a trusted execution environment (Trusted Execution Environment, TEE). The TEE builds a secure area in the central processor of the cloud server, so that the programs loaded in the secure area and the data they process are protected in terms of confidentiality, which makes it possible to perform operations on encrypted data.
However, TEE-based fully encrypted processing technology is mainly applied to centralized data systems and cannot be applied to distributed database systems, so the data security of a distributed database system in a cloud storage system can hardly meet data storage requirements.

Disclosure of Invention

In view of the above, the present application provides an index establishment method, a data query method, a system, a device, a storage medium, and a computer program product, which can construct a global index and local indexes for a TEE-based distributed database system, and which improve the security of data storage and data query while preserving the efficient data processing of the distributed architecture.

In a first aspect, the present application provides an index establishment method, which is applied to a distributed database system. The distributed database system includes a master node and a plurality of slave nodes. The master node and the slave nodes run a trusted execution environment (TEE). In the TEE of the master node, a first data set comprising a plurality of pieces of first data is obtained, the first data is sorted in order of its key attribute values, and the first data is divided according to the number of slave nodes to obtain a plurality of first sub-data sets. The number of first sub-data sets is the same as the number of slave nodes. The first sub-data sets are in one-to-one correspondence with the slave nodes, and each slave node stores its corresponding first sub-data set. In the TEE of the master node, a global index is built. The global index includes a correspondence between each first sub-data set and the key attribute value range of the first data included in that first sub-data set.
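
As a rough illustration of the first aspect, the sort, divide, and index steps could look like the sketch below. The text only says the data is divided according to the number of slave nodes; the near-equal contiguous split, integer keys, and the names `Record`, `partition_by_key` and `build_range_table` are assumptions for illustration, and the TEE itself is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    key: int        # key attribute value (assumed to be an integer here)
    payload: str    # remaining fields of the piece of first data


def partition_by_key(records: List[Record], num_slaves: int) -> List[List[Record]]:
    """Sort by key, then cut into num_slaves contiguous sub-data sets."""
    if num_slaves <= 0:
        raise ValueError("num_slaves must be positive")
    if len(records) < num_slaves:
        raise ValueError("each sub-data set needs at least one record")
    ordered = sorted(records, key=lambda r: r.key)
    # Assumption: sizes differ by at most one; the source fixes no sizing rule.
    base, extra = divmod(len(ordered), num_slaves)
    subsets, start = [], 0
    for i in range(num_slaves):
        size = base + (1 if i < extra else 0)
        subsets.append(ordered[start:start + size])
        start += size
    return subsets


def build_range_table(subsets: List[List[Record]]) -> List[Tuple[int, int, int]]:
    """Global index as a flat table: (slave_id, min_key, max_key) per sub-data set."""
    return [(i, s[0].key, s[-1].key) for i, s in enumerate(subsets)]
```

Because the records are sorted before being cut, each sub-data set covers a contiguous key range, so one (minimum, maximum) pair per slave node is enough to decide which nodes a range query must visit.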
In this way, a global index capable of locating the slave node that stores the first data is established at the master node, so that data storage and data query in a TEE-based distributed database system are realized and the data storage and query requirements of the distributed architecture are met. Establishing the global index in the TEE of a node of the distributed database system improves the security of the first data and meets the security requirements of data storage and data query.

In one possible implementation, the local index is built in the TEE of the slave node. The local index comprises a correspondence between the first data included in the first sub-data set and the key attribute values of the first data. During a query, the data stored in the slave node can be looked up according to the local index. The global index and the local index together form a two-level index that matches the way the distributed database system stores data and meets the user's need to query data through the distributed database system.

In one possible implementation, in the slave node, the first sub-data set is divided into at least one data group. The local index is co