CN-117633057-B - TEE-assisted safe and efficient distributed high-dimensional data similarity query system and method

CN117633057BCN 117633057 BCN117633057 BCN 117633057BCN-117633057-B

Abstract

The invention discloses a TEE-assisted safe and efficient distributed high-dimensional data similarity query method, which comprises the steps of constructing a TEE-assisted safe and efficient distributed high-dimensional data similarity query system, initializing the query system, and enabling a data owner to conduct data set The data in the system is constructed into a TPSS form and distributed to the first cloud server and the second cloud server, PB tree is constructed by using the outsourced data, and when the data set is in the data set When one update data is added or deleted, the PB tree is searched by the first cloud server, the second cloud server and the trusted execution environment through cooperative operation to add or delete the update data, and the first cloud server, the second cloud server and the trusted execution environment provide query service for the query agent according to a similarity query request provided by the query agent. The invention can support the safe query index construction based on outsourcing distributed high-dimensional data, efficient similarity query processing and data dynamic updating.

Inventors

ZHENG YANDONG
ZHANG SONGNIAN
WANG FENGWEI
ZHU HUI
YANG XIAOPENG
LI HUI

Assignees

西安电子科技大学
中国电子科技集团公司第五十四研究所

Dates

Publication Date: 20260512
Application Date: 20231206

Claims (10)

1. The TEE-assisted safe and efficient distributed high-dimensional data similarity query method is characterized by comprising the following steps of: The method comprises the steps of S1, constructing a TEE-assisted safe and efficient distributed high-dimensional data similarity query system, and initializing the safe and efficient distributed high-dimensional data similarity query system, wherein the system comprises a data owner DO, a first cloud server CS1, a second cloud server CS2, a trusted execution environment TEE, a neutral party NP and a query agency QA, the first cloud server CS1 is positioned on a first cloud server platform, and the second cloud server CS2 and the trusted execution environment TEE are positioned on a second cloud server platform; S2, the data owner DO owns the data The data in the cloud server is constructed into a TPSS form and distributed and packaged to the first cloud server CS1 and the second cloud server CS2; s3, constructing a PB tree by using data in the form of TPSS outsourced to the first cloud server CS1 and the second cloud server CS 2; s4, when in the data set Adding or deleting an update data When the PB tree is searched by the first cloud server CS1, the second cloud server CS2 and the trusted execution environment TEE through cooperative operation so as to update the data Adding or deleting; S5, according to the similarity query request proposed by the query agency QA, the first cloud server CS1, the second cloud server CS2 and the trusted execution environment TEE provide query services for the query agency QA.
2. The TEE-assisted secure efficient distributed high-dimensional data similarity querying method of claim 1, wherein initializing the secure efficient distributed high-dimensional data similarity querying system comprises: the data owner DO initializes the first cloud server CS 1; the query agent QA initializes the second cloud server CS2, and the neutral party NP creates the trusted execution environment TEE; the data owner DO negotiates with the neutral party NP to form a data Bit symmetric key And a pseudo-random function , wherein, As a safety parameter, the safety parameter is used, The query agent QA negotiates with the neutral party NP about one Bit symmetric key The data owner DO sends the symmetric key To the first cloud server CS1, will Public, the query agent QA sends the symmetric key To the second cloud server CS2, the neutral party NP will Deployed on the trusted execution environment TEE; the data owner DO discloses a data value field And discloses a leaf node capacity threshold num of the built PB tree.
3. The TEE-assisted secure efficient distributed high-dimensional data similarity querying method of claim 1, wherein S2 comprises: s201. j Each data in the data set owned by it Constructed in the TPSS form, i.e Wherein Is that 3 Random values in (a) and satisfy S202, the j-th Will be Outsourcing the data to the first cloud server CS1 Outsourcing to the second cloud server CS2.
4. The TEE-assisted secure efficient distributed high-dimensional data similarity querying method according to claim 1, wherein S3 comprises: If data set X If the data amount in the data set is smaller than the leaf node capacity threshold num, the first cloud server CS1 and the second cloud server CS2 directly construct the data set X Otherwise, the first cloud server CS1 and the second cloud server CS2 construct an internal node , wherein, A reference point vector in the form of a TPSS, Simultaneously, based on Data set X Segmentation into two sub-data sets And And is based on And Constructing a left subtree and a right subtree of the internal node, and finally constructing a PB tree in a TPSS form by the first cloud server CS1 and the second cloud server CS2 。
5. The TEE-assisted secure and efficient distributed high-dimensional data similarity query method of claim 4, wherein said first cloud server CS1 and said second cloud server CS2 construct an internal node Comprising: s301, the first cloud server CS1, the second cloud server CS2 and the trusted execution environment TEE generate two reference point vectors in TPSS form through cooperative calculation by utilizing a random value generation algorithm, and the reference point vectors are recorded as ; S302, the first cloud server CS1, the second cloud server CS2 and the trusted execution environment TEE generate a random tag in the TPSS form through cooperative calculation , ; S303, the first cloud server CS1, the second cloud server CS2 and the trusted execution environment TEE are according to Is to take the value of the data set Divided into And Wherein when In the time-course of which the first and second contact surfaces, And The method comprises the following steps: When (when) In the time-course of which the first and second contact surfaces, And The method comprises the following steps: Wherein, the Representing the euclidean distance of the two vectors, Representing the number of data in the dataset; S304, constructing an internal node by the first cloud server CS1 and the second cloud server CS2 And according to Creating a left subtree of an internal node according to Creating a right subtree of the internal node, and finally, the first cloud server CS1 and the second cloud server CS2 obtain a PB tree in the form of TPSS 。
6. The TEE-assisted secure and efficient distributed high-dimensional data similarity querying method according to claim 5, wherein the random value generation algorithm in S301 comprises: the second cloud server CS2 selects a random value And constructs its TPSS form The second cloud server CS2 reserves And will Sending to the trusted execution environment TEE, wherein D represents the value range of the reference point data; the trusted execution environment TEE selects a random value And constructs its TPSS form The trusted execution environment TEE reservation And will Sending to the second cloud server CS2; the second cloud server CS2 and the trusted execution environment TEE respectively calculate And Then the second cloud server CS2 will Sent to the first cloud server CS1, the trusted execution environment TEE will Sent to the first cloud server CS1, the generated random number is TPSS form is , wherein, , , 。
7. The TEE-assisted secure efficient distributed high-dimensional data similarity querying method according to claim 5, wherein S302 comprises: the first cloud server CS1 generates a random label Constructing its TPSS form And will Sending to the second cloud server CS2; The second cloud server CS2 generates a random label Constructing its TPSS form And will Sending to the first cloud server CS1; Secure multiplication algorithm calculation of TPSS (trusted execution environment) call by the first cloud server CS1, the second cloud server CS2 and the trusted execution environment TEE 。
8. The TEE-assisted secure efficient distributed high-dimensional data similarity querying method according to claim 5, wherein S4 comprises: the data owner DO will update the data In the form of TPSS Outsourcing to the first cloud server CS1 and the second cloud server CS2, wherein the first cloud server CS1 holds The second cloud server CS2 holds ; Searching the PB tree through cooperative operation, and when an internal node is Satisfy the following requirements Searching the left subtree, otherwise searching the right subtree until the leaf node; If update data For the data to be added, directly Inserted into searched leaf nodes, if updated data is inserted Later, the data size of the leaf node is greater than Dividing the current leaf node; If update data For each data of a leaf node, for the data to be deleted Judging whether or not to meet If the inequality is satisfied, the data is obtained And deleting.
9. The TEE-assisted secure efficient distributed high-dimensional data similarity querying method according to claim 5, wherein S5 comprises: s501 given a similarity query request At this time, the query agent QA requests the similarity query Encryption is And And will And To the first cloud server CS1 and the second cloud server CS2, wherein, , Representing the distance threshold value(s), ; S502, obtaining the data records meeting the query conditions by retrieving the PB tree, namely When a leaf node of the PB tree is retrieved, each data of the leaf node is judged Whether or not to meet If yes, data is sent Joining a query result set When the internal node of PB tree When being searched, judging whether or not If yes, the left subtree of the current internal node is pruned, and whether the current internal node meets the requirement is judged If yes, pruning the right subtree of the current internal node, otherwise, searching the left and right subtrees, and finally, obtaining a query result set by the first cloud server CS1 and the second cloud server CS2 ; S503 for The first cloud server CS1 will Returning to the query agent QA, the second cloud server CS2 will And returning to the query agent QA, and recovering a query result by the query agent QA according to the received data.
10. A TEE-assisted secure and efficient distributed high-dimensional data similarity query system, comprising a data owner DO, a first cloud server CS1, a second cloud server CS2, a trusted execution environment TEE, a neutral party NP, and a query agent QA, the first cloud server CS1 being located on a first cloud server platform, the second cloud server CS2 and the trusted execution environment TEE being located on a second cloud server platform, wherein, The data owner DO is configured to initialize the first cloud server CS1, perform key negotiation with the neutral party NP, send the negotiated key to the first cloud server CS1, and store the data set owned by the data owner DO X The first cloud server CS1 and the second cloud server CS2 are used for storing outsourcing data, establishing PB tree indexes based on the outsourcing data, dynamically updating the data and processing similarity query requests; The trusted execution environment TEE is used for assisting the first cloud server CS1 and the second cloud server CS2 in establishing a PB tree, dynamically updating data and processing query requests; the neutral party NP is used for creating the trusted execution environment TEE on a second cloud server platform, carrying out key negotiation with the data owner DO and the query agent QA, and deploying the negotiated key in the trusted execution environment TEE; The query agent QA is configured to perform key negotiation with the neutral party NP, send the negotiated key to the second cloud server CS2, encrypt a query request on behalf of a user, and initiate a query request to the first cloud server CS1 and the second cloud server CS2 cloud servers.

Description

TEE-assisted safe and efficient distributed high-dimensional data similarity query system and method Technical Field The invention belongs to the technical field of data similarity query, and particularly relates to a safe and efficient distributed high-dimensional data similarity query system and method assisted by a TEE. Background With the rapid development of the internet of things and information technology, big data presents a trend of rapid growth and distributed storage. Outsourcing big data to a cloud server having powerful storage and computing capabilities has become an effective means of relieving big data storage pressure and improving big data processing efficiency. However, with the enhancement of awareness of protecting privacy of people's data and the promulgation of relevant privacy protection laws and regulations, data is often stored in a cloud server in a dense state and allows the cloud server to provide data services such as queries to users based on the dense state data. The outsourcing inquiry based on the secret state data can effectively protect the data privacy, but encryption breaks the usability of the data, so that the secret state data inquiry efficiency is reduced. Constructing a data query index is a common strategy for improving outsourcing data query, encrypting the index, and sending the index and the encrypted data to a cloud server together to assist the outsourcing query. When data is stored in a data owner in a centralized manner, it is relatively easy to construct and encrypt a query index, and then encrypt the index based on plaintext data only, but the constructed index is basically incapable of supporting dynamic updating of the data and is not suitable for a scenario of data distributed storage. In particular, when data is stored in a distributed manner vertically or horizontally, construction of the query index is required to be achieved through a secure multiparty computing technology, so that computing cost of data owners is high, and communication cost among the data owners is high. Various solutions have been proposed by students for secure similarity queries of high-dimensional data, but existing methods have one or more limitations, such as linear query efficiency, inaccurate query results, and non-support of distributed data and dynamic updates of data. Specifically, scholars of Dallas division of Tex university put forward an approximate similarity query method based on Local sensitive hash (Local-SENSITIVE HASHING, LSH) technology in paper EFFICIENT SIMILARITY SEARCH over ENCRYPTED DATA, map points with similar distances to the same value by using a hash function, improve the efficiency of similarity query, and introduce a symmetrical searchable encryption technology to protect the privacy of similarity query based on LSH values. The method has higher query efficiency, but cannot support dynamic update of data. Based on the method, the scholars of the university of hong Kong city and the university of hong Kong city in paper EncSIM: AN ENCRYPTED SIMILARITY SEARCH SERVICE for Distributed High-dimensional Datasets skillfully combine full-pair LSH with a symmetrical searchable encryption method with forward security to realize safe and efficient similarity query and support dynamic addition of data. The scholars of the western electronic technology university propose a safe greedy partitioning method based on LSH coding and optimized linear ordering in paper Towards Secure Approximate k-Nearest Neighbor Query Over ENCRYPTED HIGH-Dimensional Data, and the safe approximate distance calculation and similarity query of high-Dimensional Data are realized by using the method. Scholars at the university of western electronic technology use coarse-grained, AES encryption and Paillier homomorphic encryption techniques to design a secure inverted file index in paper Secure k Nearest Neighbors Query for High-Dimensional Vectors in Outsourced Environments, and based on the index, efficient high-dimensional approximate distance calculation is realized. Although the method has sub-linear query efficiency, only approximate similarity query can be realized, the accuracy of the query result cannot be ensured, and dynamic update of distributed data cannot be fully supported. In addition, some security similarity query methods proposed by students facing to low-dimensional data can also be used for realizing the similarity query of the security high-dimensional data. Specifically, students at hong Kong university utilize reversible matrix to encrypt data in paper Secure kNN Computation on Encrypted Databases, and can effectively judge the distance sequence between data record and query data based on the secret state data, so that the security similarity query is realized, but the security of the method is weaker, and the known plaintext attack cannot be resisted. In order to enhance the security of the method, new university of brinz scholars in paper EFFICIENT PRIVACY-PRES