US-12625976-B2 - Data analytics system and its operation

Abstract

A method for operating a data analytics system includes storing, at a first server system of the data analytics system, a first key associated with a data provider and/or storing, at a second server system of the data analytics system, a second key associated with the data provider. The first key and the second key are complementary keys generated using a secret sharing based cryptographic algorithm based on a policy arranged to control usage of data provided by the data provider. The first key and the second key are arranged to facilitate performing of a data analytics operation at the data analytics system.
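
The idea of two complementary keys derived from a policy can be illustrated with a deliberately naive sketch. It assumes the policy is a table of bits over a small, fixed set of task identifiers (1 = task allowed) and splits that table into two XOR shares, one per server. This is not the distributed point function construction named in the claims, whose keys are compact rather than table-sized; the names `generate_policy_keys`, `evaluate`, and `reconstruct` are hypothetical and chosen only for illustration.

```python
import secrets
from typing import List, Set, Tuple


def generate_policy_keys(num_tasks: int, allowed_tasks: Set[int]) -> Tuple[List[int], List[int]]:
    """Split a policy bit table into two XOR shares (run by the data provider).

    Assumption: the policy is simply the set of task ids allowed to use the data.
    """
    # The first key is uniformly random, so on its own it reveals nothing.
    key1 = [secrets.randbelow(2) for _ in range(num_tasks)]
    # The second key is chosen so that key1[i] XOR key2[i] equals the policy bit.
    key2 = [key1[i] ^ (1 if i in allowed_tasks else 0) for i in range(num_tasks)]
    return key1, key2


def evaluate(key: List[int], task_id: int) -> int:
    """Each server evaluates only its own key on the queried task id."""
    return key[task_id]


def reconstruct(share1: int, share2: int) -> int:
    """Combining both servers' output shares yields the policy decision bit."""
    return share1 ^ share2


if __name__ == "__main__":
    k1, k2 = generate_policy_keys(num_tasks=8, allowed_tasks={2, 5})
    for task in range(8):
        print(task, reconstruct(evaluate(k1, task), evaluate(k2, task)))
```

In this sketch each key alone is a uniformly random bit string, so a single server holding one key learns nothing about which tasks the provider has authorized; only the combination of both evaluations carries the decision.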

Inventors

  • Cong Wang
  • Chengjun Cai
  • Yichen Zang

Assignees

  • CITY UNIVERSITY OF HONG KONG

Dates

Publication Date
2026-05-12
Application Date
2023-08-14

Claims (20)

  1. A method for operating a data analytics system, comprising: storing, at a first server system of the data analytics system, a first key associated with a data provider; and storing, at a second server system of the data analytics system, a second key associated with the data provider; wherein the first key and the second key are complementary keys generated using a secret sharing based cryptographic algorithm based on a policy arranged to control usage of data provided by the data provider; and wherein the first key and the second key are arranged to facilitate performing of a data analytics operation at the data analytics system.
  2. The method of claim 1, wherein the secret sharing based cryptographic algorithm comprises a distributed point function based cryptographic algorithm.
  3. The method of claim 1, wherein the data analytics operation is based on the secret sharing based cryptographic algorithm.
  4. The method of claim 3, wherein the data analytics operation is arranged to: analyze data provided by data providers including the data provider based on a data analytics query; and determine a data output of data provided by one or more data providers that match the data analytics query.
  5. The method of claim 1, wherein the policy comprises a plurality of conditions; and wherein (i) at least two of the plurality of conditions being associated with an AND operator; (ii) at least two of the plurality of conditions being associated with an OR operator; and/or (iii) at least two of the plurality of conditions being associated with a NOT operator.
  6. The method of claim 5, wherein the plurality of conditions includes at least two of the following conditions: a condition associated with data consumer or type of data consumer that can or cannot access the data provided by the data provider; a condition associated with a location requirement for data consumer or type of data consumer that can or cannot access the data provided by the data provider; a condition associated with usage control of the data provided by the data provider; and a condition associated with operation that can or cannot be performed using the data provided by the data provider.
  7. The method of claim 1, further comprising: receiving or obtaining, at the first server system, a first share of the data provided by the data provider; receiving or obtaining, at the second server system, a second share of the data provided by the data provider; encrypting, at the first server system, the first share of the data based on a secret key of the first server system using a symmetric homomorphic stream encryption (SHSE) based method, to obtain an encrypted first share of the data; encrypting, at the second server system, the second share of the data based on a secret key of the second server system using the symmetric homomorphic stream encryption (SHSE) based method, to obtain an encrypted second share of the data, the secret key of the second server system being different from the secret key of the first server system; and determining, at one or both of the first server system and the second server system, an encrypted data based on the encrypted first share of the data and the encrypted second share of the data.
  8. The method of claim 7, wherein determining the encrypted data comprises: determining, at the first server system, the encrypted data based on the encrypted first share of the data and the encrypted second share of the data; and determining, at the second server system, the encrypted data based on the encrypted first share of the data and the encrypted second share of the data.
  9. The method of claim 7, further comprising: storing the encrypted data at each of the first server system and the second server system.
  10. The method of claim 7, wherein the encrypted first share of the data is in the form of a ciphertext share; wherein the encrypted second share of the data is in the form of a ciphertext share; and wherein the encrypted data is in the form of a ciphertext formed based on the ciphertext share of the encrypted first share of the data and the ciphertext share of the encrypted second share of the data.
  11. The method of claim 7, wherein the data consists only of the first share of the data and the second share of the data; and wherein the first share of the data and the second share of the data are split from the data.
  12. The method of claim 11, wherein the first share of the data and the second share of the data are split randomly from the data.
  13. The method of claim 11, wherein the first share of the data and the second share of the data are split from the data based on an additive secret sharing based method.
  14. The method of claim 7, wherein the secret key of the first server system is a pseudo-random function key generated based on a pseudo-random function; and wherein the secret key of the second server system is a pseudo-random function key generated based on a pseudo-random function.
  15. The method of claim 1, wherein the data includes a data value.
  16. The method of claim 15, wherein the data includes the data value and one or more values arithmetically associated with the data value.
  17. The method of claim 1, wherein the data includes a vector of bits.
  18. The method of claim 7, wherein the data provided by the data provider is part of a data stream that is provided by the data provider and includes, at least, a first data corresponding to a first epoch and a second data corresponding to a second epoch, the first data being the data; and wherein the method further comprises: receiving or obtaining, at the first server system, a first share of the second data provided by the data provider; receiving or obtaining, at the second server system, a second share of the second data provided by the data provider; encrypting, at the first server system, the first share of the second data based on the secret key of the first server system using the symmetric homomorphic stream encryption (SHSE) based method, to obtain an encrypted first share of the second data; encrypting, at the second server system, the second share of the second data based on the secret key of the second server system using the symmetric homomorphic stream encryption (SHSE) based method, to obtain an encrypted second share of the second data; and determining, at one or both of the first server system and the second server system, an encrypted second data based on the encrypted first share of the second data and the encrypted second share of the second data.
  19. The method of claim 18, wherein determining the encrypted second data comprises: determining, at the first server system, the encrypted second data based on the encrypted first share of the second data and the encrypted second share of the second data; and determining, at the second server system, the encrypted second data based on the encrypted first share of the second data and the encrypted second share of the second data.
  20. The method of claim 18, further comprising: storing the encrypted second data at each of the first server system and the second server system.
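
The share-then-encrypt flow recited in claims 7 to 14 and 18 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented SHSE method: values are assumed to live in the integers modulo 2^64, each server's pseudo-random function is assumed to be HMAC-SHA256 keyed with that server's secret key and applied to the epoch number, and "encryption" is assumed to be additive masking with the resulting pad. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

MODULUS = 2**64  # assumed value space: integers modulo 2^64


def split_additive(value: int) -> Tuple[int, int]:
    """Split a value into two random additive shares that sum to it mod MODULUS."""
    share1 = secrets.randbelow(MODULUS)
    share2 = (value - share1) % MODULUS
    return share1, share2


def prf(key: bytes, epoch: int) -> int:
    """HMAC-SHA256 as a stand-in pseudo-random function, truncated to 64 bits."""
    digest = hmac.new(key, epoch.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def encrypt_share(share: int, key: bytes, epoch: int) -> int:
    """A server masks its share with its own epoch-specific pad."""
    return (share + prf(key, epoch)) % MODULUS


def combine(ct_share1: int, ct_share2: int) -> int:
    """Adding the two ciphertext shares gives a ciphertext of the full value."""
    return (ct_share1 + ct_share2) % MODULUS


def decrypt(ciphertext: int, key1: bytes, key2: bytes, epoch: int) -> int:
    """Remove both pads; included only to check the sketch, needs both keys."""
    return (ciphertext - prf(key1, epoch) - prf(key2, epoch)) % MODULUS


if __name__ == "__main__":
    key1, key2 = secrets.token_bytes(32), secrets.token_bytes(32)
    s1, s2 = split_additive(42)  # data for epoch 1
    ct = combine(encrypt_share(s1, key1, 1), encrypt_share(s2, key2, 1))
    print(decrypt(ct, key1, key2, 1))
```

Because masking is additive, the combined ciphertext equals the original value plus both servers' pads, so neither server can unmask it with its own key alone, and ciphertexts from different epochs can be added before the pads are removed.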

Description

TECHNICAL FIELD

The invention relates to a data analytics system and its operation.

BACKGROUND

In today's society, healthcare, business decisions, and government operations all rely heavily on the availability of data and advanced analytic tools for accurate decision-making. However, in practice, data is often fragmented and stored locally by individuals, and concerns about data leakage and unauthorized data sharing have made it difficult to motivate individuals to share their data. Traditionally, once data is shared, it leaves the hands of the data owners, i.e., it can then be copied, traded, or abused in uncontrollable ways. According to some recent research, many individuals have concerns about how companies and governments use their data and/or feel that they have little or no control over how their data is used.

To remedy the above problem, some existing techniques construct privacy frameworks that allow owners to define their privacy preferences and regulate data usage. However, most of these techniques require deployment of trusted hardware to enforce the policies of data owners. Real-world data processing systems that do not use trusted hardware (e.g., Apache Kafka) still operate in a notice-and-consent mode and rely on centralized trusted authorities for policy enforcement. Problematically, however, data breach and misuse incidents due to abuse by such trusted authorities exist.

Apart from the lack of privacy-preserving and enforceable data analytic tools that do not rely on centralized trust, the data policies may be susceptible to attack and may help an attacker infer the sensitive data of the data owners. As an example, consider a data owner, Alice, who decides to authorize her data to an analytic task q.
While Alice can encrypt her data for confidentiality protection, based on side information that q is initiated by a psychiatrist (e.g., by looking up information about q on the Internet), an attacker can readily learn that Alice's data will be used by a psychiatrist and thus that Alice might be suffering from mental illness. One approach to hiding such policy-related metadata is to encrypt the data policies and later adopt secure computation techniques on the server side (e.g., outsourced multi-party computation) to privately decrypt and use the data policy. However, this approach can only preserve the confidentiality of the underlying computation process. A curious server can still determine whether data of an owner has been used for a given task (by observing other metadata like data access patterns) and infer the same sensitive information about the data owner.

The above metadata leakage problem is related to some existing security techniques that strive to preserve oblivious data access, i.e., to hide which data have been accessed or used for a query execution. For example, oblivious RAM (ORAM) can be combined with secure computation techniques to fulfill the privacy goals for both the data and its metadata. However, most existing ORAM constructions focus only on a single-owner setting or rely on trusting a proxy to maintain the encrypted RAM storage. On the other hand, ORAM constructions that can support multi-owner settings generally incur heavy computation costs, making them difficult to adopt or deploy in practice.

SUMMARY OF THE INVENTION

In a first aspect, there is provided a method for operating a data analytics system. The method includes: storing, at a first server system of the data analytics system, a first key associated with a data provider, and/or storing, at a second server system of the data analytics system, a second key associated with the data provider.
The first key and the second key are complementary keys generated using a secret sharing based cryptographic algorithm based on a policy arranged to control usage of data provided by the data provider. The first key and the second key are arranged to facilitate performing of a data analytics operation at the data analytics system.

For example, the data provider may be a data owner. For example, the generation of the first key and the second key may be performed at a data provider device (e.g., a computing device of any form).

In some embodiments of the first aspect, the secret sharing based cryptographic algorithm comprises a distributed point function based cryptographic algorithm.

In some embodiments of the first aspect, the data analytics operation is based on the secret sharing based cryptographic algorithm.

In some embodiments of the first aspect, the data analytics operation is arranged to: analyze data provided by data providers including the data provider based on a data analytics query, and determine a data output of data provided by one or more data providers that match the data analytics query.

In some embodiments of the first aspect, the policy comprises a single condition.

In some embodiments of the first aspect, the policy comprises a plurality of