US-12626164-B2 - System and method for reduction of data transmission by data reconstruction

Abstract

Methods and systems for managing data collection are disclosed. To manage data collection, a system may include a data aggregator and a data collector. The data aggregator and data collector may utilize identical copies of a twin inference model to predict the future operation of the data collector. To minimize data transmission, the data collector may transmit a difference to the data aggregator. The data aggregator may reconstruct data from the data collectors using the difference from the data collector, and an inference generated by the copy of the twin inference model hosted by the data aggregator.
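
The exchange described above can be illustrated with a minimal Python sketch. It assumes integer sensor readings, uses a bytewise XOR as the "difference", and uses zlib as the compressor; none of these choices is stated in the patent. The names `pack`, `xor_bytes`, `collector_encode`, and `aggregator_reconstruct` are hypothetical, and the inferences are passed in directly rather than produced by a real twin inference model.

```python
import struct
import zlib


def pack(values):
    """Serialize a list of integers as big-endian 32-bit values."""
    return struct.pack(f">{len(values)}i", *values)


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def collector_encode(measurements, inferences):
    """Collector side: XOR measurements with the local inference, then compress.

    When the inference matches the measurements, the XOR is all zeros
    and compresses to only a few bytes.
    """
    diff = xor_bytes(pack(measurements), pack(inferences))
    return zlib.compress(diff)


def aggregator_reconstruct(payload, inferences):
    """Aggregator side: decompress, then XOR with its own identical inference."""
    diff = zlib.decompress(payload)
    raw = xor_bytes(diff, pack(inferences))
    return list(struct.unpack(f">{len(inferences)}i", raw))


# Illustrative data only (assumption): 100 readings near 2000.
measurements = [2000 + (i * 37) % 11 for i in range(100)]
accurate_inference = list(measurements)   # twin model predicts perfectly
inaccurate_inference = [0] * 100          # twin model predicts poorly

small_payload = collector_encode(measurements, accurate_inference)
large_payload = collector_encode(measurements, inaccurate_inference)
restored = aggregator_reconstruct(small_payload, accurate_inference)
```

Because XOR is its own inverse, the aggregator recovers the measurements exactly as long as both copies of the model produce the same inference, and the payload shrinks as the inference gets closer to the measurement.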

Inventors

  • Ofir Ezrielev
  • Jehuda Shemer

Assignees

  • DELL PRODUCTS L.P.

Dates

Publication Date
2026-05-12
Application Date
2022-04-21

Claims (20)

  1. A method for managing data collection in a distributed environment where data is collected in a data aggregator of the distributed environment and from at least a data collector operably connected to the data aggregator via a communication system, comprising: obtaining, by the data aggregator, a twin inference model using training data obtained, at least in part, from the data collector; distributing, by the data aggregator, a copy of the twin inference model to the data collector; receiving, from the data collector and by the data aggregator, a data transmission including a reduced-size representation obtained by data compression of a binary sequence based on: data obtained via a measurement performed by the data collector on one or more sensing targets, the data collector being a sensor, and a first inference generated by the data collector using the twin inference model, the first inference being intended to match the data; reconstructing, by the data aggregator and while the data aggregator does not have access to an actual copy of the data obtained via the measurement performed by the data collector, the data using the reduced-size representation and a second inference generated by the data aggregator, the second inference being intended to match the data and the first inference generated by the data collector; and performing, by the data aggregator and while the data aggregator still does not have access to the actual copy of the data obtained via the measurement performed by the data collector, an action set based at least in part on the reconstructed data, wherein the data collector performs a deletion of the actual copy of the data obtained via the measurement performed by the data collector, the reduced-size representation, and the first inference from internally within the data collector in response to data collector transmitting the reduced-size representation to the data aggregator and without first transmitting the actual copy to the data aggregator before the data collector performs the deletion, in response to receiving the reduced-size representation, transmitting, by the data aggregator and to the data collector, a command based on the reduced-size representation, in case where the reduced-size representation has a bitwise size that is larger than a threshold: obtaining, by the data collector and in response to the command, a portion of a series of measurements, the command includes a request for at least the portion of the series of measurements; and re-training, by the data aggregator, the twin inference model based on the portion of the series of measurements; in a case where the bitwise size of the reduced-size representation is less than the threshold: performing, by the data collector and in response to the command, the deletion of the actual copy of the data, the reduced-size representation, and the first inference; and deleting, by the data aggregator, the reduced-size representation.
  2. The method of claim 1, further comprising: making a determination that the reduced-size representation falls below a threshold; and based on that determination: treating the twin inference model as being accurate, the twin inference model being implemented by the data aggregator and the data collector, and the twin inference model being used to obtain the first inference and the second inference.
  3. The method of claim 1, further comprising: making a determination that the reduced-size representation falls outside of a threshold; and based on that determination: treating the twin inference model as being inaccurate, the twin inference model being implemented by the data aggregator and the data collector, and the twin inference model being used to obtain the first inference and the second inference.
  4. The method of claim 3, further comprising: when the twin inference model is determined as being inaccurate: retraining the twin inference model using training data, the training data comprising a second portion of data obtained via a second series of measurements performed by the data collector.
  5. The method of claim 1, wherein a value of the reduced-size representation decreases as accuracy of the twin inference model increases, and the value of the reduced-size representation increases as the accuracy of the twin inference model decreases.
  6. The method of claim 5, wherein a quantity of bits necessary to communicate the reduced-size representation via the communication system decreases as the accuracy of the twin inference model increases.
  7. The method of claim 6, wherein the action set is not based on any data from measurements performed by the data collector that is transmitted via the communication system to the data aggregator.
  8. The method of claim 1, wherein the measurement is performed using a sensor that measures a characteristic of an ambient environment proximate to the data collector, the ambient environment proximate to the data collector being different from an ambient environment proximate to the data aggregator.
  9. The method of claim 8, wherein the one or more actions are triggered to be performed based on the ambient environment proximate to the data collector and are independent from the ambient environment proximate to the data aggregator.
  10. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing data collection in a distributed environment where data is collected in a data aggregator of the distributed environment and from at least a data collector operably connected to the data aggregator via a communication system, the operations comprising: obtaining, by the data aggregator, a twin inference model using training data obtained, at least in part, from the data collector; distributing, by the data aggregator, a copy of the twin inference model to the data collector; obtaining, from the data collector and by the data aggregator, a data transmission including a reduced-size representation obtained by data compression of a binary sequence based on: data obtained via a measurement performed by the data collector on one or more sensing targets, the data collector being a sensor, and a first inference generated by the data collector using the twin inference model, the first inference being intended to match the data; reconstructing, by the data aggregator and while the data aggregator does not have access to an actual copy of the data obtained via the measurement performed by the data collector, the data using the reduced-size representation and a second inference generated by the data aggregator, the second inference being intended to match the data and the first inference generated by the data collector; and performing, by the data aggregator and while the data aggregator still does not have access to the actual copy of the data obtained via the measurement performed by the data collector, an action set based at least in part on the reconstructed data, wherein the data collector performs a deletion of the actual copy of the data obtained via the measurement performed by the data collector, the reduced-size representation, and the first inference from internally within the data collector in response to data collector transmitting the reduced-size representation to the data aggregator and without first transmitting the actual copy to the data aggregator before the data collector performs the deletion, in response to receiving the reduced-size representation, transmitting, by the data aggregator and to the data collector, a command based on the reduced-size representation, in case where the reduced-size representation has a bitwise size that is larger than a threshold: obtaining, by the data collector and in response to the command, a portion of a series of measurements, the command includes a request for at least the portion of the series of measurements; and re-training, by the data aggregator, the twin inference model based on the portion of the series of measurements; in a case where the bitwise size of the reduced-size representation is less than the threshold: performing, by the data collector and in response to the command, the deletion of the actual copy of the data, the reduced-size representation, and the first inference; and deleting, by the data aggregator, the reduced-size representation.
  11. The non-transitory machine-readable medium of claim 10, wherein the operations further comprise: making a determination that the reduced-size representation falls below a threshold; and based on that determination: treating the twin inference model as being accurate, the twin inference model being implemented by the data aggregator and the data collector, and the twin inference model being used to obtain the first inference and the second inference.
  12. The non-transitory machine-readable medium of claim 10, wherein the operations further comprise: making a determination that the reduced-size representation falls outside of a threshold; and based on that determination: treating the twin inference model as being inaccurate, the twin inference model being implemented by the data aggregator and the data collector, and the twin inference model being used to obtain the first inference and the second inference.
  13. The non-transitory machine-readable medium of claim 12, wherein the operations further comprise: when the twin inference model is determined as being inaccurate: updating the twin inference model using training data, the training data comprising a second portion of data obtained via a second series of measurements performed by the data collector.
  14. The non-transitory machine-readable medium of claim 10, wherein a value of the reduced-size representation decreases as accuracy of the twin inference model increases, and the value of the reduced-size representation increases as the accuracy of the twin inference model decreases.
  15. The non-transitory machine-readable medium of claim 10, wherein the measurement is performed using a sensor that measures a characteristic of an ambient environment proximate to the data collector, the ambient environment proximate to the data collector being different from an ambient environment proximate to the data aggregator.
  16. A data system, comprising: a communication system; a data collector, operably connected to the communication system, that is a sensor that performs a measurement on one or more sensing targets; and a data aggregator operably connected to the data collector via the communication system, the data aggregator comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing data collection in a distributed environment where data is collected in the data aggregator of the distributed environment and from at least the data collector, the operations comprising: obtaining, by the data aggregator, a twin inference model using training data obtained, at least in part, from the data collector; distributing, by the data aggregator, a copy of the twin inference model to the data collector; obtaining, from the data collector, a data transmission including a reduced-size representation obtained by data compression of a binary sequence based on: data obtained via the measurement performed by the data collector, and a first inference generated by the data collector using the twin inference model, the first inference being intended to match the data; reconstructing, and while the data aggregator does not have access to an actual copy of the data obtained via the measurement performed by the data collector, the data using the reduced-size representation and a second inference generated by the data aggregator, the second inference being intended to match the data and the first inference generated by the data collector; and performing, and while the data aggregator still does not have access to the actual copy of the data obtained via the measurement performed by the data collector, an action set based at least in part on the reconstructed data, wherein the data collector performs a deletion of the actual copy of the data obtained via the measurement performed by the data collector, the reduced-size representation, and the first inference from internally within the data collector in response to data collector transmitting the reduced-size representation to the data aggregator and without first transmitting the actual copy to the data aggregator before the data collector performs the deletion, in response to receiving the reduced-size representation, transmitting, by the data aggregator and to the data collector, a command based on the reduced-size representation, in case where the reduced-size representation has a bitwise size that is larger than a threshold: obtaining, by the data collector and in response to the command, a portion of a series of measurements, the command includes a request for at least the portion of the series of measurements; and re-training, by the data aggregator, the twin inference model based on the portion of the series of measurements; in a case where the bitwise size of the reduced-size representation is less than the threshold: performing, by the data collector and in response to the command, the deletion of the actual copy of the data, the reduced-size representation, and the first inference; and deleting, by the data aggregator, the reduced-size representation.
  17. The data system of claim 16, wherein the operations further comprise: making a determination that the reduced-size representation falls below a threshold; and based on that determination: treating a twin inference model as being accurate, the twin inference model being implemented by the data aggregator and the data collector, and the twin inference model being used to obtain the first inference and the second inference.
  18. The data system of claim 16, wherein the operations further comprise: making a determination that the reduced-size representation falls outside of a threshold; and based on that determination: treating a twin inference model as being inaccurate, the twin inference model being implemented by the data aggregator and the data collector, and the twin inference model being used to obtain the first inference and the second inference.
  19. The data system of claim 18, wherein the operations further comprise: when the twin inference model is determined as being inaccurate: updating the twin inference model using training data, the training data comprising a second portion of data obtained via a second series of measurements performed by the data collector.
  20. The data system of claim 16, wherein the measurement is performed using a sensor that measures a characteristic of an ambient environment proximate to the data collector, the ambient environment proximate to the data collector being different from an ambient environment proximate to the data aggregator.
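
The threshold branch recited in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `Command` enum, `select_command`, and the 64-bit threshold are hypothetical, and because the claims only address sizes larger than or less than the threshold, treating an exactly equal size as "not larger" is a choice made here.

```python
from enum import Enum


class Command(Enum):
    # Aggregator asks the collector for a portion of a series of
    # measurements so that the twin inference model can be re-trained.
    REQUEST_MEASUREMENTS = "request_measurements"
    # Collector deletes the data, the reduced-size representation, and the
    # first inference; the aggregator deletes its copy of the representation.
    DELETE_LOCAL_COPIES = "delete_local_copies"


def select_command(payload: bytes, threshold_bits: int) -> Command:
    """Pick the aggregator's command from the bitwise size of the payload.

    A large payload suggests the shared model predicted the measurement
    poorly; a small payload suggests the model is still accurate.
    """
    size_bits = len(payload) * 8
    if size_bits > threshold_bits:
        return Command.REQUEST_MEASUREMENTS
    # Assumption: sizes equal to the threshold fall through to deletion.
    return Command.DELETE_LOCAL_COPIES


# Hypothetical threshold of 64 bits, for illustration only.
poor_fit = select_command(b"\x01" * 10, threshold_bits=64)   # 80 bits
good_fit = select_command(b"\x00" * 4, threshold_bits=64)    # 32 bits
```

Using the payload size as the accuracy signal means the aggregator needs no extra telemetry to decide when re-training is due, which matches the relationship between size and accuracy stated in claims 5 and 6.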

Description

FIELD

Embodiments disclosed herein relate generally to data collection. More particularly, embodiments disclosed herein relate to systems and methods to limit the transmission of data during data collection.

BACKGROUND

Computing devices may provide computer-implemented services. The computer-implemented services may be used by users of the computing devices and/or devices operably connected to the computing devices. The computer-implemented services may be performed with hardware components such as processors, memory modules, storage devices, and communication devices. The operation of these components may impact the performance of the computer-implemented services.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments disclosed herein are illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.

FIG. 1 shows a block diagram illustrating a system in accordance with an embodiment.

FIG. 2A shows a block diagram illustrating a data aggregator in accordance with an embodiment.

FIG. 2B shows a block diagram illustrating a data collector in accordance with an embodiment.

FIG. 3A shows a flow diagram illustrating a method of data collection via data reconstruction in accordance with an embodiment.

FIG. 3B shows a flow diagram illustrating a method of determining accuracy of a twin inference model in accordance with an embodiment.

FIG. 3C shows a flow diagram illustrating a method of data collection in accordance with an embodiment.

FIGS. 4A-4I show block diagrams illustrating a system in accordance with an embodiment over time.

FIG. 5 shows a block diagram illustrating a data processing system in accordance with an embodiment.

DETAILED DESCRIPTION

Various embodiments will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of various embodiments. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments disclosed herein.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in conjunction with the embodiment can be included in at least one embodiment. The appearances of the phrases “in one embodiment” and “an embodiment” in various places in the specification do not necessarily all refer to the same embodiment.

In general, embodiments disclosed herein relate to methods and systems for managing data collection in a distributed environment. To manage data collection, the system may include a data aggregator and a data collector. The data aggregator may utilize a copy of a twin inference model to predict data based on future measurements performed by data collectors throughout a distributed environment without having access to the measurements. The data collector may host a second identical copy of a twin inference model. Therefore, the data aggregator and data collector may have access to identical sets of inferences.

To obtain the twin inference models, training data may be used to train the twin inference models to predict future measurements obtained by the data collectors. The data collectors may include any type and quantity of data collectors including, for example, temperature data collectors, pH data collectors, humidity data collectors, etc. Therefore, the disclosed system may be broadly applicable to a wide variety of data collectors that may generate various types and quantities of measurements.

To attempt to reduce data transmission, the data aggregator may obtain a difference from a data collector, a difference being a reduced-size representation of a measurement performed by a data collector. The data collector may obtain this difference using (i) a measurement performed by the data collector and (ii) an inference obtained via the copy of the twin inference model hosted by the data collector. The data aggregator may reconstruct data using (i) the difference obtained from the data collector and (ii) the inference obtained via a copy of the twin inference model hosted by the data aggregator. The inference obtained by the data aggregator may be identical to the inference utilized by the data collector to obtain the difference. Therefore, the data aggregator may access the exact measurement performed by the data collector without measurements being transmitted over a communication system. Consequently, data transmission may be reduced, communication system bandwidth may be conserved, and energy consumption of the devices may be reduced throughout the system.

In an embodiment, a method for managing data collection in a distributed environment where data is collected in a data aggregator of the distributed environment an