EP-4445258-B1 - ADAPTIVE TELEMETRY SAMPLING

Inventors

  • DUBYNSKIY, SERGIY
  • SHUBIN, TATIANA
  • GANGULY, SANDIPAN

Dates

Publication Date
2026-05-06
Application Date
2022-09-08

Claims (15)

  1. A data processing system comprising: a processor; and a machine-readable medium storing executable instructions that, when executed, cause the processor to perform operations comprising: obtaining (710) first telemetry data from a plurality of telemetry data sources; analyzing (720) the first telemetry data to identify a subset of telemetry data sources of the plurality of telemetry data sources for which a reduced sampling rate may be implemented, the first telemetry data being associated with a plurality of event types; determining (730) a reduced sampling rate for each event type of the plurality of event types, the reduced sampling rate indicating a percentage of the subset of telemetry data sources from which telemetry data associated with that event type is to be obtained; selecting (740) a subset of the event types for which the reduced sampling rate is to be applied, the selecting of the subset of the event types being based on one or more parameters entered by a user using one or more controls (505, 510, 515, 520, 525, 530, 535); obtaining (750) second telemetry data from the subset of telemetry data sources at the reduced sampling rate associated with each event type of the subset of event types; analyzing (760) the second telemetry data to determine one or more estimated metric values for one or more metrics, the one or more estimated metric values representing an estimate of what the one or more metric values would have been had the second telemetry data not been sampled at the reduced sampling rate; and generating (770) a report comprising the one or more estimated metric values and an estimated total cost saving based on an estimated cost saving associated with each event type, the cost savings being achieved by reducing the network, computing, and storage resources required to obtain and analyze the telemetry data.
  2. The data processing system of claim 1, wherein analyzing the first telemetry data to identify the subset of telemetry data sources further comprises: executing a plurality of first simulations on the first telemetry data to identify the subset of the telemetry data sources.
  3. The data processing system of claim 2, wherein determining the reduced sampling rate for each event type of the plurality of event types further comprises: executing a plurality of second simulations on the first telemetry data to determine the reduced sampling rate for each event type of the plurality of event types.
  4. The data processing system of claim 1, wherein the one or more parameters are adapted to: select a particular channel of telemetry data, select one or more types of metrics, select one or more applications, and/or select one or more types of telemetry data.
  5. The data processing system of claim 2, wherein the machine-readable medium includes instructions configured to cause the processor to perform operations of: prior to obtaining the second telemetry data based on the reduced sampling rate, determining that a threshold condition for obtaining the second telemetry data at the reduced sampling rate has been satisfied.
  6. The data processing system of claim 5, wherein the machine-readable medium includes instructions configured to cause the processor to perform operations of: prior to obtaining the second telemetry data, sending instructions to a first plurality of telemetry data sources of the subset of the plurality of telemetry data sources to stop providing telemetry data to reduce a volume of the second telemetry data to the reduced sampling rate.
  7. The data processing system of claim 5, wherein the machine-readable medium includes instructions configured to cause the processor to perform operations of: determining that the threshold condition for obtaining the second telemetry data at the reduced sampling rate is no longer satisfied; and sending instructions to the first plurality of telemetry data sources to resume providing telemetry data responsive to determining that the threshold condition is no longer satisfied.
  8. The data processing system of claim 1, wherein the one or more parameters for selecting the subset of the event types comprise any one or more of a minimum savings threshold associated with each event type, a maximum error rate associated with each event type, and a minimum reduction in volume of telemetry data associated with an entire subset of event types.
  9. A method implemented in a data processing system for adaptive telemetry sampling, the method comprising: obtaining (710) first telemetry data from a plurality of telemetry data sources; analyzing (720) the first telemetry data to identify a subset of telemetry data sources of the plurality of telemetry data sources for which a reduced sampling rate may be implemented, the first telemetry data being associated with a plurality of event types; determining (730) a reduced sampling rate for each event type of the plurality of event types, the reduced sampling rate indicating a percentage of the subset of telemetry data sources from which telemetry data associated with that event type is to be obtained; selecting (740) a subset of the event types for which the reduced sampling rate is to be applied, the selecting of the subset of the event types being based on one or more parameters entered by a user using one or more controls (505, 510, 515, 520, 525, 530, 535); obtaining (750) second telemetry data from the subset of telemetry data sources at the reduced sampling rate associated with each event type of the subset of event types; analyzing (760) the second telemetry data to determine one or more estimated metric values for one or more metrics, the one or more estimated metric values representing an estimate of what the one or more metric values would have been had the second telemetry data not been sampled at the reduced sampling rate; and generating (770) a report comprising the one or more estimated metric values and an estimated total cost saving based on an estimated cost saving associated with each event type.
  10. The method of claim 9, wherein analyzing the first telemetry data to identify the subset of telemetry data sources further comprises: executing a plurality of first simulations on the first telemetry data to identify the subset of the telemetry data sources.
  11. The method of claim 10, wherein determining the reduced sampling rate for each event type of the plurality of event types further comprises: executing a plurality of second simulations on the first telemetry data to determine the reduced sampling rate for each event type of the plurality of event types.
  12. The method of claim 10, wherein the one or more parameters are adapted to: select a particular channel of telemetry data, select one or more types of metrics, select one or more applications, and/or select one or more types of telemetry data.
  13. The method of claim 10, further comprising: prior to obtaining the second telemetry data based on the reduced sampling rate, determining that a threshold condition for obtaining the second telemetry data at the reduced sampling rate has been satisfied.
  14. The method of claim 13, further comprising: prior to obtaining the second telemetry data, sending instructions to a first plurality of telemetry data sources of the subset of the plurality of telemetry data sources to stop providing telemetry data to reduce a volume of the second telemetry data to the reduced sampling rate.
  15. The method of claim 13, further comprising: determining that the threshold condition for obtaining the second telemetry data at the reduced sampling rate is no longer satisfied; and sending instructions to the first plurality of telemetry data sources to resume providing telemetry data responsive to determining that the threshold condition is no longer satisfied.
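The claims name the steps (730) to (770) and the selection parameters of claim 8, but leave the cost model, the selection logic, and the estimator unspecified. The following Python sketch shows one way these pieces could fit together. It is illustrative only: every name (`EventTypeProfile`, `select_event_types`, `estimate_count`, `build_report`), the per-event cost model, and the use of inverse-rate (Horvitz-Thompson-style) scaling to estimate what the metric would have been without sampling are assumptions, not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class EventTypeProfile:
    """Hypothetical per-event-type summary derived from the first telemetry data."""
    name: str
    subset_volume: int      # events/period from the subset of sources eligible for sampling
    reduced_rate: float     # fraction (0, 1] of those sources that keep reporting
    simulated_error: float  # relative metric error seen in simulations at reduced_rate
    cost_per_event: float   # assumed combined network, compute, and storage cost per event


def estimated_saving(p: EventTypeProfile) -> float:
    """Cost avoided by not collecting events from the (1 - rate) share of the subset."""
    return (1.0 - p.reduced_rate) * p.subset_volume * p.cost_per_event


def select_event_types(profiles, min_saving, max_error, min_volume_reduction):
    """Apply claim 8-style parameters to choose the event types to sample.

    Per-type filters: saving >= min_saving and simulated error <= max_error.
    Aggregate check: the selected types together must drop at least
    min_volume_reduction events per period, otherwise nothing is selected.
    """
    selected = [
        p for p in profiles
        if estimated_saving(p) >= min_saving and p.simulated_error <= max_error
    ]
    dropped = sum((1.0 - p.reduced_rate) * p.subset_volume for p in selected)
    return selected if dropped >= min_volume_reduction else []


def estimate_count(unsampled_count: int, sampled_count: int, rate: float) -> float:
    """Inverse-rate (Horvitz-Thompson-style) estimate of the unsampled total.

    unsampled_count: events from sources outside the sampled subset (kept in full).
    sampled_count: events received from the sampled share of the subset.
    """
    return unsampled_count + sampled_count / rate


def build_report(selected, observed):
    """observed maps event-type name -> (unsampled_count, sampled_count)."""
    estimates = {
        p.name: estimate_count(*observed[p.name], p.reduced_rate) for p in selected
    }
    total_saving = sum(estimated_saving(p) for p in selected)
    return {"estimated_counts": estimates, "estimated_total_saving": total_saving}


# Illustrative, made-up inputs (not data from the patent).
profiles = [
    EventTypeProfile("app_launch", 1_000_000, 0.10, 0.01, 0.000002),
    EventTypeProfile("crash", 5_000, 0.50, 0.20, 0.000002),
    EventTypeProfile("page_view", 2_000_000, 0.25, 0.02, 0.000001),
]
selected = select_event_types(profiles, min_saving=1.0, max_error=0.05,
                              min_volume_reduction=1_000_000)
observed = {"app_launch": (0, 90_000), "page_view": (100_000, 400_000)}
print(build_report(selected, observed))
```

With these inputs, `crash` is excluded (its saving is far below `min_saving` and its simulated error exceeds `max_error`), so the report covers `app_launch` and `page_view`, with an estimated total saving of about 3.3 cost units. Treating the aggregate volume reduction as an all-or-nothing gate is only one reading of claim 8; a greedy selection that adds event types until the target is met would be an equally plausible alternative.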

Description

BACKGROUND

Telemetry data is data generated by software that may be collected to improve the customer experience. The telemetry data may be collected from software on a customer's computing device, from software executed in a cloud-based computing environment, or from other systems on which the software has been deployed. The telemetry data may include information that may be analyzed to determine the security, health, quality, and performance of the software. The telemetry data may be used to assess user usage patterns and the quality of deployed software builds, to identify issues with those builds, and to make release decisions for future builds. Thus, the telemetry data collected may provide valuable insights into the software product and the usage patterns of its users. Consequently, the amount and types of telemetry data being collected continue to increase rapidly. This proliferation has left companies facing the problem of excessive telemetry data collection. The additional telemetry data increases networking, computing, and data storage costs: additional network capacity is required to handle the growing volume of telemetry data being collected, and additional computing and data storage resources are required to process and store it. All these costs add to the cost of goods sold (COGS) of the software product. Companies must therefore balance the costs of receiving and processing vast quantities of telemetry data against the benefits of the information obtained from analyzing it. Hence, there is a need for improved systems and methods for obtaining and analyzing telemetry data.

US2020159607 relates to a veto-based model for measuring product health.
A technique is provided for answering the question of how often a user has a good experience with an online product by defining a good experience with the online product, inventorying the telemetry signals needed to measure the performance of the online product, and providing a basis for combining the telemetry signals into one or more meaningful metrics.

US 2020/356459 A1 relates to processes and systems that determine efficient sampling rates of metrics generated in a distributed computing system. Efficient sampling rates for metrics generated by various metric sources are determined. Each metric is evaluated to determine whether the metric is a non-constant metric. An efficient sampling rate is determined for each non-constant metric by constructing a plurality of corresponding reduced metrics. Each reduced metric comprises a different subsequence of the corresponding metric. An information loss is computed for each reduced metric. Sampling rates of high-variability metrics that are left unchanged are efficient sampling rates.

SUMMARY

It is the object of the present invention to enable telemetry data to be analyzed with reduced computational effort without compromising the insights that the telemetry data provides. This object is achieved by the subject matter of the independent claims. Preferred embodiments are defined by the dependent claims. An example data processing system according to the disclosure may include a processor and a machine-readable medium storing executable instructions.
The instructions, when executed, cause the processor to perform operations including obtaining first telemetry data from a plurality of telemetry data sources; analyzing the first telemetry data to identify a subset of telemetry data sources of the plurality of telemetry data sources for which a reduced sampling rate may be implemented, the first telemetry data being associated with a plurality of event types; determining a reduced sampling rate for each event type of the plurality of event types, the reduced sampling rate indicating a percentage of the subset of telemetry data sources from which telemetry data associated with that event type is to be obtained; selecting a subset of the event types for which the reduced sampling rate is to be applied; obtaining second telemetry data from the subset of telemetry data sources at the reduced sampling rate associated with each event type of the subset of event types; analyzing the second telemetry data to determine one or more estimated metric values for one or more metrics, the one or more estimated metric values representing an estimate of what the one or more metric values would have been had the second telemetry data not been sampled at the reduced sampling rate; and generating a report comprising the one or more estimated metric values and an estimated total cost saving based on an estimated cost saving associated with each event type. An example method implemented in a data processing system for adaptive telemetry sampling includes obtaining first telemetry data from a plurality of telemetry data sources; analyzing the first telemetry data to identify