US-12625854-B2 - Analytical platform for distributed data

US12625854B2US 12625854 B2US12625854 B2US 12625854B2US-12625854-B2

Abstract

A tree data structure that includes an exponential root node and a plurality of regular child nodes arranged in hierarchical levels is generated. A number of the hierarchical levels corresponds to a specified precision value. A numerical data value is received. The numerical data value is truncated according to the specified precision value to generate a truncated value. Portions of the truncated value are stored in the tree data structure by storing an exponent of the truncated value in the exponential root node, and storing each significant digit of the truncated value in a respective regular child node based on significance level. Statistical information is computed for each node storing a portion of the truncated value. The statistical information is provided in response to a query.

Inventors

Christopher Phillip Bonnell

Assignees

PagerDuty, Inc.

Dates

Publication Date: 20260512
Application Date: 20241127

Claims (20)

1 . A method comprising: generating a tree data structure comprising an exponential root node and a plurality of regular nodes arranged in nested levels, wherein a number of the nested levels corresponds to a specified precision value, and wherein each regular node is configured to store a single significant digit value rather than a range of values; receiving a numerical data value; truncating the numerical data value according to the specified precision value to generate a truncated value; storing portions of the truncated value in the tree data structure by: storing an exponent of the truncated value in the exponential root node; and storing each significant digit of the truncated value in a respective regular node based on significance level, wherein a first regular node positioned beneath the exponential root node stores a first significant digit of the truncated value, and wherein a second regular node positioned beneath the first regular node stores a second significant digit having less significance than the first significant digit; computing statistical information for each node storing a portion of the truncated value, wherein computing the statistical information comprises automatically scanning the tree data structure; and providing the statistical information in response to a query.
2 . The method of claim 1 , wherein computing the statistical information comprises: computing a count of data values inserted at each node.
3 . The method of claim 1 , wherein computing the statistical information comprises: computing a sum of data values inserted at each node.
4 . The method of claim 1 , wherein computing the statistical information comprises: computing a count of data values inserted at each node; and computing a sum of the data values stored at the each node.
5 . The method of claim 1 , wherein generating the tree data structure comprises: initializing the plurality of regular child nodes as empty nodes.
6 . The method of claim 1 , wherein the specified precision value represents a number of significant figures to be stored for numerical data values.
7 . The method of claim 1 , further comprising: sampling the numerical data value from a data stream at time intervals.
8 . The method of claim 1 , further comprising: receiving a second numerical data value; truncating the second numerical data value according to the specified precision value to generate a truncated second numerical data value; and updating the statistical information for nodes storing portions of the truncated second numerical data value.
9 . The method of claim 1 , further comprising: receiving multiple numerical data values; and processing the multiple numerical data values in a single scan to compute the statistical information.
10 . The method of claim 1 , further comprising: merging the tree data structure with a second tree data structure by: identifying overlapping nodes between the tree data structure and the second tree data structure; and combining the statistical information for the overlapping nodes.
11 . The method of claim 1 , wherein the query comprises a request for estimated answers to database queries.
12 . The method of claim 1 , wherein the specified precision value is configurable and controls value truncation operation across a value range.
13 . A system, comprising: a memory; and a processor, the processor configured to execute instructions stored in the memory to: generate a tree data structure comprising an exponential root node and a plurality of regular nodes arranged in nested levels, wherein a number of the nested levels corresponds to a specified precision value, and wherein each regular node is configured to store a single significant digit value rather than a range of values; receive a numerical data value; truncate the numerical data value according to the specified precision value to generate a truncated value; store portions of the truncated value in the tree data structure, wherein to store the portions of the truncated value in the tree data structure comprises to: store an exponent of the truncated value in the exponential root node; and store each significant digit of the truncated value in a respective regular node based on significance level, wherein a first regular node positioned beneath the exponential root node stores a first significant digit of the truncated value, and wherein a second regular node positioned beneath the first regular node stores a second significant digit having less significance than the first significant digit; compute statistical information for each node storing a portion of the truncated value, wherein to compute the statistical information comprises to automatically scan the tree data structure; and provide the statistical information in response to a query.
14 . The system of claim 13 , wherein to compute the statistical information, the processor is configured to execute instructions stored in the memory to: compute a count of data values inserted at each node; and compute a sum of the data values stored at the each node.
15 . A non-transitory computer readable medium storing instructions operable to cause a processor to perform operations comprising: generating a tree data structure comprising an exponential root node and a plurality of regular nodes arranged in nested levels, wherein a number of the nested levels corresponds to a specified precision value, and wherein each regular node is configured to store a single significant digit value rather than a range of values; receiving a numerical data value; truncating the numerical data value according to the specified precision value to generate a truncated value; storing portions of the truncated value in the tree data structure by: storing an exponent of the truncated value in the exponential root node; and storing each significant digit of the truncated value in a respective regular node based on significance level, wherein a first regular node positioned beneath the exponential root node stores a first significant digit of the truncated value, and wherein a second regular node positioned beneath the first regular node stores a second significant digit having less significance than the first significant digit; computing statistical information for each node storing a portion of the truncated value, wherein computing the statistical information comprises automatically scanning the tree data structure; and providing the statistical information in response to a query.
16 . The non-transitory computer readable medium of claim 15 , the operations further comprising: receiving a second numerical data value; truncating the second numerical data value according to the specified precision value to generate a truncated second numerical data value; and updating the statistical information for nodes storing portions of the truncated second numerical data value.
17 . The non-transitory computer readable medium of claim 15 , the operations further comprising: merging the tree data structure with a second tree data structure by: identifying overlapping nodes between the tree data structure and the second tree data structure; and combining the statistical information for the overlapping nodes.
18 . The non-transitory computer readable medium of claim 15 , wherein computing the statistical information comprises: computing a count of data values inserted at each node; and computing a sum of the data values stored at the each node.
19 . The non-transitory computer readable medium of claim 15 , wherein generating the tree data structure comprises: initializing the plurality of regular nodes as empty nodes.
20 . The non-transitory computer readable medium of claim 15 , wherein the specified precision value represents a number of significant figures to be stored for numerical data values.

Description

CROSS-REFERENCE TO RELATED APPLICATION(S) This Utility Patent Application is a continuation of U.S. patent application Ser. No. 18/497,242 filed on Oct. 30, 2023, which is a continuation of U.S. patent application Ser. No. 17/536,901 filed on Nov. 29, 2021, which is a continuation of U.S. patent application Ser. No. 16/777,708 filed on Jan. 30, 2020, the entire disclosures of which are hereby incorporated by reference. TECHNICAL FIELD The present invention relates generally to the field of data processing, and more particularly, but not exclusively to, database and file management or data structures. BACKGROUND Obtaining exact answers to basic queries on streaming data and/or massive datasets (e.g., petabytes and larger) consumes large amounts of compute resources. In addition, a query on a massive dataset (“Big Data”) can require an amount of time that becomes unacceptable for analysis. Stochastic stream algorithms have been developed to address the challenges of querying streaming data and/or massive datasets for cases in which approximate answers are acceptable for visualizations, metrics and statistics. These algorithms process a massive dataset in a single pass, and compute small summaries of the dataset. A histogram uses collected data, such as small summaries, to create metrics, statistics, visualizations and other analytical information for massive datasets. Typically, High Dynamic Range (HDR) histogram based algorithms have been used to record and analyze sampled data value counts of streaming data and/or massive datasets in latency and performance sensitive applications. The HDR histogram can be used across a configurable integer value range with configurable value precision within the range of values stored in an array. Value precision may be expressed as the number of significant digits in the recorded value, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level. Although the HDR Histogram is designed for recoding histograms of value measurements, large amounts of streaming data and/or massive datasets stored in arrays can become difficult for the HDR histogram to evaluate in real time. BRIEF DESCRIPTION OF THE DRAWINGS Non-limiting and non-exhaustive embodiments of the present innovations are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified. For a better understanding of the described innovations, reference will be made to the following Detailed Description of Various Embodiments, which is to be read in association with the accompanying drawings, wherein: FIG. 1 illustrates a system environment in which various embodiments may be implemented; FIG. 2 shows a schematic embodiment of a client computer; FIG. 3 illustrates a schematic embodiment of a network computer; FIG. 4 shows a logical architecture of a system for using a tree data structure for histogram based query applications; FIGS. 5A-5C illustrates a logical architecture for a histogram tree; and FIG. 6 shows a flow chart of logical operations to populate and query a histogram tree in accordance with the various embodiments of the present invention. DETAILED DESCRIPTION OF THE INVENTION Various embodiments now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific exemplary embodiments by which the invention may be practiced. The embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the embodiments to those skilled in the art. Among other things, the various embodiments may be methods, systems, media or devices. Accordingly, the various embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense. Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Furthermore, the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments may be readily combined, without departing from the scope or spirit of the invention. In addition, as used herein, the term “or” is an inclusive “or” operator, and is equivalent to the term “and/or,” unless the context clearly dictates otherwise. The term “based on” is not exclusive and allows for being based on additio