US-12619576-B2 - Time-series data storage and processing database system
Abstract
A database system is described that includes components for storing time-series data and executing custom, user-defined computational expressions in substantially real-time such that the results can be provided to a user device for display in an interactive user interface. For example, the database system may process stored time-series data in response to requests from a user device. The request may include a start time, an end time, a period, and/or a computational expression. The database system may retrieve the time-series data identified by the computational expression and, for each period, perform the arithmetic operation(s) identified by the computational expression on data values corresponding to times within the start time and the end time. Once all new data values have been generated, the database system may transmit the new data values to the user device for display in the interactive user interface.
Inventors
- David Tobin
- Pawel Adamowicz
- Steven Fackler
- Sri Krishna Vempati
- Wilson Wong
- Orcun Simsek
Assignees
- Palantir Technologies Inc.
Dates
- Publication Date: 20260505
- Application Date: 20200521
Claims (16)
- 1. A system comprising: a computer readable storage medium storing program instructions; and a computer hardware processor in communication with the computer readable storage medium, wherein the program instructions, when executed by the computer hardware processor, cause the computer hardware processor to: process data associated with a first time-series stored in a log, the first time-series comprising a subset of time-series data from a data source; merge, into a first data file, time-series data having overlapping timestamp values in a third data file and a fourth data file, wherein the merging comprises including a single data value in place of multiple data values associated with overlapping timestamp values in the third and fourth data files; determine that an incremental backup is triggered; write the first time-series, including the merged time-series data, to a disk in the first data file in association with the first time-series; record via a separate log that the first data file is written to the disk; perform the incremental backup of a second data file associated with the first time-series and received prior to the incremental backup being triggered; determine that the first data file is a merged data file formed from merging time-series data of the third data file and the fourth data file; and in response to determining that: (1) the first data file is a merged data file, and (2) incremental backup of the first data file, which includes merged data of the third data file and the fourth data file, is complete, delete the third data file and the fourth data file, including the overlapping timestamp values in each of the third and fourth data files.
- 2. The system of claim 1, wherein the log is a global write ahead log.
- 3. The system of claim 2, wherein the program instructions, when executed by the computer hardware processor, further cause the computer hardware processor to write the first time-series to the disk after the global write ahead log is flushed.
- 4. The system of claim 1, wherein the log is a local write ahead log.
- 5. The system of claim 4, wherein the program instructions, when executed by the computer hardware processor, further cause the computer hardware processor to write the first time-series to the disk after the local write ahead log is flushed.
- 6. The system of claim 1, wherein the log is an in-memory buffer.
- 7. The system of claim 1, wherein the program instructions, when executed by the computer hardware processor, further cause the computer hardware processor to determine that the incremental backup is triggered based on a global state of a time-series data store.
- 8. The system of claim 1, wherein the separate log is configured to track changes that occur after the incremental backup is triggered.
- 9. The system of claim 1, wherein the incremental backup does not include new data received after the backup is triggered.
- 10. A computer-implemented method comprising: processing data associated with a first time-series stored in a log, the first time-series comprising a subset of time-series data from a data source; merging, into a first data file, time-series data having overlapping timestamp values in a third data file and a fourth data file, wherein the merging comprises including a single data value in place of multiple data values associated with overlapping timestamp values in the third and fourth data files; determining that an incremental backup is triggered; writing the first time-series, including the merged time-series data, to a disk in the first data file in association with the first time-series; recording via a separate log that the first data file is written to the disk; performing the incremental backup of a second data file associated with the first time-series and received prior to the incremental backup being triggered; determining that the first data file is a merged data file formed from merging time-series data of the third data file and the fourth data file; and in response to determining that: (1) the first data file is a merged data file; and (2) incremental backup of the first data file, which includes merged data of the third data file and the fourth data file, is complete, deleting the third data file and the fourth data file, including the overlapping timestamp values in each of the third and fourth data files.
- 11. The computer-implemented method of claim 10, wherein the log is a global write ahead log.
- 12. The computer-implemented method of claim 11, wherein writing the first time-series to the disk further comprises writing the data to the disk after the global write ahead log is flushed.
- 13. The computer-implemented method of claim 10, wherein the log is a local write ahead log.
- 14. The computer-implemented method of claim 13, wherein writing the first time-series data to the disk further is performed after the local write ahead log is flushed.
- 15. The computer-implemented method of claim 10, wherein the log is an in-memory buffer.
- 16. A non-transitory computer-readable medium comprising one or more program instructions recorded thereon, the instructions configured for execution by a computer hardware processor in communication with the computer-readable medium in order to cause a system to: process data associated with a first time-series stored in a log, the first time-series comprising a subset of time-series data from a data source; merge, into a first data file, time-series data having overlapping timestamp values in a third data file and a fourth data file, wherein the merging comprises including a single data value in place of multiple data values associated with overlapping timestamp values in the third and fourth data files; determine that an incremental backup is triggered; write the first time-series, including the merged time-series data, to a disk in the first data file in association with the first time-series; record via a separate log that the first data file is written to the disk; perform the incremental backup of a second data file associated with the first time-series and received prior to the incremental backup being triggered; determine that the first data file is a merged data file formed from compaction of the third data file and the fourth data file; and in response to determining that: (1) the first data file is a merged data file; and (2) incremental backup of the first data file, which includes merged data of the third data file and the fourth data file, is complete, delete the third data file and the fourth data file, including the overlapping timestamp values in each of the third and fourth data files.
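For illustration only, the merge and backup-gated deletion recited in the independent claims might be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: `DataFile`, `merge_files`, and `deletable_sources` are hypothetical names, data files are modeled as in-memory dictionaries rather than files on disk, and the rule that the newer file's value wins on an overlapping timestamp is an assumption (the claims only require that a single data value replace the multiple values).

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical in-memory stand-in for an immutable on-disk data file."""
    name: str
    points: dict[int, float]            # timestamp -> data value
    merged_from: tuple[str, ...] = ()   # source file names, if this file is a merge result
    backed_up: bool = False             # set once an incremental backup has copied this file


def merge_files(name, older, newer):
    """Merge two data files, keeping a single value per timestamp.

    Assumption: on an overlapping timestamp the value from `newer` wins.
    """
    points = dict(older.points)
    points.update(newer.points)         # overlapping timestamps collapse to one entry
    ordered = dict(sorted(points.items()))
    return DataFile(name, ordered, merged_from=(older.name, newer.name))


def deletable_sources(store, merged):
    """Return the source file names that may now be deleted.

    Sources are released only when `merged` is a merged file AND its
    incremental backup is complete; otherwise nothing is deleted.
    """
    if merged.merged_from and merged.backed_up:
        return [n for n in merged.merged_from if n in store]
    return []


# Usage: the third and fourth files overlap at timestamp 3.
third = DataFile("third", {1: 10.0, 2: 20.0, 3: 30.0})
fourth = DataFile("fourth", {3: 31.0, 4: 40.0})
store = {f.name: f for f in (third, fourth)}

first = merge_files("first", third, fourth)
store[first.name] = first
pending = deletable_sources(store, first)   # empty: backup not yet complete

first.backed_up = True                      # incremental backup of the merged file finishes
for source_name in deletable_sources(store, first):
    del store[source_name]                  # only now are the source files removed
```

Gating the deletion on the backup flag means the source files remain available until the merged file that supersedes them has been backed up.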
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 15/693,029, entitled “TIME-SERIES DATA STORAGE AND PROCESSING DATABASE SYSTEM” and filed on Aug. 31, 2017, and soon to issue as U.S. Pat. No. 10,664,444, which is a continuation of U.S. patent application Ser. No. 15/226,675, entitled “TIME-SERIES DATA STORAGE AND PROCESSING DATABASE SYSTEM” and filed on Aug. 2, 2016, and issued as U.S. Pat. No. 9,753,935, which are hereby incorporated by reference herein in their entireties. Any and all Applications, if any, for which a foreign or domestic priority claim is identified in the Application Data Sheet of the present application are hereby incorporated by reference herein in their entireties under 37 CFR 1.57.
TECHNICAL FIELD
The present disclosure relates to database systems that store and process data for display in an interactive user interface.
BACKGROUND
A database may store a large quantity of data. For example, a system may comprise a large number of sensors that each collect measurements at regular intervals, and the measurements may be stored in the database. The measurement data can be supplemented with other data, such as information regarding events that occurred while the system was operational, and the supplemental data can also be stored in the database.
In some cases, a user may attempt to analyze a portion of the stored data. For example, the user may attempt to analyze a portion of the stored data that is associated with a specific time period. In response, the user's device may retrieve the appropriate data from the database. However, as the quantity of data stored in the database increases over time, retrieving the appropriate data from the database and performing the analysis can become complicated and time consuming. Thus, the user may experience noticeable delay in the display of the desired data.
SUMMARY
The systems, methods, and devices described herein each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of this disclosure, several non-limiting features will now be discussed briefly.
Disclosed herein is a database system that includes components for storing time-series data and executing custom, user-defined computational expressions in substantially real-time such that the results can be provided to a user device for display in an interactive user interface. For example, the database system may include memory storage, disk storage, and/or one or more processors. Data received from a data source may include value and timestamp pairs and, once written to disk, may be immutable. Thus, the database system may not overwrite a portion of the data or append additional data to the written data once the data is written to disk. Because the data is immutable, all data written to disk can be memory mapped given that the location of the data will not change.
The database system may process stored time-series data in response to requests from a user device. For example, the user may request to view time-series data by manipulating an interactive user interface. The request, received by the database system from the user device (possibly via a server), may include a start time, an end time, a period, and/or a computational expression. The start time and end time may correspond with a range of timestamp values for which associated time-series data values should be retrieved. The period may indicate, when analyzed in conjunction with the start time and end time, a number of data points requested by the user device for display in the interactive user interface. The computational expression may indicate an arithmetic (and/or other type of) operation, if any, that the user wishes to perform on one or more sets of time-series data.
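As a non-limiting illustration, the request parameters described above (start time, end time, period, and computational expression) might be evaluated as in the following Python sketch. The function names `sample` and `evaluate`, the integer timestamps, and the choice to represent each period by the first data value that falls inside it are all assumptions made for the example; the disclosure does not prescribe them.

```python
from bisect import bisect_left
import operator


def sample(timestamps, values, start, end, period):
    """Reduce one time-series to one value per period in [start, end).

    Assumption: each period is represented by the first data value whose
    timestamp falls inside it, or None if the period holds no data.
    `timestamps` must be sorted in ascending order.
    """
    out = []
    for t in range(start, end, period):
        i = bisect_left(timestamps, t)                 # first timestamp >= period start
        if i < len(timestamps) and timestamps[i] < min(t + period, end):
            out.append(values[i])
        else:
            out.append(None)
    return out


def evaluate(left, right, start, end, period, op):
    """Apply a binary arithmetic operation period by period to two series."""
    a = sample(*left, start, end, period)
    b = sample(*right, start, end, period)
    return [None if x is None or y is None else op(x, y) for x, y in zip(a, b)]


# Usage: two series as (timestamps, values) pairs, summed over [0, 40) with period 10.
pressure = ([0, 10, 20, 30], [1.0, 2.0, 3.0, 4.0])
temperature = ([0, 10, 25, 30], [10.0, 20.0, 30.0, 40.0])
result = evaluate(pressure, temperature, start=0, end=40, period=10, op=operator.add)
# (end - start) / period = 4, so four data points are produced.
```

With a start time of 0, an end time of 40, and a period of 10, the request yields four data points, one per period, which matches the role the period plays in fixing how many points the user interface displays.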
Example arithmetic operations include a sum, a difference, a product, a ratio, a zScore, a square root, and/or the like.
Once the database system receives the request, the database system may begin retrieving the appropriate time-series data and performing the indicated arithmetic (and/or other types of) operations via the one or more processors. Depending on the type of indicated operation(s) to be performed, the one or more processors may perform pointwise operations or sliding window operations. As described above, because the data files may be memory mapped, the one or more processors can access the data files from memory, rather than from disk, to perform the indicated operations. The database system described herein may then achieve better performance when generating the new data values as compared with conventional databases. Once all new data values have been generated, the database system may transmit the new data values to the user device (for example, via the server) for display in the interactive user interface.
One aspect of the disclosure provides a database configured to receive and process requests associated with time-series da