CN-117851537-B - Index construction method of time sequence data storage engine
Abstract
The invention provides an index construction method for a time-series data storage engine and belongs to the technical field of database storage. The method comprises: pre-screening a data block according to the document frequency of tag keys and the occurrence frequency of tag values; extracting features from the pre-screened set using the historical access frequency of the tag keys to obtain data features; applying machine learning to the data features to further screen the set and obtain a target set containing the index tag group of each timeline; extracting target tags from the target set according to a plurality of different indexes in the index tag groups to obtain a plurality of group tag sets; placing the timelines corresponding to index tags that have the same group tag set together to obtain a plurality of timeline sets; assigning a unique group ID to each timeline set; establishing an inverted index that maps tag key-value pairs to group IDs; and establishing a pre-index (leading index) that maps target tags to the inverted index. The invention can improve the write efficiency and the index construction efficiency of time-series data.
Inventors
- LIU XIAOGUANG
- XU ZIYUE
- WANG GANG
- Huang Sutong
- FEI DI
- LIU XINYU
- YU WENQING
- Wei Zijing
- LIU SHAOZHI
Assignees
- 南开大学 (Nankai University)
Dates
- Publication Date
- 20260512
- Application Date
- 20240118
Claims (8)
- 1. An index construction method of a time-series data storage engine, comprising: S1, pre-screening a data block to be stored according to the document frequency of tag keys and the occurrence frequency of tag values to obtain a pre-screened set; S2, extracting features from the pre-screened set using the historical access frequency of the tag keys to obtain data features; S3, performing machine learning on the data features to obtain a screening function, and screening the pre-screened set with the screening function to obtain a target set, wherein the target set comprises at least the index tag group of each timeline; S4, extracting target tags from the target set according to a plurality of different indexes in the index tag group to obtain a plurality of group tag sets; S5, placing the timelines corresponding to index tags that have the same group tag set together to obtain a plurality of timeline sets; and S6, assigning a unique group ID to each timeline set, establishing an inverted index that maps tag key-value pairs to group IDs, and establishing a pre-index (leading index) that maps target tags to the inverted index, so as to complete the index construction of the time-series data storage engine.
- 2. The method of claim 1, wherein the data features in step S2 include document frequency, tag key base-number rank ratio, tag key frequency rank, and tag key frequency rank ratio.
- 3. The method of claim 1, wherein the machine learning scheme for the data features in step S3 is an AdaBoost iterative algorithm.
- 4. The method according to claim 1, wherein in step S3, the name of each index in the index tag group is used as a tag key, and the index value is used as a tag value.
- 5. The method of claim 1, wherein each tag key-value pair in the inverted index in step S6 has a corresponding inverted chain, and the inverted chain contains group IDs in ascending order.
- 6. The method of claim 1, wherein the pre-index in step S6 is implemented by an algebraic reconstruction method and a dictionary data structure.
- 7. The method according to claim 6, wherein in the pre-index in step S6, the algebraic reconstruction method is used to map each tag key to the address of the dictionary data structure formed by the corresponding set of tag values.
- 8. The method of claim 6, wherein in the pre-index of step S6, the dictionary data structure stores a mapping from tag values to the offsets of the corresponding inverted chains.
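The structures named in claims 5 to 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `GroupIndex`, the flat `postings` list, and the stored chain length are assumptions made for illustration, and an ordinary Python `dict` stands in for the algebraic reconstruction method, which this section does not specify.

```python
from typing import Dict, FrozenSet, List, Tuple

TagPair = Tuple[str, str]  # (tag key, tag value)


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two ascending ID lists; relies on the ascending order."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class GroupIndex:
    """Hypothetical layout: inverted chains stored back to back, plus a pre-index."""

    def __init__(self, group_tags: Dict[int, FrozenSet[TagPair]]) -> None:
        # Inverted index: one chain of group IDs per tag key-value pair.
        chains: Dict[TagPair, List[int]] = {}
        for gid in sorted(group_tags):  # visiting IDs in order keeps chains ascending
            for pair in group_tags[gid]:
                chains.setdefault(pair, []).append(gid)

        self.postings: List[int] = []
        # Pre-index: tag key -> {tag value -> (chain offset, chain length)}.
        # The length is an addition of this sketch; claim 8 only names the offset.
        self.pre_index: Dict[str, Dict[str, Tuple[int, int]]] = {}
        for (key, value), chain in sorted(chains.items()):
            self.pre_index.setdefault(key, {})[value] = (len(self.postings), len(chain))
            self.postings.extend(chain)

    def chain(self, key: str, value: str) -> List[int]:
        """Return the inverted chain for one tag pair, or [] if it is unknown."""
        offset, length = self.pre_index.get(key, {}).get(value, (0, 0))
        return self.postings[offset:offset + length]

    def match_all(self, pairs: List[TagPair]) -> List[int]:
        """Group IDs that carry every given tag pair."""
        if not pairs:
            return []
        result = self.chain(*pairs[0])
        for key, value in pairs[1:]:
            result = intersect(result, self.chain(key, value))
        return result
```

Keeping each chain in ascending order is what allows the linear merge in `intersect`; a query on several tag pairs then needs one pre-index lookup and one chain scan per pair.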
Description
Index construction method of time sequence data storage engine
Technical Field
The invention relates to the technical field of database storage, and in particular to an index construction method for a time-series data storage engine.
Background
With the development of Internet of Things (IoT) technology, the number of IoT devices and their range of applications have grown dramatically. Ensuring high availability and robustness of IoT devices and internet services requires more precise and comprehensive monitoring of their real-time operating state. As the storage engine for such monitoring data, the time-series database has recently received wide attention in academia and industry.
Typical time-series data consists of two parts: timeline data and time point data. Time point data generally consists of a 64-bit integer timestamp and a double-precision floating-point (IEEE 754 double) index (metric) value. Timeline data is more complex: it typically consists of a monitoring index string (metric) and a series of tag key-value pair strings (TAGKV PAIRS), which together are commonly referred to as a timeline (serieskey).
The current method for storing and indexing time-series data builds a two-layer index structure that retrieves timelines by tag value, using timeline identifiers and identifier sets: it saves a first mapping between each tag value and an identifier set, and creates a second mapping between each timeline identifier and its timeline. However, this method does not scale as the time-series database grows: the amount of index data to be built increases rapidly, which degrades both the write efficiency and the index construction efficiency of time-series data.
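The data model and the two-layer baseline described in the Background can be sketched in Python as follows. This is an illustrative sketch only: the names `SeriesKey`, `Point`, and `TwoLayerIndex` are hypothetical, and treating the timestamp as a signed 64-bit integer is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SeriesKey:
    """Timeline identity: a metric name plus tag key-value pairs."""
    metric: str
    tags: Tuple[Tuple[str, str], ...]  # kept sorted so equal tag sets compare equal

    @staticmethod
    def of(metric: str, tags: Dict[str, str]) -> "SeriesKey":
        return SeriesKey(metric, tuple(sorted(tags.items())))


@dataclass(frozen=True)
class Point:
    """One time point: 64-bit integer timestamp and a double-precision value."""
    timestamp: int
    value: float

    def __post_init__(self) -> None:
        # Assumption: signed 64-bit range (Python ints are unbounded by default).
        if not -(2 ** 63) <= self.timestamp < 2 ** 63:
            raise ValueError("timestamp does not fit in 64 bits")


class TwoLayerIndex:
    """Baseline: tag pair -> timeline identifier set -> timeline."""

    def __init__(self) -> None:
        self.tag_to_ids: Dict[Tuple[str, str], List[int]] = {}
        self.id_to_series: Dict[int, SeriesKey] = {}

    def add(self, key: SeriesKey) -> int:
        """Register a new timeline; every tag pair gains one more entry."""
        series_id = len(self.id_to_series)
        self.id_to_series[series_id] = key
        for pair in key.tags:
            self.tag_to_ids.setdefault(pair, []).append(series_id)
        return series_id

    def lookup(self, tag_key: str, tag_value: str) -> List[SeriesKey]:
        ids = self.tag_to_ids.get((tag_key, tag_value), [])
        return [self.id_to_series[i] for i in ids]
```

In this baseline, every new timeline adds one index entry per tag pair, which is the growth in index construction work that the Background identifies as the problem.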
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art. To this end, it provides an index construction method for a time-series data storage engine, comprising the following steps:
S1, pre-screening a data block to be stored according to the document frequency of tag keys and the occurrence frequency of tag values to obtain a pre-screened set;
S2, extracting features from the pre-screened set using the historical access frequency of the tag keys to obtain data features;
S3, performing machine learning on the data features to obtain a screening function, and screening the pre-screened set with the screening function to obtain a target set, wherein the target set comprises at least the index tag group of each timeline;
S4, extracting target tags from the target set according to a plurality of different indexes in the index tag group to obtain a plurality of group tag sets;
S5, placing the timelines corresponding to index tags that have the same group tag set together to obtain a plurality of timeline sets;
S6, assigning a unique group ID to each timeline set, establishing an inverted index that maps tag key-value pairs to group IDs, and establishing a pre-index (leading index) that maps target tags to the inverted index, so as to complete the index construction of the time-series data storage engine.
In the index construction method provided by the invention, the data features in step S2 comprise document frequency, tag key base-number rank ratio, tag key frequency rank, and tag key frequency rank ratio.
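Steps S1 and S4 to S6 above can be sketched in Python under some loudly stated assumptions. The threshold rule in `prescreen`, the function names, and the parameters `min_df` and `min_value_count` are hypothetical; the text gives no concrete pre-screening rule. The learned screening of S2 and S3 is not modelled: `target_keys` simply stands in for whatever tag keys the screening function would select. Grouping timelines whose target tags are identical is one reading of S4 and S5.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Tags = Dict[str, str]
Timeline = Tuple[str, Tags]  # (metric name, tag key -> tag value)
TagPair = Tuple[str, str]


def prescreen(timelines: List[Timeline], min_df: int,
              min_value_count: int) -> List[Timeline]:
    """S1 (assumed rule): keep a tag only if its key occurs in at least
    min_df timelines and its key-value pair occurs at least min_value_count times."""
    key_df = Counter(k for _, tags in timelines for k in tags)
    pair_count = Counter(p for _, tags in timelines for p in tags.items())
    kept: List[Timeline] = []
    for metric, tags in timelines:
        filtered = {k: v for k, v in tags.items()
                    if key_df[k] >= min_df and pair_count[(k, v)] >= min_value_count}
        kept.append((metric, filtered))
    return kept


def group_timelines(timelines: List[Timeline],
                    target_keys: Set[str]) -> Dict[FrozenSet[TagPair], List[Timeline]]:
    """S4-S5: timelines with identical target tags form one timeline set."""
    groups: Dict[FrozenSet[TagPair], List[Timeline]] = defaultdict(list)
    for metric, tags in timelines:
        group_tags = frozenset((k, v) for k, v in tags.items() if k in target_keys)
        groups[group_tags].append((metric, tags))
    return dict(groups)


def assign_group_ids(
        groups: Dict[FrozenSet[TagPair], List[Timeline]]) -> Dict[FrozenSet[TagPair], int]:
    """S6 (first part): one unique group ID per timeline set, in first-seen order."""
    return {group_tags: gid for gid, group_tags in enumerate(groups)}
```

Under this reading, timelines that differ only in metric name share one group ID, so the inverted index stores one entry per group rather than one per timeline.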
In the index construction method provided by the invention, the machine learning scheme applied to the data features in step S3 is the AdaBoost iterative algorithm.
In the index construction method provided by the invention, in step S3, the name of each index in the index tag group is used as a tag key, and the index value is used as a tag value.
In the index construction method provided by the invention, each tag key-value pair in the inverted index in step S6 has a corresponding inverted chain, and the inverted chain contains group IDs in ascending order.
In the index construction method provided by the invention, the pre-index in step S6 is implemented by an algebraic reconstruction method and a dictionary data structure.
In the index construction method provided by the invention, in the pre-index in step S6, the algebraic reconstruction method maps each tag key to the address of the dictionary data structure formed by the corresponding set of tag values.
According to the index