CN-116126864-B - Index construction method, data query method and related equipment
Abstract
The application discloses an index construction method, a data query method and related equipment, and relates to the technical field of computers, wherein the method comprises the steps of obtaining a time line composed of labels of time sequence data, and distributing time line identifications for the time line, wherein the labels comprise label keys and label values, and the time line is composed of label values of different label keys; and constructing an index structure for retrieving the timeline based on the tag value, the timeline identifier and the identifier set, wherein the index structure comprises a first index layer and a second index layer so as to create a first mapping relation between the tag value and the identifier set through the first index layer and a second mapping relation between the timeline identifier and the timeline through the second index layer. The application solves the problem that the index cannot efficiently support the multi-dimensional retrieval of the time line in the prior art.
Inventors
- HU JIANHONG
- SHEN CHUNHUI
- ZHANG WEI
Assignees
- 阿里云计算有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20230118
Claims (14)
- 1. An index construction method, comprising: Acquiring a time line composed of labels of time sequence data, and distributing a time line identifier for the time line, wherein the labels comprise label keys and label values, and the time line is composed of the label values of different label keys; Determining a set of the timeline identifications corresponding to the tag values to obtain an identification set of the tag values; Constructing an index structure for retrieving the timeline based on the tag value, the timeline identifier and the identifier set, wherein the index structure comprises a first index layer and a second index layer, so as to create a first mapping relation between the tag value and the identifier set through the first index layer, and create a second mapping relation between the timeline identifier and the timeline through the second index layer; the first index layer comprises an inverted index, an inverted list of the inverted index stores an identification set of each tag value, a bitmap array is adopted to store the time line identification in the inverted list, the bitmap array is provided with a plurality of storage bits, the value of each storage bit is zero or one, and when the time line identification is stored, the value of each storage bit, which is consistent with the value of the time line identification, in order in the bitmap array is set to be one according to the value of the corresponding storage bit.
- 2. The method of claim 1, wherein creating, by the first index layer, a first mapping relationship between the tag value and the set of identifications comprises: and constructing a corresponding inverted list for the tag value in the inverted index, and storing the timeline identifier in the inverted list, wherein the value of the inverted list is the identifier set.
- 3. The method of claim 1, wherein the second index layer comprises: A forward index; wherein creating, by the second index layer, a second mapping relationship between the timeline identification and the timeline comprises: The second mapping relationship between the timeline identification and the timeline is created in the forward index.
- 4. The method of claim 2, wherein prior to constructing the corresponding inverted table for the tag value in the inverted index, the method further comprises: And judging whether the inverted index has the inverted list corresponding to the tag value, if so, storing the time line identification in the inverted list.
- 5. The method of claim 2, wherein prior to constructing the corresponding inverted table for the tag value in the inverted index, the method further comprises: Judging whether the inverted index has the inverted list corresponding to the tag value, if not, constructing the inverted list corresponding to the tag value in the inverted index, and storing the timeline identification in the inverted list.
- 6. The method of claim 2, wherein the step of configuring a timeline identification that uniquely identifies the timeline comprises: And configuring the timeline identification which uniquely identifies the timeline by adopting integer data.
- 7. A data query method, wherein the method is implemented based on an index structure obtained by constructing the index construction method according to any one of claims 1 to 6, the method comprising: Receiving label statistical information which is counted in advance, wherein the label statistical information counts the cardinalities of different label keys with time sequence data labels, and the cardinalities are the number of label values corresponding to the label keys; Receiving a given query condition, wherein the query condition is an array for querying a time line, the array is composed of different tag values, the query sequence of the tag values is determined based on the base, and the greater the base corresponding to the tag values is, the earlier the query sequence is; and sequentially reading the identification sets in a first index layer of an index structure according to the query sequence based on the tag values in the query conditions, judging whether a first identification set read based on the last tag value exists or is an empty set before the next tag value is adopted to read the identification sets in the first index layer in the reading process, and if the first identification set does not exist or is the empty set, determining that the time line meeting the query conditions is not met and stopping the query.
- 8. The method of claim 7, wherein after determining whether a first set of identifications read based on a last one of the tag values is present or empty before reading the set of identifications using a next one of the tag values into the first index layer, the method further comprises: step one, if the first identification set exists and is not an empty set, executing the next step; Judging whether the query condition has the next tag value or not, if so, reading an identification set from the first index layer based on the next tag value; and thirdly, judging whether a second identification set read based on the next tag value exists or is an empty set, if the second identification set does not exist or is an empty set, determining that the time line meeting the query condition is not met, and stopping querying.
- 9. The method of claim 8, wherein after determining whether the second set of identifications read based on the next tag value is present or empty, the method further comprises: if the second identification set exists and is not an empty set, calculating and obtaining an intersection between the first identification set and the second identification set; And acquiring a preset identification number threshold value, judging whether the number of the time line identifications in the intersection is smaller than or equal to the identification number threshold value, and if so, reading a time line meeting the query condition from a second index layer of the index structure based on the time line identifications in the intersection.
- 10. The method of claim 8, wherein after determining whether the second set of identifications read based on the next tag value is present or empty, the method further comprises: if the second identification set exists and is not an empty set, calculating and obtaining an intersection between the first identification set and the second identification set; And acquiring a preset mark number threshold value, judging whether the number of the time line marks in the intersection is smaller than or equal to the mark number threshold value, and if not, turning back to the step two.
- 11. An index building apparatus, comprising: The configuration module is used for acquiring a time line composed of labels of time sequence data and distributing a time line identifier for the time line, wherein the labels comprise label keys and label values, and the time line is composed of the label values of different label keys; A determining module, configured to determine a set of the timeline identifiers corresponding to the tag values, so as to obtain an identifier set of the tag values; The system comprises a construction module, a storage module and a bitmap array, wherein the construction module is used for constructing an index structure for retrieving the time line based on the tag value, the time line identifier and the identifier set, the index structure comprises a first index layer and a second index layer, a first mapping relation between the tag value and the identifier set is created through the first index layer, a second mapping relation between the time line identifier and the time line is created through the second index layer, the first index layer comprises an inverted index, an inverted table of the inverted index stores the identifier set of each tag value, the time line identifier is stored in the inverted list by adopting the bitmap array, the bitmap array is provided with a plurality of storage bits, the value of the storage bits is zero or one, and when the time line identifier is stored, the value of the storage bits, of which the order of bits in the bitmap array is consistent with the value, is set to one according to the value of the time line identifier.
- 12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 7 to 10 when the computer program is executed.
- 13. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method steps of any of claims 7 to 10.
- 14. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method steps of any of claims 7 to 10.
Description
Index construction method, data query method and related equipment Technical Field The present application relates to the field of computer technologies, and in particular, to an index construction method, a data query method, and related devices. Background This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. It is not admitted to be prior art by inclusion of this description in this section. With the advent of the internet of things and the 5G era, time series scene data such as the internet of things, application monitoring, and the industrial internet have been exploded, and these time series data are usually stored using a time series database, and generally, one time series data is described by elements such as a Metric, tags, a Timestamp, and measurement value Fields. Where the Metric indicates where the data is stored, the tag Tags indicates who the data is generated from, the Timestamp indicates when the data is generated, and the measurement Fields indicate what the content of the data is. The tag is a feature describing a data source, generally does not change with time, for example, the sensor device includes tag information such as an Id of the device, a Region where the device is located, and the like, and an index is automatically built in the database for the tag, so that multidimensional search query is supported according to the tag, and the tag is composed of a tag key TagKey and a tag value TagValue, which are both reference data types String. After the time series data is stored in the time series database, a time line set meeting the condition is generally required to be searched according to different tag values so as to perform downsampling, aggregation and the like according to the granularity of the time line. To support retrieval according to tag values, it is common practice in the industry to construct inverted indexes for timelines to support the ability of the timeline to retrieve according to tag values in multiple dimensions. In general, the number of tag values of different tag keys is different, the number of tag values under some tag keys is smaller, and almost all time lines of given time sequence data contain the tag value, that is, the tag value distinction degree is not very large, which generally results in that the inverted list of the tag corresponding to the tag value in the inverted index is very large, and the IO overhead (that is, the percentage of input and output traffic) is larger when the inverted list of the tag is read from a disk, so that the retrieval performance is lower. The prior time sequence database in the industry has the following two defects that 1, the index cannot efficiently support multi-dimensional retrieval of the time line, and the inverted list directly stores the original value of the time line, so that the storage cost is high. 2. The method only provides simple inverted index implementation, does not further optimize IO overhead when retrieving the inverted index according to the characteristic information of the time line, and is difficult to improve the query efficiency of the multi-dimensional retrieval of the time line. Disclosure of Invention The embodiment of the application provides an index construction method, a data query method and related equipment, which at least solve the problem that in the prior art, an index cannot efficiently support multi-dimensional retrieval of a time line. According to an aspect of the present application, there is also provided an index construction method including: Acquiring a time line composed of labels of time sequence data, and distributing a time line identifier for the time line, wherein the labels comprise label keys and label values, and the time line is composed of the label values of different label keys; Determining a set of the timeline identifications corresponding to the tag values to obtain an identification set of the tag values; And constructing an index structure for retrieving the timeline based on the tag value, the timeline identifier and the identifier set, wherein the index structure comprises a first index layer and a second index layer, so that a first mapping relation between the tag value and the identifier set is created through the first index layer, and a second mapping relation between the timeline identifier and the timeline is created through the second index layer. In some of these embodiments, the first index layer comprises: An inverted index; Wherein creating, by the first index layer, a first mapping relationship between the tag value and the identification set includes: and constructing a corresponding inverted list for the tag value in the inverted index, and storing the timeline identifier in the inverted list, wherein the value of the inverted list is the identifier set. In some of these embodiments, the second index layer comprises: A forward index; wherein creating, by the se