US-12621125-B2 - Method and apparatus for performing dynamic hashing to store key values in order
Abstract
A dynamic hashing apparatus includes a processor configured to obtain a target segment based on at least a part of a key value of an insertion request, compute a hash value by applying a remapping function corresponding to the obtained target segment to an input value based on the key value, and update the remapping function by adjusting a slope of the remapping function in response to the key value not being insertable into a target bucket selected based on at least a part of the computed hash value from among buckets included by the target segment.
Inventors
- Young-ri Choi
- Jin Yang
- Hee Jin YOON
- Gyeongchan Yun
- Sam H. Noh
Assignees
- UNIST (ULSAN NATIONAL INSTITUTE OF SCIENCE AND TECHNOLOGY)
Dates
- Publication Date
- 20260505
- Application Date
- 20230731
- Priority Date
- 20220804
Claims (20)
- 1 . A dynamic hashing method performed by a processor, the dynamic hashing method comprising: obtaining a target segment based on at least a part of a key value of an insertion request; computing a hash value by applying a remapping function corresponding to the obtained target segment to an input value based on the key value; and updating the remapping function by adjusting a slope of the remapping function in response to the key value not being insertable into a target bucket selected based on at least a part of the computed hash value from among buckets comprised by the target segment, wherein the computing the hash value of the insertion request comprises: computing a hash value by applying a partial remapping function determined based on at least one bit of the input value represented by a plurality of bits to the input value.
- 2 . The dynamic hashing method of claim 1 , further comprising: inserting the key value into a new target bucket corresponding to at least a part of a hash value recomputed by applying the updated remapping function to the input value.
- 3 . The dynamic hashing method of claim 1 , further comprising: inserting the key value to the target bucket in response to the key value being insertable to the target bucket selected based on at least a part of the computed hash value.
- 4 . The dynamic hashing method of claim 1 , wherein the obtaining the target segment comprises: selecting a target hash table from among a plurality of hash tables based on a predetermined number of most significant bits of the key value represented by a plurality of bits; selecting a target directory from among all the directories of the target hash table based on at least a part of a median value of the key value excluding bits used to obtain the target hash table; and obtaining the target segment connected to the selected target directory.
- 5 . The dynamic hashing method of claim 1 , wherein the computing the hash value of the insertion request comprises: excluding one or more bits required to identify the target segment of the key value of the insertion request represented by a plurality of bits from obtaining the input value.
- 6 . The dynamic hashing method of claim 1 , wherein the updating the remapping function comprises: increasing a slope of a target partial remapping function corresponding to the target bucket such that the number of key values that is less than or equal to a predetermined number is stored in each of a plurality of buckets of the target segment.
- 7 . The dynamic hashing method of claim 1 , wherein the updating the remapping function comprises: increasing a slope of a target partial remapping function corresponding to the target bucket and decreasing a slope of another partial remapping function while maintaining a range of the remapping function.
- 8 . The dynamic hashing method of claim 1 , wherein the updating the remapping function comprises: increasing a slope of a partial remapping function of a target section comprising the input value of a plurality of sections obtained by splitting a reference section in response to a utilization of the reference section comprising the input value being less than or equal to a utilization threshold.
- 9 . The dynamic hashing method of claim 1 , further comprising: omitting the update of the remapping function in response to a utilization of the target segment being greater than or equal to a utilization threshold.
- 10 . The dynamic hashing method of claim 1 , further comprising: determining whether the target segment is connected to a plurality of directories in response to a utilization of the target segment being greater than or equal to a utilization threshold.
- 11 . The dynamic hashing method of claim 10 , wherein the determining whether the target segment is connected to the plurality of directories comprises: determining based on whether the number of bits used to identify the target segment is less than the number of bits used to identify all the directories of the key value represented by a plurality of bits.
- 12 . The dynamic hashing method of claim 1 , further comprising: changing the target segment to new segments in response to a utilization of the target segment being greater than or equal to a utilization threshold and the target segment being connected to a plurality of directories; and connecting the plurality of directories having been connected to the target segment to at least one of the new segments.
- 13 . The dynamic hashing method of claim 1 , further comprising: setting a new remapping function corresponding to each of the new segments based on a remapping function corresponding to the target segment in response to connecting a plurality of directories having been connected to the target segment to at least one of the new segments.
- 14 . The dynamic hashing method of claim 1 , further comprising: determining whether the target segment is expandable based on the number of buckets comprised by the target segment in response to a utilization of the target segment being greater than or equal to a utilization threshold.
- 15 . The dynamic hashing method of claim 1 , further comprising: expanding the target segment by increasing a slope of the remapping function in response to the target segment being expandable; and using the greater number of bits of a hash value represented by a plurality of bits than before the target segment is expanded, to select the target bucket from among buckets of the target segment.
- 16 . A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a method comprising: obtaining a target segment based on at least a part of a key value of an insertion request; computing a hash value by applying a remapping function corresponding to the obtained target segment to an input value based on the key value; and updating the remapping function by adjusting a slope of the remapping function in response to the key value not being insertable into a target bucket selected based on at least a part of the computed hash value from among buckets comprised by the target segment, wherein the computing the hash value of the insertion request comprises: computing a hash value by applying a partial remapping function determined based on at least one bit of the input value represented by a plurality of bits to the input value.
- 17 . A dynamic hashing apparatus comprising: a hardware processor configured to obtain a target segment based on at least a part of a key value of an insertion request, compute a hash value by applying a remapping function corresponding to the obtained target segment to an input value based on the key value, and update the remapping function by adjusting a slope of the remapping function in response to the key value not being insertable into a target bucket selected based on at least a part of the computed hash value from among buckets comprised by the target segment, wherein the hardware processor is further configured to compute a hash value by applying a partial remapping function determined based on at least one bit of the input value represented by a plurality of bits to the input value.
- 18 . The dynamic hashing apparatus of claim 17 , wherein: the hardware processor is further configured to insert the key value into a new target bucket corresponding to at least a part of a hash value recomputed by applying the updated remapping function to the input value.
- 19 . The dynamic hashing apparatus of claim 17 , wherein: the hardware processor is further configured to insert the key value to the target bucket in response to the key value being insertable to the target bucket selected based on at least a part of the computed hash value.
- 20 . The dynamic hashing apparatus of claim 17 , wherein; the hardware processor is further configured to exclude one or more bits required to identify the target segment of the key value of the insertion request represented by a plurality of bits from obtaining the input value.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S) This application claims the priority benefit of Korean Patent Application No. 10-2022-0097515 filed on Aug. 4, 2022, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference for all purposes. BACKGROUND 1. Field One or more embodiments relate to a dynamic hashing. 2. Description of Related Art Recent studies indicate that a key distribution of an actual dataset is diverse and complex. In other words, keys may be more skewed in a certain section than other sections in a whole section, or a degree of concentration may greatly vary depending on the sections. A typical index structure, supposing a uniform key distribution, may not efficiently support the search, insertion, and scanning of a dynamic dataset. Field research on indexing structure has recently proposed a learned index technique which approximates a cumulative distribution function (CDF) of a key distribution by using a machine learning algorithm, such as a neural network or a linear regression, and using the key distribution to create an index, unlike a typical B+ tree or hash table. This technique may outperform a typical indexing technique when a key distribution is uniform and does not greatly change over time. However, there may be the problem that the performance of an index significantly decreases because a CDF approximation is not readily available and becomes inaccurate with respect to a dynamic dataset, in which a key density changes over a whole key space, a key distribution changes over time, and an overhead to predict a CDF again dynamically according to a change increases. SUMMARY A dynamic hashing method according to an aspect may provide a high-performance index structure optimally supporting a dynamic dataset. However, technical aspects are not limited to the foregoing aspects, and there may be other technical aspects. According to an aspect, a dynamic hashing method performed by a processor includes obtaining a target segment based on at least a part of a key value of an insertion request; computing a hash value by applying a remapping function corresponding to the obtained target segment to an input value based on the key value; and updating the remapping function by adjusting a slope of the remapping function in response to the key value not being insertable into a target bucket selected based on at least a part of the computed hash value from among buckets included by the target segment. The dynamic hashing method may further include inserting the key value into a new target bucket corresponding to at least a part of a hash value recomputed by applying the updated remapping function to the input value. The dynamic hashing method may further include inserting the key value to the target bucket in response to the key value being insertable to the target bucket selected based on at least a part of the computed hash value. The obtaining the target segment may include selecting a target hash table from among a plurality of hash tables based on a predetermined number of most significant bits of the key value represented by a plurality of bits; selecting a target directory from among all the directories of the target hash table based on at least a part of a median value of the key value excluding bits used to obtain the target hash table; and obtaining the target segment connected to the selected target directory. The computing the hash value of the insertion request may include excluding one or more bits required to identify the target segment of the key value of the insertion request represented by a plurality of bits from obtaining the input value. The computing the hash value of the insertion request may include computing a hash value by applying a partial remapping function determined based on at least one bit of the input value represented by a plurality of bits to the input value. The updating the remapping function may include increasing a slope of a target partial remapping function corresponding to the target bucket such that the number of key values that is less than or equal to the predetermined number is stored in each of a plurality of buckets of the target segment. The updating the remapping function may include increasing a slope of a target partial remapping function corresponding to the target bucket and decreasing the slope of another partial remapping function while maintaining a range of the remapping function. The updating the remapping function may include increasing a slope of a partial remapping function of a target section including the input value of a plurality of sections obtained by splitting a reference section in response to a utilization of the reference section including the input value being less than or equal to a utilization threshold. The dynamic hashing method may further include omitting the update of the remapping function in response to a utilization of the target segment being greater than or equal to a ut