Search

CN-121980626-A - Data operation method, device, equipment and computer readable storage medium

CN121980626ACN 121980626 ACN121980626 ACN 121980626ACN-121980626-A

Abstract

The invention discloses a data operation method, a device, equipment and a computer readable storage medium, which are applied to the technical field of data processing and comprise the steps of determining an evicted data item obtained when a data insertion operation is executed based on a cuckoo hash algorithm, writing the evicted data item into an idle position of an additional storage module, and executing reinsertion operation on the evicted data item when the data operation idle period is determined. Compared to the fact that the data item currently being evicted does not exist in any position of the hash table, so that the evicted data item is in a 'free' state, the method and the device can prevent the loss of the evicted data item by determining the data item to be evicted, thereby writing the evicted data item into the idle position of the additional storage module, and can subsequently re-execute the writing operation on the evicted data item.

Inventors

  • SHA MENG
  • SHI PENG
  • LIU QIHAO

Assignees

  • 山东云海国创云计算装备产业创新中心有限公司

Dates

Publication Date
20260505
Application Date
20260126

Claims (10)

  1. 1. A method of data manipulation comprising: determining the obtained data item which is evicted when the data insertion operation is executed based on the cuckoo hash algorithm; writing the evicted data item into an idle location of an additional storage module; and when the data operation idle period is determined to be in, performing reinsertion operation on the evicted data item.
  2. 2. The data manipulation method of claim 1, wherein prior to determining the resulting evicted data item when performing the data insertion operation based on the cuckoo hash algorithm, further comprising: acquiring a new data item and an operation identifier; performing parallel calculation on the new data item, generating a preset number of hash values, and determining a preset number of storage positions based on the hash values; Reading out keys of read-out data items in a preset number of hash tables based on the storage positions; comparing the key of the new data item with the key of the readout data item; when the operation identifier is a data insertion operation identifier, the corresponding flag bits of the data items are read according to the read preset number; if the identification bit is empty, inserting the new data item into the corresponding address of the hash table corresponding to the identification bit; And determining the evicted data item to execute an eviction process and a loop detection process if the identification bit is empty, wherein the eviction process is a process of randomly selecting one read data item as an eviction target, determining a position corresponding to the evicted data item, writing the new data item into the position, and the loop detection process is a process of rewriting the evicted data item based on the detected data operation idle period.
  3. 3. The data manipulation method of claim 2 wherein performing parallel computations on the new data item, generating a preset number of hash values, and determining a preset number of storage locations based on the hash values comprises: performing parallel calculation on the new data item to generate four hash values; and determining four storage positions by taking each hash value as an address, wherein the storage positions correspond to positions in the four hash tables.
  4. 4. The data manipulation method of claim 2, further comprising, after acquiring the new data item and the manipulation identification: when the operation mark is a searching operation, searching is carried out in a storage module and an additional storage module, wherein the storage module is a module comprising a preset number of hash functions, and the additional storage module is a module for storing the evicted data items; When searching is carried out in the storage module, the key corresponding to the new data item is compared with the key for reading out the data item; outputting the current read data item if the key of the new data item is the same as the key of the current read data item and the flag bit of the current read data item is valid; Outputting a blank data item if the key of the new data item and the key of the read data item are different and the flag bit of the read data item is invalid; Traversing the keys of the data items in the additional storage module, and comparing the keys of the data items in the additional storage module with the keys of the new data items; if the keys of the data items in the additional storage module are the same as the keys of the new data items, reading the data items in the additional storage module with the same keys; and outputting a blank data item if the key of the data item in the additional storage module is different from the key of the new data item.
  5. 5. The data manipulation method of claim 2, further comprising, after acquiring the new data item and the manipulation identification: When the operation identifier is a deleting operation, and when the key of the new data item is the same as the key of the current read data item and the flag bit of the current read data item is valid, the flag bit of the current read data item is set to be invalid and the current data item is deleted.
  6. 6. The data manipulation method according to any one of claims 1 to 5, wherein performing a reinsertion operation on the evicted data item when it is determined that it is in a data manipulation idle period comprises: acquiring data items of which targets are evicted from the additional storage according to a set sequence; Performing data insertion operation on the data item of which the target is evicted; when the target evicted data item passes a set number of evicted insert operations, a stop loop flag is triggered.
  7. 7. The data manipulation method of claim 1 wherein said additional storage module is implemented based on a distributed random access memory.
  8. 8. A data manipulation device, comprising: The evicted data item determining module is used for determining the evicted data item obtained when the data insertion operation is executed based on the cuckoo hash algorithm; A writing idle position module for writing the evicted data item into an idle position of the additional storage module; and the data reinsertion module is used for executing reinsertion operation on the evicted data item when the data operation idle period is determined to be in.
  9. 9. A data manipulation device, comprising: A memory for storing a computer program; a processor for executing the computer program to perform the steps of the data manipulation method according to any one of claims 1 to 7.
  10. 10. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the data manipulation method according to any one of claims 1 to 7.

Description

Data operation method, device, equipment and computer readable storage medium Technical Field The present invention relates to the field of data processing technologies, and in particular, to a data operation method, apparatus, device, and computer readable storage medium. Background Hash tables are a data structure that efficiently stores key-value pairs in buckets (sockets), and uses hash functions to enable fast access of key-value pairs through key-value mapping. Different keys may produce the same hash value, resulting in different key-value pairs that may be mapped to the same bucket, thus producing a hash collision. In the existing data storage mode based on hash, kicked data items may not be found before reinsertion, so that information is lost. It can be seen that how to prevent data loss during data operation is a technical problem that needs to be solved by those skilled in the art. Disclosure of Invention Accordingly, an object of the present invention is to provide a data manipulation method, apparatus, device and computer readable storage medium, which solve the technical problem of data loss in the prior art. In order to solve the above technical problems, the present invention provides a data operation method, including: determining the obtained data item which is evicted when the data insertion operation is executed based on the cuckoo hash algorithm; writing the evicted data item into an idle location of an additional storage module; and when the data operation idle period is determined to be in, performing reinsertion operation on the evicted data item. In one aspect, before determining that the data insertion operation is performed based on the cuckoo hash algorithm, the method further includes: acquiring a new data item and an operation identifier; performing parallel calculation on the new data item, generating a preset number of hash values, and determining a preset number of storage positions based on the hash values; Reading out keys of read-out data items in a preset number of hash tables based on the storage positions; comparing the key of the new data item with the key of the readout data item; when the operation identifier is a data insertion operation identifier, the corresponding flag bits of the data items are read according to the read preset number; if the identification bit is empty, inserting the new data item into the corresponding address of the hash table corresponding to the identification bit; And determining the evicted data item to execute an eviction process and a loop detection process if the identification bit is empty, wherein the eviction process is a process of randomly selecting one read data item as an eviction target, determining a position corresponding to the evicted data item, writing the new data item into the position, and the loop detection process is a process of rewriting the evicted data item based on the detected data operation idle period. In one aspect, performing parallel computation on the new data item, generating a preset number of hash values, and determining a preset number of storage locations based on the hash values, including: performing parallel calculation on the new data item to generate four hash values; and determining four storage positions by taking each hash value as an address, wherein the storage positions correspond to positions in the four hash tables. In one aspect, after acquiring the new data item and the operation identifier, the method further includes: when the operation mark is a searching operation, searching is carried out in a storage module and an additional storage module, wherein the storage module is a module comprising a preset number of hash functions, and the additional storage module is a module for storing the evicted data items; When searching is carried out in the storage module, the key corresponding to the new data item is compared with the key for reading out the data item; outputting the current read data item if the key of the new data item is the same as the key of the current read data item and the flag bit of the current read data item is valid; Outputting a blank data item if the key of the new data item and the key of the read data item are different and the flag bit of the read data item is invalid; Traversing the keys of the data items in the additional storage module, and comparing the keys of the data items in the additional storage module with the keys of the new data items; if the keys of the data items in the additional storage module are the same as the keys of the new data items, reading the data items in the additional storage module with the same keys; and outputting a blank data item if the key of the data item in the additional storage module is different from the key of the new data item. In one aspect, after acquiring the new data item and the operation identifier, the method further includes: When the operation identifier is a deleting operation, and when the key of the new da