CN-121979480-A - First-in first-out data structure based on key value pair lookup and processing method
Abstract
The application discloses a first-in first-out data structure based on key value pair searching and a processing method thereof, which relate to data processing and data storage technologies and comprise a three-layer closed loop architecture of hash index, a ring array and metadata control, wherein the hash index layer is used for receiving an external key value operation request and mapping and positioning data positions in the ring array through index mapping, the ring array storage layer is used for physical storage and sequence management of data, the ring array of a continuous memory is adopted for realizing FIFO sequence management through a compact data unit and a head pointer and a tail pointer, the metadata control layer is used for maintaining a structure running state and providing parameter support for operation execution, and the metadata control layer is used for storing core state parameters through atomic variables and realizing lock-free synchronization through CAS operation. According to the application, by optimizing the storage structure, the precursor/subsequent pointer of the doubly linked list is canceled, and the compact storage mode is adopted, so that the memory utilization rate in a small data scene is improved.
Inventors
- YANG WEI
- ZHU XINGGUO
- JU XIAO
- HAN DONG
- QIN LIJIE
- AN FEI
- Zhu Jiangshuai
- Guo Shoubang
- YIN DAN
- Bian Daliang
- CAO YUAN
Assignees
- 中国电子科技集团有限公司电子科学研究院
Dates
- Publication Date
- 20260505
- Application Date
- 20251217
Claims (9)
- 1. A first-in first-out data structure based on key value pair lookup is characterized by comprising a three-layer closed-loop architecture of hash index, annular array and metadata control, wherein each layer is in closed-loop cooperation of index mapping-data storage-state management, The hash index layer is used for receiving an external key value operation request, positioning the data position in the annular array through index mapping, and adopting a hash abstract and index storage design; The annular array storage layer is used for physical storage and sequence management of data, adopts an annular array of continuous memory, and realizes FIFO sequence management through compact data units and head and tail pointers; The metadata control layer maintains the running state of the structure and provides parameter support for operation execution, and the metadata control layer adopts an atomic variable to store core state parameters and realizes lock-free synchronization through CAS operation.
- 2. The key-pair lookup-based first-in, first-out data structure of claim 1, wherein the hash index layer, each index entry includes a hash digest of a specified number of bits, a ring array index, and a status identifier, and the total length is no more than 64 bits, wherein the status identifier is used to mark the entry as 8 bits in validity.
- 3. The key-pair lookup-based first-in-first-out data structure of claim 2, wherein the hash index layer employs a hybrid of linear detection and collision chain, detects free locations in the order of index +1 in case of initial index location collision, and records the collision link by status identification.
- 4. The key-pair lookup-based first-in-first-out data structure of claim 1, wherein the structure parameters of the ring array storage layer are: the initial length is 2 n, the array adopts continuous memory application, each time the capacity expansion is 2 times of the original length, the effective data are migrated in batches through a memory copy function during the capacity expansion; Each unit of compact data units of the ring array storage layer comprises: key ID is consistent with the length of the hash index layer; The Value field adopts length prefix coding and comprises a1 byte length identifier and data content; And a timestamp for assisting in verifying the validity of the data.
- 5. The key pair lookup-based first-in, first-out data structure of claim 4, wherein the head and tail pointers of the ring array storage layer comprise a head pointer and a tail pointer, wherein, A Head pointer pointing to an earliest inserted data location; a Tail pointer pointing to the location to be inserted; under the condition that the array is not full, the Tail pointer moves according to Tail= (Tail+1)% Length, and the Length is the Length of the annular array; in the case of an array that is full, the new data overwrites the head pointer position and the head pointer moves synchronously.
- 6. The key-pair lookup-based FIFO data structure as recited in claim 5, wherein the core metadata of the metadata control layer comprises a ring array Length, a current data amount Count, a Head pointer position, a Tail pointer position, and a hash index layer load factor, and the core metadata is stored with the same Length as the hash index layer Length.
- 7. The key pair lookup-based FIFO data structure as recited in claim 6, wherein the Head/Tail pointer and Count values are updated by CAS operation atoms in the case of an insert/delete operation.
- 8. The first-in first-out data structure based on key-value pair lookup of claim 6, wherein when Count is greater than or equal to Length x β, triggering a capacity expansion mechanism, the capacity expansion process being performed in an independent thread, wherein β is a set proportionality threshold; The new continuous memory is applied for capacity expansion, and after effective data is copied, the array pointer is updated through CAS atoms.
- 9. A first-in first-out data processing method based on key-value pair lookup, implemented based on a first-in first-out data structure as claimed in any one of claims 1-8, comprising: Receiving an external key value operation request, and positioning the data position in the annular array through index mapping; Adopting a ring array of a continuous memory, and managing physical storage and sequence of data through compact data units and head and tail fingers; and the core state parameters are stored by adopting atomic variables, lock-free synchronization is realized through CAS operation, and the running state of the data structure is maintained.
Description
First-in first-out data structure based on key value pair lookup and processing method Technical Field The present application relates to the field of data processing and data storage technologies, and in particular, to a first-in first-out data structure based on key value pair lookup and a processing method thereof. Background Along with the rapid landing of the internet of things (IoT) and edge computing technology, a data processing scene sinks from a cloud to edge nodes (such as an edge gateway, an industrial sensor and an intelligent terminal), and the device has obvious scene characteristics of limited resources, most of memory ranging from a few MB to tens of MB and weak computing power, outstanding data characteristics, high-frequency generation of real-time data, centralized query on key value mapping, time-sequence elimination of expired data, severe low-delay requirements, and query response time controlled below millisecond level. For example, an industrial sensor can generate thousands of pieces of equipment status data (the key is an equipment ID/data number, and the value is a monitoring index) per second, and an edge gateway needs to quickly query the designated equipment data, and meanwhile, eliminates the earliest data according to a first-in first-out (FIFO) rule when the memory is insufficient, so as to avoid memory overflow. The well-known underlying technologies supporting this scene data processing are mainly two types of traditional data structures: FIFO queue, in which data is dequeued according to the queue order by adopting the linear storage mode of array or unidirectional linked list, and strict FIFO elimination is naturally realized. However, the search operation needs to traverse all elements, the time complexity is O (n), and when the data size reaches 1000 pieces, the search delay is usually more than 10ms, and the high-frequency query requirement cannot be met. The Hash Table (Hash Table) establishes direct mapping of keys and values based on Hash functions, and the complexity of searching, inserting and deleting operation time is O (1), so that key value inquiry can be responded quickly. However, the FIFO elimination cannot be realized due to the lack of the sequential management capability, the memory exhaustion is easily caused by the continuous writing of data, and the auxiliary structures such as a timestamp, an index table and the like are required to be additionally maintained, so that the complexity and the memory overhead of the system are increased. In the prior art, a hash table and doubly linked list hybrid structure (typically represented by Java LinkedHashMap) is a main scheme for considering "key value pair lookup" and "FIFO elimination", and specifically the scheme includes two major core components, which work cooperatively: And in the hash table, an array is taken as a bottom storage container, each array element (hash bucket) corresponds to a group of key value pairs with the same hash value, and the quick positioning of data and the searching operation of O (1) time complexity are supported through the mapping relation of key-linked list nodes. Each node comprises four parts, namely a Key (Key), a Value (Value), a precursor pointer (prev) and a subsequent pointer (next), wherein the precursor pointer/the subsequent pointer is used for maintaining the sequence relation among the nodes, so that the linked list is ordered according to the data insertion time, and a sequence basis is provided for FIFO elimination. Data insertion operation, namely calculating a hash value of a data key to be inserted, and determining a target hash bucket position of the hash value in a hash table; Checking whether key conflict exists in the hash bucket, if no conflict exists, directly creating a new doubly linked list node, and inserting the new doubly linked list node into the hash bucket; meanwhile, a new node is inserted as a tail node of the bidirectional linked list, a next pointer of an original tail node is updated to point to the new node, a prev pointer of the new node points to the original tail node, and the sequential updating of the linked list is completed; and recording the mapping relation of the key-new linked list node in the hash table, and ensuring that the node can be quickly positioned by the key later. Data searching operation, namely calculating a hash value of a key of data to be searched, and positioning the hash value to a target hash bucket in a hash table; traversing the nodes in the hash bucket (solving key conflict), and finding out the corresponding doubly linked list nodes through key matching; and directly returning the value stored by the node without traversing a doubly linked list, so as to realize the lookup of the O (1) time complexity. FIFO elimination operation, presetting a data storage upper limit (such as the maximum node number), and monitoring the node number of a doubly linked list in real time; when the number of nodes reaches