CN-122019527-A - Data management method and management device, storage medium, and electronic apparatus
Abstract
The application discloses a data management method, a data management apparatus, a storage medium, and an electronic device. The method comprises: in an initial state, establishing a bloom filter for a target data set, wherein the bloom filter comprises a first preset number of hash functions and the first preset number of bit arrays; in response to an insertion request for an element in the target data set, when the bloom filter reaches its capacity limit, switching the bloom filter to an inactive state and creating a new bloom filter in an active state, wherein the newly created bloom filter in the active state comprises a second preset number of hash functions and the second preset number of bit arrays, and the second preset number is larger than the first preset number; and mapping the element to be inserted to the newly created bloom filter in the active state based on the unique ID of the element to be inserted. The method can improve data query efficiency and reduce the probability of false marking of the data, thereby reducing the false alarm rate.
Inventors
- WANG ZHU
- WU XINGXING
- YU QINGYU
Assignees
- 三一绿能(株洲)电力有限公司
Dates
- Publication Date: 20260512
- Application Date: 20251202
Claims (10)
- 1. A data management method, the method comprising: in an initial state, establishing a bloom filter for a target data set, wherein the bloom filter comprises a first preset number of hash functions and the first preset number of bit arrays; in response to an insertion request for an element in the target data set, when the bloom filter reaches a capacity limit, switching the bloom filter to an inactive state and creating a new bloom filter in an active state, wherein the insertion request carries a unique ID of the element to be inserted, the newly created bloom filter in the active state comprises a second preset number of hash functions and the second preset number of bit arrays, and the second preset number is larger than the first preset number; and mapping the element to be inserted to the newly created bloom filter in the active state based on the unique ID of the element to be inserted.
- 2. The data management method according to claim 1, wherein the number of bits in each bit array is the same, and a hash function maps the element to be inserted into the corresponding bit array based on the unique ID of the element to be inserted.
- 3. The data management method according to claim 1, wherein after mapping the element to be inserted to the newly created bloom filter in the active state, the method further comprises: in response to a query request, acquiring a unique ID of an element to be queried carried in the query request; based on the unique ID of the element to be queried, sequentially querying the corresponding number of bits in the bloom filters in the inactive state, in order from the longest to the shortest time spent in the inactive state, and determining whether the element to be queried exists; when the element to be queried does not exist in any of the bloom filters in the inactive state, querying the corresponding number of bits of the newly created bloom filter in the active state based on the unique ID of the element to be queried; and when the newly created bloom filter in the active state does not contain the element to be queried, determining that none of the bloom filters contains the element to be queried.
- 4. The data management method according to claim 3, wherein the corresponding number of bits is determined based on the first preset number of hash functions, the sequence number of the current bloom filter, and the number of hash functions added for the newly created bloom filter in the active state.
- 5. The data management method according to claim 3, wherein the method further comprises: determining all hash function lookup bits of the element to be queried based on the unique ID of the element to be queried, wherein the total number of lookup bits is determined based on the first preset number of hash functions, the total number of currently existing bloom filters, and the number of hash functions added for the newly created bloom filter in the active state; and determining, based on the hash function lookup bits, the corresponding bits of each bloom filter when querying the element to be queried.
- 6. The data management method according to claim 3, wherein said determining whether said element to be queried exists comprises: calculating, according to the unique ID of the element to be queried, a hash value corresponding to the element to be queried; determining the values of the corresponding bits of the bit arrays in each bloom filter according to the hash value corresponding to the element to be queried; when the values of the corresponding bits of the bit arrays in a bloom filter are all 1, determining that the element to be queried exists in the bloom filter; and when the value of any of the bits is not equal to 1, determining that the element to be queried does not exist in the bloom filter.
- 7. The data management method according to claim 6, wherein the method further comprises: verifying the actual existence of the element to be queried in an auxiliary data structure, wherein the auxiliary data structure is a hash table or a database and is used for storing unique IDs of elements in the target data set so as to verify the authenticity of a query result; determining that the element to be queried exists in the bloom filter when the unique ID of the element to be inserted exists in the auxiliary data structure; and determining that the element to be queried does not exist in the bloom filter when the unique ID of the element to be inserted does not exist in the auxiliary data structure.
- 8. A computer-readable storage medium, characterized in that a program is stored thereon, which program, when being executed by a processor, implements the data management method according to any of claims 1-7.
- 9. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the data management method according to any one of claims 1-7 when executing the program.
- 10. A data management apparatus, the apparatus comprising: a first establishing module, configured to establish a bloom filter for a target data set in an initial state, wherein the bloom filter comprises a first preset number of hash functions and the first preset number of bit arrays; a second establishing module, configured to, in response to an insertion request for an element in the target data set, switch the bloom filter to an inactive state when the bloom filter reaches a capacity limit and create a new bloom filter in an active state, wherein the insertion request carries a unique ID of the element to be inserted, the newly created bloom filter in the active state comprises a second preset number of hash functions and the second preset number of bit arrays, and the second preset number is larger than the first preset number; and a mapping module, configured to map the element to be inserted to the newly created bloom filter in the active state based on the unique ID of the element to be inserted.
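For illustration only, and not as part of the claims, the insertion flow of claims 1 and 2 and the query order of claims 3 and 6 might be sketched in Python as follows. Every name here is hypothetical. The sketch assumes that the capacity limit is a maximum number of insertions per filter, that hash function i is BLAKE2b salted with i, that each new filter adds a fixed `hash_step` hash functions, and that each bit is stored as one byte for readability. The source does not specify any of these choices.

```python
import hashlib


class PartitionedBloom:
    """One bloom filter: k hash functions, each with its own equal-size bit array."""

    def __init__(self, num_hashes: int, bits_per_array: int, capacity: int):
        self.num_hashes = num_hashes
        self.bits_per_array = bits_per_array
        self.capacity = capacity  # assumed capacity limit: max insertions
        self.count = 0
        # One byte per bit, purely for readability of the sketch.
        self.arrays = [bytearray(bits_per_array) for _ in range(num_hashes)]

    def _positions(self, uid: str):
        # Assumed hash family: BLAKE2b salted with i; hash i indexes array i.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                uid.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield i, int.from_bytes(digest, "little") % self.bits_per_array

    def is_full(self) -> bool:
        return self.count >= self.capacity

    def add(self, uid: str) -> None:
        for i, pos in self._positions(uid):
            self.arrays[i][pos] = 1
        self.count += 1

    def might_contain(self, uid: str) -> bool:
        # Present only if every corresponding bit is 1 (claim 6).
        return all(self.arrays[i][pos] == 1 for i, pos in self._positions(uid))


class GrowingBloomChain:
    """Inactive filters (oldest first) plus one active filter."""

    def __init__(self, first_hashes=3, hash_step=1, bits_per_array=1024, capacity=100):
        self.first_hashes = first_hashes
        self.hash_step = hash_step
        self.bits_per_array = bits_per_array
        self.capacity = capacity
        self.inactive = []
        self.active = self._new_filter(seq=0)

    def _new_filter(self, seq: int) -> PartitionedBloom:
        # Filter number seq gets first_hashes + seq * hash_step hash functions.
        k = self.first_hashes + seq * self.hash_step
        return PartitionedBloom(k, self.bits_per_array, self.capacity)

    def insert(self, uid: str) -> None:
        if self.active.is_full():
            self.inactive.append(self.active)  # switch to inactive state
            self.active = self._new_filter(seq=len(self.inactive))
        self.active.add(uid)

    def query(self, uid: str) -> bool:
        # Longest-inactive filter first, active filter last (claim 3).
        for bloom in self.inactive:
            if bloom.might_contain(uid):
                return True
        return self.active.might_contain(uid)


chain = GrowingBloomChain(capacity=2)
for uid in ("dev-001", "dev-002", "dev-003"):
    chain.insert(uid)
print(len(chain.inactive), chain.active.num_hashes)  # 1 4
```

With a capacity of 2, the third insertion retires the first filter (3 hash functions) and creates an active one with 4. A positive result from `query` can still be a false positive, which is why claim 7 adds verification against an auxiliary hash table or database.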
Description
Data management method and management device, storage medium, and electronic apparatus
Technical Field
The present application relates to the field of data storage technologies, and in particular to a data management method, a computer-readable storage medium, an electronic device, and a data management apparatus.
Background
Existing control data management methods include database-based storage and storage based on modular arithmetic. Database-based storage solves part of the problem, but when the data volume is huge and growing rapidly it suffers from higher cost, low lookup efficiency, and difficulty in controlling balance. Storage based on standard modulo operations solves the balance problem and offers higher lookup efficiency, but scales poorly. Some current control data representation methods also employ bloom filters, which are suited to fast element representation and lookup based on a compact storage structure. However, as the amount of data in a bloom filter increases, more bits in the bit array are marked as 1, which raises the false alarm rate (i.e., the rate at which the bloom filter erroneously reports the presence of an element that is not actually in the set when the element is queried), thereby reducing query accuracy.
Disclosure of Invention
The present application aims to solve, at least to some extent, at least one of the technical problems in the related art.
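As a rough illustration of the false-alarm growth described in the Background, the widely used textbook approximation for a standard bloom filter, p ≈ (1 − e^(−kn/m))^k, can be evaluated for a fixed filter as more elements are inserted. This formula and the parameter values below are assumptions brought in for illustration; they do not appear in the source.

```python
import math


def false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Textbook approximation of a bloom filter's false alarm rate."""
    # Expected fraction of bits set to 1 after num_items insertions.
    fill = 1.0 - math.exp(-num_hashes * num_items / num_bits)
    # A false alarm needs all num_hashes probed bits to be 1.
    return fill ** num_hashes


# Hypothetical filter: 8192 bits, 4 hash functions, growing element count.
for n in (100, 500, 1000, 2000):
    print(n, false_positive_rate(8192, 4, n))
```

The rate rises monotonically with the number of inserted elements, which is the effect that retiring a full filter and creating a new one is meant to contain.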
Therefore, a first object of the present application is to provide a data management method in which, in an initial state, a bloom filter is established for a target data set, the bloom filter comprising a first preset number of hash functions and the first preset number of bit arrays; in response to an insertion request for an element in the target data set, when it is determined that the bloom filter has reached its capacity limit, the bloom filter is switched to an inactive state and a new bloom filter is created in an active state, where the insertion request carries a unique ID of the element to be inserted, the newly created bloom filter in the active state comprises a second preset number of hash functions and the second preset number of bit arrays, and the second preset number is greater than the first preset number; and the element to be inserted is mapped to the newly created bloom filter in the active state based on the unique ID of the element to be inserted. A second object of the present application is to provide a computer-readable storage medium. A third object of the present application is to provide an electronic device. A fourth object of the present application is to provide a data management apparatus.
To achieve the above objects, an embodiment of a first aspect of the present application provides a data management method, comprising: in an initial state, establishing a bloom filter for a target data set, where the bloom filter comprises a first preset number of hash functions and the first preset number of bit arrays; in response to an insertion request for an element in the target data set, when it is determined that the bloom filter has reached its capacity limit, switching the bloom filter to an inactive state and creating a new bloom filter in an active state, where the insertion request carries a unique ID of the element to be inserted, the newly created bloom filter in the active state comprises a second preset number of hash functions and the second preset number of bit arrays, and the second preset number is greater than the first preset number; and mapping the element to be inserted to the newly created bloom filter in the active state based on the unique ID of the element to be inserted. According to the data management method of the embodiment of the present application, a bloom filter is established for a target data set in an initial state, the bloom filter comprising a first preset number of hash functions and the first preset number of bit arrays; in response to an insertion request for an element in the target data set, when it is determined that the bloom filter has reached its capacity limit, the bloom filter is switched to an inactive state and a new bloom filter is created in an active state, where the insertion request carries a unique ID of the element to be inserted, the newly created bloom filter in the active state comprises a second preset number of hash functions and the second preset number of bit arrays, and the second preset number is greater than the first preset number; and the element to be inserted is mapped to the newly created bloom filter in the active state based on the unique ID of the element to be inserted.
Therefore, the method can improve the data query efficiency and reduce the probability of false marking of the data, thereby reducing the false alarm rate. In addition, the data management method according to the above embodiment of the present application may further