US-12619538-B2 - Method for data processing, electronic device, and storage medium

Abstract

A data processing method and apparatus, an electronic device, and a storage medium are disclosed, relating to the field of artificial intelligence, in particular distributed storage and cloud computing. The method includes: determining a priority of each placement group in a cache pool, and placing placement groups with the same priority into the same waiting queue; constructing a target queue that is initially empty and, in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved on the principle that placement groups in a higher-priority waiting queue are retrieved first, retrieving those placement groups from the corresponding waiting queue, and adding them to the target queue; and, in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein each traversed placement group is taken in turn as a target placement group, the number of writable objects is determined as a first quantity, and the first quantity of objects retrieved from the target placement group is written to a backend pool.

Inventors

  • Mingyuan LIANG

Assignees

  • BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Dates

Publication Date
2026-05-05
Application Date
2024-12-03
Priority Date
2023-12-12

Claims (20)

  1. A method for data processing, comprising: determining a priority of each placement group in a cache pool respectively, and dividing placement groups with the same priority into a same waiting queue; constructing a target queue which is initially empty, and in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved based on a principle that a placement group in a waiting queue with higher priority is retrieved first, retrieving the determined placement groups from its corresponding waiting queue and adding the retrieved placement groups to the target queue; and in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein when traversing each placement group, the placement group is used as a target placement group respectively, and a first quantity of writable objects is determined, and the first quantity of writable objects retrieved from the target placement group is written to a backend pool.
  2. The method of claim 1, wherein: the determining the priority of each placement group in the cache pool respectively comprises: in response to determining that an amount of data in the cache pool is greater than a first threshold, determining that the cache pool meets a write trigger condition, and performing the following processing for each placement group in the cache pool respectively: comparing the amount of data in the placement group with a reference data amount corresponding to the placement group, the reference data amount being a data amount threshold determined based on the first threshold, and determining the priority of the placement group based on the comparison result, wherein the larger the ratio of the amount of data in the placement group to the reference data amount, the higher the priority of the placement group.
  3. The method of claim 1, wherein: the response to determining that the supplementary trigger condition is met comprises: in response to determining that there are no unprocessed placement groups in the target queue, determining that the supplementary trigger condition is met, wherein a processed placement group is the one when all objects in the placement group have been retrieved, and the processed placement group is deleted from the target queue; the determining the placement group to be retrieved comprises: using placement groups in the waiting queue with a highest priority from waiting queues where the placement groups have not been retrieved as the placement groups to be retrieved.
  4. The method of claim 1, wherein: the response to determining that the supplementary trigger condition is met comprises: in response to determining that a number of unprocessed placement groups in the target queue is less than a second threshold, determining that the supplementary trigger condition is met, wherein the completion of the processing comprises: all objects have been retrieved, and the processed placement group is deleted from the target queue; the determining the placement groups to be retrieved comprises: in response to determining that it is a first time the supplementary trigger condition is met, retrieving from the waiting queue by selecting a highest priority placement group, and in response to determining that it is not the first time the supplementary trigger condition is met, determining a second quantity of placement groups to be retrieved, and using the second quantity of placement groups determined from the unretrieved placement groups in each waiting queue based on the principle as the placement groups to be retrieved.
  5. The method of claim 4, wherein: the determining the second quantity of placement groups to be retrieved comprises: obtaining a difference between the second threshold and the number of unprocessed placement groups in the target queue, and using the difference as the second quantity.
  6. The method of claim 1, wherein: the retrieving the first quantity of writable objects from the target placement group comprises: calling a list thread, using the list thread to retrieve the first quantity of writable objects from the target placement group, and adding the retrieved objects as objects to be written to a write queue; the writing to the backend pool comprises: in response to determining that the write queue is not empty, writing each object to be processed in the write queue to the backend pool in the order of a time each object was added to the write queue, from first to last.
  7. The method of claim 6, wherein the iteratively traversing each placement group in the target queue comprises: monitoring a number of unwritten objects to be processed in the write queue, and in response to determining that the number of unwritten objects to be processed is less than or equal to a third threshold, traversing the next placement group, wherein the third threshold is less than a fourth threshold, and the fourth threshold is a maximum number of objects to be processed that the write queue is allowed to include.
  8. The method of claim 7, wherein the determining the first quantity of writable objects comprises: obtaining a difference between the fourth threshold and the number of unwritten objects to be processed, and using the difference as the first quantity.
  9. The method of claim 6, wherein: the using the list thread to retrieve the first quantity of writable objects from the target placement group comprises: using the list thread to retrieve M objects from the target placement group that are unretrieved, wherein the objects in the target placement group are arranged in a predetermined order, M is a positive integer greater than or equal to the first quantity, and the retrieved M objects meet the following condition: after filtering out objects from the M objects with a time difference between a write time and a current time that is less than a fifth threshold, a number of remaining objects equals the first quantity; further comprising: in response to determining that the M objects cannot be retrieved, retrieving all unretrieved objects from the target placement group, filtering out objects with time difference between the write time and the current time that is less than the fifth threshold, and adding the remaining objects as objects to be written to the write queue.
  10. The method of claim 9, further comprising: after using the list thread to retrieve objects from the target placement group, controlling the list thread to enter a sleep state, and releasing a lock corresponding to the target placement group occupied by the list thread when retrieving objects from the target placement group.
  11. An electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method for data processing, wherein the method for data processing comprises: determining a priority of each placement group in a cache pool respectively, and dividing placement groups with the same priority into a same waiting queue; constructing a target queue which is initially empty, and in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved based on a principle that a placement group in a waiting queue with higher priority is retrieved first, retrieving the determined placement groups from its corresponding waiting queue and adding the retrieved placement groups to the target queue; and in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein when traversing each placement group, the placement group is used as a target placement group respectively, and a first quantity of writable objects is determined, and the first quantity of writable objects retrieved from the target placement group is written to a backend pool.
  12. The electronic device of claim 11, wherein: the determining the priority of each placement group in the cache pool respectively comprises: in response to determining that an amount of data in the cache pool is greater than a first threshold, determining that the cache pool meets a write trigger condition, and performing the following processing for each placement group in the cache pool respectively: comparing the amount of data in the placement group with a reference data amount corresponding to the placement group, the reference data amount being a data amount threshold determined based on the first threshold, and determining the priority of the placement group based on the comparison result, wherein the larger the ratio of the amount of data in the placement group to the reference data amount, the higher the priority of the placement group.
  13. The electronic device of claim 11, wherein: the response to determining that the supplementary trigger condition is met comprises: in response to determining that there are no unprocessed placement groups in the target queue, determining that the supplementary trigger condition is met, wherein a processed placement group is the one when all objects in the placement group have been retrieved, and the processed placement group is deleted from the target queue; the determining the placement group to be retrieved comprises: using placement groups in the waiting queue with a highest priority from waiting queues where the placement groups have not been retrieved as the placement groups to be retrieved.
  14. The electronic device of claim 11, wherein: the response to determining that the supplementary trigger condition is met comprises: in response to determining that a number of unprocessed placement groups in the target queue is less than a second threshold, determining that the supplementary trigger condition is met, wherein the completion of the processing comprises: all objects have been retrieved, and the processed placement group is deleted from the target queue; the determining the placement groups to be retrieved comprises: in response to determining that it is a first time the supplementary trigger condition is met, retrieving from the waiting queue by selecting a highest priority placement group, and in response to determining that it is not the first time the supplementary trigger condition is met, determining a second quantity of placement groups to be retrieved, and using the second quantity of placement groups determined from the unretrieved placement groups in each waiting queue based on the principle as the placement groups to be retrieved.
  15. The electronic device of claim 14, wherein: the determining the second quantity of placement groups to be retrieved comprises: obtaining a difference between the second threshold and the number of unprocessed placement groups in the target queue, and using the difference as the second quantity.
  16. The electronic device of claim 11, wherein: the retrieving the first quantity of writable objects from the target placement group comprises: calling a list thread, using the list thread to retrieve the first quantity of writable objects from the target placement group, and adding the retrieved objects as objects to be written to a write queue; the writing to the backend pool comprises: in response to determining that the write queue is not empty, writing each object to be processed in the write queue to the backend pool in the order of a time each object was added to the write queue, from first to last.
  17. The electronic device of claim 16, wherein the iteratively traversing each placement group in the target queue comprises: monitoring a number of unwritten objects to be processed in the write queue, and in response to determining that the number of unwritten objects to be processed is less than or equal to a third threshold, traversing the next placement group, wherein the third threshold is less than a fourth threshold, and the fourth threshold is a maximum number of objects to be processed that the write queue is allowed to include.
  18. The electronic device of claim 17, wherein the determining the first quantity of writable objects comprises: obtaining a difference between the fourth threshold and the number of unwritten objects to be processed, and using the difference as the first quantity.
  19. The electronic device of claim 16, wherein: the using the list thread to retrieve the first quantity of writable objects from the target placement group comprises: using the list thread to retrieve M objects from the target placement group that are unretrieved, wherein the objects in the target placement group are arranged in a predetermined order, M is a positive integer greater than or equal to the first quantity, and the retrieved M objects meet the following condition: after filtering out objects from the M objects with a time difference between a write time and a current time that is less than a fifth threshold, a number of remaining objects equals the first quantity; further comprising: in response to determining that the M objects cannot be retrieved, retrieving all unretrieved objects from the target placement group, filtering out objects with a time difference between the write time and the current time that is less than the fifth threshold, and adding the remaining objects as objects to be written to the write queue.
  20. A non-transitory computer readable storage medium with computer instructions stored thereon, wherein the computer instructions are used for causing a method for data processing, wherein the method for data processing comprises: determining a priority of each placement group in a cache pool respectively, and dividing placement groups with the same priority into a same waiting queue; constructing a target queue which is initially empty, and in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved based on a principle that a placement group in a waiting queue with higher priority is retrieved first, retrieving the determined placement groups from its corresponding waiting queue and adding the retrieved placement groups to the target queue; and in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein when traversing each placement group, the placement group is used as a target placement group respectively, and a first quantity of writable objects is determined, and the first quantity of writable objects retrieved from the target placement group is written to a backend pool.
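
The arithmetic in the dependent claims can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `CachedObject`, `priority_bucket`, `second_quantity`, `first_quantity`, and `list_writable` are hypothetical, and the mapping from the data ratio to a discrete priority level is assumed, since the claims only require that a larger ratio yields a higher priority.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    name: str
    write_time: float  # seconds on an arbitrary clock (assumption)


def priority_bucket(pg_bytes: int, reference_bytes: int, levels: int = 3) -> int:
    """Claims 2/12: priority grows with the ratio of PG data to its reference amount.

    Truncating the ratio and capping it at `levels - 1` is an assumed mapping.
    """
    ratio = pg_bytes / reference_bytes
    return min(levels - 1, int(ratio))


def second_quantity(second_threshold: int, unprocessed_in_target: int) -> int:
    """Claims 5/15: how many placement groups to pull on a refill."""
    return second_threshold - unprocessed_in_target


def first_quantity(fourth_threshold: int, unwritten_in_write_queue: int) -> int:
    """Claims 8/18: free capacity of the write queue."""
    return fourth_threshold - unwritten_in_write_queue


def list_writable(objects, start, quantity, now, fifth_threshold):
    """Claims 9/19: scan unretrieved objects in their predetermined order.

    Objects written less than `fifth_threshold` seconds ago are filtered out.
    The scan stops once `quantity` objects remain, or when the placement
    group is exhausted (the fallback branch of the claim).
    Returns (selected objects, index of the next unretrieved object).
    """
    selected = []
    i = start
    while i < len(objects) and len(selected) < quantity:
        obj = objects[i]
        if now - obj.write_time >= fifth_threshold:
            selected.append(obj)
        i += 1
    return selected, i
```

In this sketch the number of objects scanned, `i - start`, plays the role of M in claims 9 and 19: it is at least the first quantity whenever enough sufficiently old objects exist, and otherwise covers every unretrieved object in the placement group.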

Description

The present application claims priority to Chinese Patent Application No. 202311703935.1, filed on Dec. 12, 2023, and titled “Data Processing Method and Apparatus, Electronic Device, and Storage Medium”. The disclosure of the above application is incorporated herein by reference in its entirety.

FIELD OF THE DISCLOSURE

The present disclosure relates to the field of artificial intelligence and, in particular, to a data processing method and apparatus, an electronic device, and a storage medium in the fields of distributed storage and cloud computing.

BACKGROUND OF THE DISCLOSURE

Existing distributed storage systems, such as the high-performance distributed storage system Ceph, typically provide tiered storage across different hardware media. For example, frequently accessed data is stored in a cache pool, which usually uses solid-state disks (SSDs) as the storage medium. Infrequently accessed data is written to a backend pool, which typically uses regular Serial Advanced Technology Attachment (SATA) hard drives as the storage medium.

SUMMARY OF THE DISCLOSURE

The present disclosure provides a data processing method and apparatus, an electronic device, and a storage medium.
A method for data processing, including: determining a priority of each placement group in a cache pool respectively, and dividing placement groups with the same priority into a same waiting queue; constructing a target queue which is initially empty, and in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved based on the principle that a placement group in a waiting queue with higher priority is retrieved first, retrieving the placement groups to be retrieved from the corresponding waiting queue and adding the placement groups to be retrieved to the target queue; and in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein when traversing each placement group, the placement group is used as a target placement group respectively, and the number of writable objects is determined as a first quantity, and the first quantity of objects retrieved from the target placement group is written to a backend pool.
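
The three steps of the method can be sketched in Python as follows. This is a minimal single-threaded illustration, not the disclosed implementation. The function name `flush_cache_pool` and its parameters are hypothetical; the supplementary trigger is assumed to fire when the target queue holds no unprocessed placement groups; and a fixed `write_capacity` stands in for the first quantity, which the disclosure determines dynamically.

```python
from collections import deque


def flush_cache_pool(pg_priorities, pg_objects, write_capacity):
    """Return (pg_id, obj) pairs in the order they reach the backend pool.

    pg_priorities: {pg_id: priority}, where a larger number is more urgent.
    pg_objects:    {pg_id: list of objects held by that placement group}.
    write_capacity: objects taken from a placement group per visit
                    (a fixed stand-in for the "first quantity").
    """
    if write_capacity < 1:
        raise ValueError("write_capacity must be at least 1")

    # Step 1: placement groups with the same priority share a waiting queue.
    waiting = {}
    for pg_id, prio in pg_priorities.items():
        waiting.setdefault(prio, deque()).append(pg_id)

    target = deque()                       # Step 2: initially empty target queue
    cursors = {pg_id: 0 for pg_id in pg_priorities}
    backend = []                           # stands in for the backend pool

    while True:
        # Supplementary trigger (assumed): no unprocessed PGs in the target queue.
        if not target:
            non_empty = [prio for prio, queue in waiting.items() if queue]
            if not non_empty:
                break                      # every placement group is processed
            best = max(non_empty)          # higher-priority queue is retrieved first
            target.extend(waiting[best])
            waiting[best].clear()

        # Step 3: traverse each placement group currently in the target queue.
        for pg_id in list(target):
            start = cursors[pg_id]
            batch = pg_objects[pg_id][start:start + write_capacity]
            cursors[pg_id] = start + len(batch)
            backend.extend((pg_id, obj) for obj in batch)
            # A PG whose objects have all been retrieved leaves the target queue.
            if cursors[pg_id] >= len(pg_objects[pg_id]):
                target.remove(pg_id)

    return backend
```

Because every placement group in the target queue contributes at most `write_capacity` objects per pass, a large placement group cannot starve the others of the same priority, which is one plausible reading of why the traversal is iterative rather than draining one placement group at a time.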
An electronic device, including: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method for data processing, wherein the method for data processing includes: determining a priority of each placement group in a cache pool respectively, and dividing placement groups with the same priority into a same waiting queue; constructing a target queue which is initially empty, and in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved based on the principle that a placement group in a waiting queue with higher priority is retrieved first, retrieving the placement groups to be retrieved from the corresponding waiting queue and adding the placement groups to be retrieved to the target queue; and in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein when traversing each placement group, the placement group is used as a target placement group respectively, and the number of writable objects is determined as a first quantity, and the first quantity of objects retrieved from the target placement group is written to a backend pool.
A non-transitory computer readable storage medium with computer instructions stored thereon, wherein the computer instructions are used for causing a method for data processing, wherein the method for data processing comprises: determining a priority of each placement group in a cache pool respectively, and dividing placement groups with the same priority into a same waiting queue; constructing a target queue which is initially empty, and in response to determining that a supplementary trigger condition is met, determining placement groups to be retrieved based on a principle that a placement group in a waiting queue with higher priority is retrieved first, retrieving the placement groups to be retrieved from the corresponding waiting queue and adding the placement groups to be retrieved to the target queue; and in response to determining that the target queue is not empty, iteratively traversing each placement group in the target queue, wherein when traversing each placement group, the placement group is used as a target placement group respectively, and the number of writable objects is determined as a first quantity, and the first quantity of objects retrieved from the target placement group is written to a backend pool.
It should be understood that the content described in this section is not intended to identi