Search

CN-121996816-A - Task optimization method, device, equipment, storage medium and program product

CN121996816ACN 121996816 ACN121996816 ACN 121996816ACN-121996816-A

Abstract

The application discloses a task optimization method, a device, equipment, a storage medium and a program product, and relates to the technical field of big data analysis, wherein the task optimization method comprises the steps of obtaining at least two calculation tasks and task information of each calculation task in the at least two calculation tasks; constructing a directed relation graph according to at least two computing tasks and task information of each computing task in the at least two computing tasks, generating at least one task execution batch according to a directed acyclic graph corresponding to the directed relation graph, wherein the at least one task execution batch comprises at least one computing task corresponding to a directed edge, and carrying out task merging on the computing tasks in the target task execution batch for the target task execution batch in the at least one task execution batch to obtain a merging result. The task optimization method can optimize the calculation tasks with the same data sources in the same execution batch, and further effectively improves the efficiency of data processing.

Inventors

  • GUO YULING
  • XU WANLIN
  • LIU WEN

Assignees

  • 中移(杭州)信息技术有限公司
  • 中国移动通信集团有限公司

Dates

Publication Date
20260508
Application Date
20241105

Claims (14)

  1. 1.A method of task optimization, comprising: acquiring at least two computing tasks and task information of each computing task in the at least two computing tasks, wherein the task information comprises a first data table and a second data table; Constructing a directed relation graph according to the at least two computing tasks and task information of each computing task in the at least two computing tasks, wherein the directed relation graph comprises a graph node set and a directed edge set, one graph node in the graph node set represents the first data table or the second data table, and one directed edge in the directed edge set represents one computing task in the at least two computing tasks; generating at least one task execution batch according to the directed acyclic graph corresponding to the directed relational graph, wherein the at least one task execution batch comprises at least one calculation task corresponding to a directed edge, the path lengths from the directed edge corresponding to all calculation tasks in each task execution batch to a source node are equal, and the source node is a graph node with zero degree of incidence and zero degree of egress in the graph node set; And carrying out task merging on the computing tasks in the target task execution batch to obtain a merging result, wherein the target task execution batch comprises at least two computing tasks.
  2. 2. The method according to claim 1, wherein generating at least one task execution batch from the directed acyclic graph corresponding to the directed relational graph comprises: screening graph nodes with input degree equal to zero and output degree greater than zero in the directed acyclic graph, and constructing a source node set; For each source node in the set of source nodes, performing the steps of constructing at least one task execution batch corresponding to the source node: Traversing a plurality of target directed edges associated with the source node, and for each target directed edge in the plurality of target directed edges, calculating the number of directed edges between the source node and a graph node pointed by the target directed edge to obtain the path length of the target directed edge; and constructing at least one task execution batch according to the target directed edges and the path lengths of each directed edge in the target directed edges, wherein the path lengths of the directed edges corresponding to the calculation tasks in each task execution batch are the same.
  3. 3. The method of claim 1, wherein the task merging, for a target task execution lot of the at least one task execution lot, the computing task of the target task execution lot to obtain a merged result, comprises: and under the condition that at least two first target tasks exist in the target task execution batch, task merging is carried out on the at least two first target tasks to obtain a merging result, wherein the directed edge corresponding to each first target task in the at least two first target tasks points to a second dependent data table from a first dependent data table, the first dependent data table is any one of the first data table or the second data table, and the second dependent data table is any one of the first data table or the second data table.
  4. 4. The method of claim 1, wherein the task merging, for a target task execution lot of the at least one task execution lot, the computing task of the target task execution lot to obtain a merged result, comprises: And under the condition that at least two second target tasks exist in the target task execution batch, task merging is carried out on the at least two second target tasks to obtain a merging result, and the directed edge corresponding to each second target task in the at least two second target tasks points to the first data table or the second data table from a third dependent data table, wherein the third dependent data table is any one of the first data table or the second data table.
  5. 5. The method according to claim 4, wherein, in the case that at least two second target directed edges exist in the target task execution lot, task merging is performed on the computing tasks corresponding to the at least two second target directed edges, so as to obtain a merging result, including: And under the condition that the number of the at least two second target tasks in the target task execution batch is larger than the preset number, carrying out task merging on the at least two second target tasks to obtain a merging result.
  6. 6. The method according to claim 4 or 5, wherein for a target task execution lot of the at least one task execution lot, task merging is performed on computing tasks in the target task execution lot to obtain a merging result, including: When at least two second target tasks and at least two first target tasks exist in the target task execution batch and one first target task exists in the at least two second target tasks, the at least two first target tasks are combined to obtain a first subtask; task merging is carried out on the at least two second target tasks to obtain second subtasks; And carrying out task combination on the first subtask and the second subtask to obtain a combination result.
  7. 7. The method of claim 1, wherein the task merging, for a target task execution lot of the at least one task execution lot, the computing task of the target task execution lot to obtain a merged result, comprises: and carrying out operator combination on operators in the calculation tasks in the target task execution batch to obtain a combination result for the target task execution batch in the at least one task execution batch.
  8. 8. The method of claim 1, wherein constructing a directed graph from the at least two computing tasks and task information for each of the at least two computing tasks comprises: Determining the dependency relationship between each of the at least two computing tasks according to the at least two computing tasks and the task information of each of the at least two computing tasks; traversing task information of each of the at least two computing tasks, identifying all participating data tables, and generating a graph node set, wherein each graph node in the graph node set represents the first data table or the second data table; And constructing a directed relation graph according to the dependency relation between each computing task in at least two computing tasks and the graph node set, wherein the directed relation graph comprises a directed edge set, one directed edge in the directed edge set represents one computing task in the at least two computing tasks, and the directed edge points to the other graph node from one graph node.
  9. 9. The method of claim 2, wherein before generating at least one task execution lot from the directed acyclic graph corresponding to the directed relational graph, the method further comprises: performing graph search on the directed graph by using a depth-first search algorithm, and detecting loops in the directed graph to obtain a detection result; Under the condition that a loop exists in the directed relation graph according to the detection result, performing loop removal processing on the loop according to a preset loop removal strategy to obtain a directed acyclic graph corresponding to the directed relation graph, wherein the preset loop removal strategy comprises at least one of the following steps: Deleting the directed edges in the loop, carrying out node splitting processing on the graph nodes in the loop, and modifying the dependency relationship of the calculation tasks existing in the loop.
  10. 10. The method according to any one of claims 1 to 5, wherein the first data table is used for storing live traffic data, and the second data table is used for storing result data obtained in the event that the execution of the computing task ends, the result data being determined based on the live traffic data.
  11. 11. A task optimization device, the device comprising: the system comprises an acquisition module, a calculation module and a calculation module, wherein the acquisition module is used for acquiring at least two calculation tasks and task information of each calculation task in the at least two calculation tasks, and the task information comprises a first data table and a second data table; The building module is used for building a directed relation graph according to the at least two computing tasks and task information of each computing task in the at least two computing tasks, the directed relation graph comprises a graph node set and a directed edge set, one graph node in the graph node set represents the first data table or the second data table, and one directed edge in the directed edge set represents one computing task in the at least two computing tasks; The first generation module is used for generating at least one task execution batch according to the directed acyclic graph corresponding to the directed relational graph, the at least one task execution batch comprises at least one calculation task corresponding to a directed edge, the path lengths from the directed edge corresponding to all calculation tasks in each task execution batch to a source node are equal, and the source node is a graph node with zero degree of incidence and zero degree of output in the graph node set; the merging module is used for carrying out task merging on the calculation tasks in the target task execution batch to obtain a merging result, wherein the target task execution batch comprises at least two calculation tasks.
  12. 12. An electronic device comprising a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements a task optimization method as claimed in any one of claims 1-10.
  13. 13. A computer readable storage medium, having stored thereon computer program instructions, which when executed by a processor, implement a task optimization method according to any one of claims 1-10.
  14. 14. A computer program product comprising a computer program which, when executed by a processor, implements the task optimization method of any one of claims 1-10.

Description

Task optimization method, device, equipment, storage medium and program product Technical Field The invention belongs to the technical field of big data analysis, and particularly relates to a task optimization method, device, equipment, storage medium and program product. Background Virtual Reality (VR) live broadcast is a combination of Virtual Reality and live broadcast, and VR live broadcast captures video by using a panoramic camera, and then transmits video content to viewers in real time through a network. VR live broadcast can generate large amounts of real-time data including user viewing behavior, interactive data, etc. The data center can collect, process, integrate and convert the data so as to better utilize the data generated by VR live broadcast, optimize the live broadcast strategy and promote the live broadcast effect. The data center needs to solve the problems of data integration, data storage, data processing, data service and the like. The optimization data calculation task is an important link in the data center processing process, and is directly related to the efficiency and accuracy of data processing. In the related art, an optimized data computing task generally divides a plurality of computing tasks with a dependency relationship into the same execution lot for serial execution, so as to optimize dispatch logic in the same execution lot, ensure that the computing tasks in the same execution lot can only be dispatched once, and reduce computing resource consumption. However, with the foregoing optimization method, the same computing tasks of the data sources in the same execution batch still need to be executed independently, which may cause unnecessary computing overhead and reduce the efficiency of data processing. Disclosure of Invention The embodiment of the application provides a task optimization method, a device, equipment, a storage medium and a program product, which can optimize the same calculation tasks of data sources in the same execution batch, thereby effectively improving the efficiency of data processing. In a first aspect, an embodiment of the present application provides a task optimization method, where the task optimization method includes: Acquiring at least two computing tasks and task information of each computing task in the at least two computing tasks, wherein the task information comprises a first data table and a second data table; Constructing a directed relation graph according to at least two computing tasks and task information of each computing task in the at least two computing tasks, wherein the directed relation graph comprises a graph node set and a directed edge set, one graph node in the graph node set represents a first data table or a second data table, and one directed edge in the directed edge set represents one computing task in the at least two computing tasks; Generating at least one task execution batch according to the directed acyclic graph corresponding to the directed relation graph, wherein the at least one task execution batch comprises at least one calculation task corresponding to a directed edge, the path lengths from the directed edge corresponding to all calculation tasks in each task execution batch to a source node are equal, and the source node is a graph node with an input degree equal to zero and an output degree greater than zero in a graph node set; And carrying out task merging on the computing tasks in the target task execution batch to obtain a merging result, wherein the target task execution batch comprises at least two computing tasks. In some implementations of the embodiments of the present application, generating at least one task execution batch according to the directed acyclic graph corresponding to the directed relational graph includes: screening graph nodes with input degree equal to zero and output degree greater than zero in the directed acyclic graph, and constructing a source node set; For each source node in the set of source nodes, performing the steps of constructing at least one task execution batch corresponding to the source node: traversing a plurality of target directed edges associated with the source node, and for each target directed edge in the plurality of target directed edges, calculating the number of directed edges between the source node and the graph nodes pointed by the target directed edge to obtain the path length of the target directed edge; and constructing at least one task execution batch according to the path lengths of the target directed edges and the directed edges, wherein the path lengths of the directed edges corresponding to the calculation tasks in each task execution batch are the same. In some implementations of the embodiments of the present application, for a target task execution lot of at least one task execution lot, task merging is performed on computing tasks in the target task execution lot to obtain a merging result, including: And under the condition