CN-121998023-A - Link generation, reinforcement learning and data processing method, equipment and storage medium
Abstract
Embodiments of this specification provide a link generation, reinforcement learning, and data processing method, a device, and a storage medium. The method includes: acquiring a first task requirement of a first data processing task; matching a plurality of required first operators from a plurality of candidate operators according to the first task requirement; generating a first data processing link to be optimized based on the plurality of first operators; and optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link, wherein the first target data processing link is used for processing the first data processing task.
Inventors
- Zhan Wanke
- Liu Xiuting
- Cai Jiansheng
Assignees
- 蚂蚁区块链科技(上海)有限公司 (Ant Blockchain Technology (Shanghai) Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-31
Claims (19)
- 1. A link generation method, comprising: acquiring a first task requirement of a first data processing task; matching a plurality of required first operators from a plurality of candidate operators according to the first task requirement, and generating a first data processing link to be optimized based on the plurality of first operators; and optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link; wherein the first target data processing link is used for processing the first data processing task.
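The flow of claim 1 — match operators, chain them into an initial link, then let a trained RL policy rewrite the link — can be sketched as below. This is a minimal illustration; every name (`match_operators`, `rl_policy`, the dict keys) is a hypothetical choice, not terminology from the patent.

```python
# Illustrative sketch of the claim-1 flow; all identifiers are assumptions.

def match_operators(task_requirement, candidate_operators):
    """Match the required first operators from the candidate pool."""
    return [op for op in candidate_operators
            if op["capability"] in task_requirement["needs"]]

def generate_link_to_optimize(first_operators):
    """Chain the matched operators into an initial data processing link."""
    return list(first_operators)

def optimize_link(link, rl_policy, perf_estimate, running_states):
    """Let a trained RL policy rewrite the link, conditioned on the link,
    its performance estimate, and the operators' current running states."""
    return rl_policy(link, perf_estimate, running_states)
```

In a real system `rl_policy` would be the trained model of the later claims; here a stub suffices to show how the three steps compose.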
- 2. The method of claim 1, wherein optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, the performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link comprises: determining a first state vector according to the first data processing link to be optimized, the performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators; inputting the first state vector into the reinforcement learning model; executing the optimization action output by the reinforcement learning model to optimize the first data processing link to be optimized; when the optimization process does not meet the end condition, taking the optimized first data processing link to be optimized as a new first data processing link to be optimized, and returning to the step of determining a first state vector according to the first data processing link to be optimized, the performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators; and when the optimization process meets the end condition, determining the optimized first data processing link to be optimized as the first target data processing link.
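The iterative loop of claim 2 — encode state, query the model, apply the action, repeat until an end condition — might look like the following sketch. The state encoding and the convergence signal (`policy` returning `None`) are illustrative assumptions, not the patent's specification.

```python
# Hypothetical rendering of the claim-2 optimization loop.

def build_state_vector(link, perf_estimate, running_states):
    # Concatenate a crude link encoding, the performance estimate,
    # and per-operator load figures into one flat state vector.
    return ([float(len(link)), float(perf_estimate)]
            + [running_states[op] for op in sorted(running_states)])

def optimize_until_done(link, policy, estimate_perf, running_states, max_steps=10):
    for _ in range(max_steps):               # end condition: action budget
        state = build_state_vector(link, estimate_perf(link), running_states)
        action = policy(state)               # trained RL model's output
        if action is None:                   # end condition: convergence
            break
        link = action(link)                  # execute the optimization action
    return link                              # the first target data processing link
```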
- 3. The method of claim 1, wherein the reinforcement learning process of the reinforcement learning model comprises a plurality of rounds, and a round comprises the following steps: when the round starts, obtaining a second data processing link to be optimized corresponding to a second data processing task, wherein the second data processing link to be optimized is determined based on a second task requirement of the second data processing task and the plurality of candidate operators; determining an optimization action by using a reinforcement learning model to be trained, according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; executing the optimization action to optimize the second data processing link to be optimized; optimizing the reinforcement learning model to be trained according to the environment reward after the optimization action; when the round does not meet the end condition, taking the optimized second data processing link to be optimized as a new second data processing link to be optimized, and returning to the step of determining an optimization action by using the reinforcement learning model to be trained according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; and ending the round when the round meets the end condition.
- 4. The method of claim 3, further comprising: determining the environment reward after the optimization action according to the performance estimate of the new second data processing link to be optimized.
- 5. The method of claim 3, wherein there are a plurality of environment rewards, the plurality of environment rewards including at least two of: a reward determined based on a running-time estimate of the new second data processing link to be optimized, a reward determined based on a resource consumption estimate of the new second data processing link to be optimized, a reward determined based on an output quality estimate of the new second data processing link to be optimized, and a reward determined based on a task success rate estimate of the new second data processing link to be optimized.
- 6. The method of claim 5, wherein optimizing the reinforcement learning model to be trained according to the environment rewards after the optimization action comprises: acquiring weight configuration information; weighting and summing the plurality of environment rewards according to the weight configuration information to obtain a target reward; and optimizing the reinforcement learning model to be trained based on the target reward.
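The reward shaping of claims 5 and 6 reduces to a weighted sum of the individual environment rewards. A minimal sketch, where the reward names and weight keys are illustrative assumptions:

```python
# Claim-6 target reward: weighted sum of the environment rewards
# (runtime, resource use, output quality, success rate are example names).

def target_reward(rewards, weight_config):
    """Combine several environment rewards using the weight configuration."""
    return sum(weight_config.get(name, 0.0) * value
               for name, value in rewards.items())
```

The weight configuration lets an operator of the system trade off, say, runtime against output quality without retraining the reward logic itself.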
- 7. The method according to any one of claims 2 to 6, wherein the end condition comprises the current round having converged and/or the number of actions executed in the current round having reached a preset action-count threshold.
- 8. The method of any one of claims 2 to 6, wherein the action space corresponding to the reinforcement learning model includes one or more of: adding an operator, deleting an operator, adjusting the order of adjacent operators, and replacing an operator.
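The four action types of claim 8 can be rendered as plain functions over a link, here modeled (as an assumption) as a list of operator names:

```python
# Claim-8 action space over a link represented as a list of operator names.

def add_operator(link, op, pos):
    """Insert an operator at position pos."""
    return link[:pos] + [op] + link[pos:]

def delete_operator(link, pos):
    """Remove the operator at position pos."""
    return link[:pos] + link[pos + 1:]

def swap_adjacent(link, pos):
    """Adjust the order of two adjacent operators at pos and pos+1."""
    new = list(link)
    new[pos], new[pos + 1] = new[pos + 1], new[pos]
    return new

def replace_operator(link, pos, op):
    """Replace the operator at position pos with another candidate."""
    new = list(link)
    new[pos] = op
    return new
```

Each function returns a new link rather than mutating in place, which keeps the before/after links available for computing the environment reward.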
- 9. The method according to any one of claims 1 to 6, further comprising: sampling a data source to be processed of the first data processing task to obtain a data subset; running the first data processing link to be optimized to process the data subset; and determining the performance estimate of the first data processing link to be optimized according to the performance indicators observed during the run.
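Claim 9 estimates performance by trial-running the link on a sampled subset rather than the full data source. A sketch, where modeling operators as list-to-list functions and using survival rate as the observed indicator are both illustrative assumptions:

```python
# Claim-9 sampling-based performance estimate (illustrative metric).
import random

def estimate_performance(link, data_source, sample_size, seed=0):
    """Run the link on a random subset and report a crude performance index:
    the fraction of sampled records that survive the whole link."""
    rng = random.Random(seed)
    subset = rng.sample(list(data_source), min(sample_size, len(data_source)))
    processed = subset
    for operator in link:          # each operator maps a batch to a batch
        processed = operator(processed)
    return len(processed) / max(len(subset), 1)
```

A production system would observe richer indicators (runtime, memory, quality scores); the survival fraction merely shows the sample-run-measure shape.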
- 10. The method of any one of claims 1 to 6, wherein matching a plurality of required first operators from a plurality of candidate operators according to the first task requirement and generating a first data processing link to be optimized based on the plurality of first operators comprises: matching, by using a heuristic algorithm, the plurality of required first operators from the plurality of candidate operators according to the first task requirement, and generating the first data processing link to be optimized based on the plurality of first operators.
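Claim 10 leaves the heuristic open. One plausible instance — a greedy cost heuristic that picks, for each required capability, the cheapest candidate providing it — is sketched below; the `provides`/`cost` fields are hypothetical:

```python
# One possible claim-10 heuristic: greedy cheapest-provider matching.

def heuristic_match(required_capabilities, candidates):
    """For each required capability, pick the lowest-cost candidate operator
    that provides it; the resulting order seeds the link to be optimized."""
    chosen = []
    for need in required_capabilities:
        providers = [c for c in candidates if need in c["provides"]]
        if providers:
            chosen.append(min(providers, key=lambda c: c["cost"]))
    return chosen
```

A greedy seed like this is deliberately cheap; the RL stage of claims 1–2 is what refines it toward a well-performing link.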
- 11. The method according to any one of claims 1 to 6, wherein optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, the performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link comprises: performing feature extraction on the first data processing link to be optimized to obtain topological structure features of the first data processing link to be optimized; and optimizing the first data processing link to be optimized by using the trained reinforcement learning model, according to the topological structure features, the current running states of the operators, and the performance estimate of the first data processing link to be optimized, to obtain the first target data processing link.
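Claim 11 feeds topological structure features of the link to the model. Treating the link as a DAG of operator edges, a minimal feature vector might carry node count, edge count, and maximum depth; this particular feature set is an illustrative assumption:

```python
# Illustrative claim-11 topology features for a link given as (src, dst) edges.

def topology_features(edges):
    """Return [node count, edge count, max depth] of the link's DAG."""
    nodes = {n for e in edges for n in e}
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    def depth(node):
        # Longest path starting at node (assumes the link is acyclic).
        return 1 + max((depth(c) for c in children.get(node, [])), default=0)

    roots = nodes - {dst for _, dst in edges}
    max_depth = max((depth(r) for r in roots), default=0)
    return [len(nodes), len(edges), max_depth]
```

A practical encoder would likely use a graph neural network over the same structure; the point is only that the link's shape, not just its operator list, reaches the policy.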
- 12. A reinforcement learning method, comprising: when a round starts, obtaining a second data processing link to be optimized corresponding to a second data processing task, wherein the second data processing link to be optimized is determined based on a second task requirement of the second data processing task and a plurality of candidate operators; determining an optimization action by using a reinforcement learning model to be trained, according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; optimizing the second data processing link to be optimized based on the optimization action to obtain a new second data processing link to be optimized; optimizing the reinforcement learning model to be trained according to the environment reward after the optimization action; when the round does not meet the end condition, taking the optimized second data processing link to be optimized as a new second data processing link to be optimized, and returning to the step of determining an optimization action by using the reinforcement learning model to be trained according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; and ending the round when the round meets the end condition.
- 13. A data processing method, comprising: acquiring a first data processing task, wherein the first data processing task comprises a first task requirement; matching a plurality of required first operators from a plurality of candidate operators according to the first task requirement, and generating a first data processing link to be optimized based on the plurality of first operators; optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link; and running the first target data processing link to process a data source to be processed corresponding to the first data processing task.
- 14. A link generation apparatus, comprising: an acquisition module configured to acquire a first task requirement of a first data processing task; a generation module configured to match a plurality of required first operators from a plurality of candidate operators according to the first task requirement and generate a first data processing link to be optimized based on the plurality of first operators; and an optimization module configured to optimize the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link; wherein the first target data processing link is used for processing the first data processing task.
- 15. A reinforcement learning device, comprising: an acquisition module configured to, when a current round starts, acquire a second data processing link to be optimized corresponding to a second data processing task, wherein the second data processing link to be optimized is determined based on a second task requirement of the second data processing task and a plurality of candidate operators; a determining module configured to determine an optimization action by using a reinforcement learning model to be trained, according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; a link optimization module configured to optimize the second data processing link to be optimized based on the optimization action to obtain a new second data processing link to be optimized; a model optimization module configured to optimize the reinforcement learning model to be trained according to the environment reward after the optimization action; and a decision module configured to, when the current round does not meet the end condition, take the optimized second data processing link to be optimized as a new second data processing link to be optimized and return to the step of determining an optimization action by using the reinforcement learning model to be trained according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators, and to end the current round when the current round meets the end condition.
- 16. A data processing apparatus, comprising: an acquisition module configured to acquire a first data processing task, wherein the first data processing task comprises a first task requirement; a generation module configured to match a plurality of required first operators from a plurality of candidate operators according to the first task requirement and generate a first data processing link to be optimized based on the plurality of first operators; an optimization module configured to optimize the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link; and a running module configured to run the first target data processing link to process a data source to be processed corresponding to the first data processing task.
- 17. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a program, and the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the method of any one of claims 1 to 13.
- 18. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer, implements the method of any one of claims 1 to 13.
- 19. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 13.
Description
Link generation, reinforcement learning and data processing method, equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular to a link generation, reinforcement learning, and data processing method, device, and storage medium.

Background

In an era of continuous development of artificial intelligence (Artificial Intelligence, AI), big data and artificial intelligence complement each other and have become the core driving force of technical progress. Data is the foundation of artificial intelligence, and its quality and quantity directly determine the performance and reliability of artificial intelligence models. Particularly in the training of large models, high-quality, large-scale data can improve the reasoning capability of the models and reduce uncertainty in the training process. Current big data platforms typically rely on experts to manually configure the data processing links used for processing data, or automatically generate the data processing links based on a rule base preconfigured by experts. However, this approach incurs high labor costs, and the constructed links sometimes perform poorly owing to the limits of expert knowledge and thinking.

Disclosure of Invention

Aspects of the present specification provide a link generation and reinforcement learning method, device, storage medium, and program product for improving the performance of automatically generated data processing links.
A first aspect of the present specification provides a link generation method, including: acquiring a first task requirement of a first data processing task; matching a plurality of required first operators from a plurality of candidate operators according to the first task requirement, and generating a first data processing link to be optimized based on the plurality of first operators; and optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link; wherein the first target data processing link is used for processing the first data processing task.

A second aspect of the present specification provides a reinforcement learning method, including: when a round starts, obtaining a second data processing link to be optimized corresponding to a second data processing task, wherein the second data processing link to be optimized is determined based on a second task requirement of the second data processing task and a plurality of candidate operators; determining an optimization action by using a reinforcement learning model to be trained, according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; optimizing the second data processing link to be optimized based on the optimization action to obtain a new second data processing link to be optimized; optimizing the reinforcement learning model to be trained according to the environment reward after the optimization action; when the round does not meet the end condition, taking the optimized second data processing link to be optimized as a new second data processing link to be optimized, and returning to the step of determining an optimization action by using the reinforcement learning model to be trained according to the second data processing link to be optimized, the performance estimate of the second data processing link to be optimized, and the current running states of the plurality of candidate operators; and ending the round when the round meets the end condition.

A third aspect of the present specification provides a data processing method, comprising: acquiring a first data processing task, wherein the first data processing task comprises a first task requirement; matching a plurality of required first operators from a plurality of candidate operators according to the first task requirement, and generating a first data processing link to be optimized based on the plurality of first operators; optimizing the first data processing link to be optimized by using a trained reinforcement learning model, according to the first data processing link to be optimized, a performance estimate of the first data processing link to be optimized, and the current running states of the plurality of candidate operators, to obtain a first target data processing link; and running the first target data processing link to process a data source to be processed corresponding to the first data processing task.

A fourth aspect of the present specification provides a link generation apparatus, comprising: the ac