CN-122021706-A - Calculation graph dynamic topology reconstruction method, agent training method and related devices


Abstract

The application provides a computational graph dynamic topology reconstruction method, an agent training method and a related device, and relates to the field of enterprise-level process automation. An electronic device counts an abnormal index of historical tasks processed by a digital employee, notifies a user to take over manually when the abnormal index exceeds a preset alarm threshold, synchronously records the actual operation steps performed after the user takes over, and then updates the current standard operation flow chart according to the resulting operation record to generate a new standard operation flow chart adapted to the new rules. In this process, real business feedback is converted directly into a new flow chart without manually writing code or retraining a model offline, so that the digital employee can autonomously learn a rule change from a single click or a single takeover by the user, quickly recover its execution capability, and effectively avoid business interruption caused by flow changes.

Inventors

WANG LIMENG
WANG MINGCHAO
LIU BING
CHENG XIAOJIE
ZHANG LEI

Assignees

上海序禄信息科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260409

Claims (10)

1. A method for dynamic topology reconstruction of a computational graph, the method comprising: counting an abnormal index of historical tasks processed by a digital employee; if the abnormal index is greater than a set alarm threshold, notifying a user to take over manually, and acquiring an operation record made after the user takes over; and updating a current standard operation flow chart according to the operation record to obtain a new standard operation flow chart.
2. The computational graph dynamic topology reconstruction method of claim 1, wherein updating the current standard operation flow chart according to the operation record to obtain a new standard operation flow chart comprises: determining a newly added or deleted service node according to the operation record; acquiring a backup flow chart of the current standard operation flow chart, and updating the backup flow chart according to the newly added or deleted service node to obtain a new backup flow chart; and if the new backup flow chart contains no cyclic path, taking the new backup flow chart as the new standard operation flow chart.
3. The computational graph dynamic topology reconstruction method of claim 2, wherein determining a newly added or deleted service node according to the operation record comprises: comparing the operation record with the current standard operation flow chart to determine the newly added or deleted service node.
4. The computational graph dynamic topology reconstruction method of claim 2, wherein before taking the new backup flow chart as the new standard operation flow chart, the method further comprises: counting the waiting node quantity of each service node in the new backup flow chart, wherein the waiting node quantity of a service node is equal to the number of directed edges pointing to that service node; putting all service nodes whose waiting node quantity is 0 into a queue to be accessed as nodes to be accessed; sequentially taking a target node out of the queue to be accessed, marking the target node as accessed, and subtracting 1 from the waiting node quantity of each successor node of the target node, wherein any successor node whose waiting node quantity is reduced to 0 is added to the queue to be accessed as a node to be accessed; returning to the step of sequentially taking a target node out of the queue to be accessed and marking it as accessed, until the queue to be accessed is empty; and if the total number of service nodes in the accessed state is equal to the total number of nodes in the new backup flow chart, judging that the new backup flow chart contains no cyclic path.
5. The computational graph dynamic topology reconstruction method of claim 1, wherein the digital employee is built on an agent model, the method further comprising: acquiring an execution state of a current task and observation information of a task environment, wherein the execution state comprises description information of a current service node; generating an operation mask according to the description information of the current service node and the current standard operation flow chart, wherein the operation mask identifies the legal operations and the illegal operations in a candidate operation space under the current service node; and processing the description information of the current service node and the observation information through the agent model to determine, from the candidate operation space, a predicted operation to be executed, wherein the operation mask prunes the candidate operation space during inference of the agent model, so that data corresponding to candidate operations identified as illegal does not participate in the computation of the agent model.
6. An agent training method, the method comprising: acquiring an actual operation sequence in which a model to be trained executes a target task; obtaining reward and punishment information for the actual operation sequence according to the actual operation sequence and a current reference operation sequence of the target task, wherein the reward and punishment information comprises a first penalty term for a redundant operation in the actual operation sequence; and updating the model to be trained according to the reward and punishment information to obtain an agent model capable of completing the target task.
7. The agent training method of claim 6, wherein the first penalty term is proportional to a performance overhead of the redundant operation, and the reward and punishment information further comprises a task completion indication item and a second penalty term for an operation missing from the reference operation sequence; obtaining the reward and punishment information for the actual operation sequence according to the actual operation sequence and the current reference operation sequence of the target task comprises: obtaining the task completion indication item according to an execution result of the actual operation sequence; comparing the actual operation sequence with the reference operation sequence to obtain the redundant operation and the missing operation; obtaining the first penalty term according to the performance overhead factor of the redundant operation; obtaining the second penalty term according to the performance overhead factor of the missing operation; and obtaining the reward and punishment information according to the task completion indication item, the first penalty term and the second penalty term; the relation among the task completion indication item, the first penalty term, the second penalty term and the reward and punishment information is given by a formula (not reproduced in this text) whose symbols respectively represent the reward and punishment information, the task completion indication item, the reference operation sequence, the actual operation sequence, the performance overhead factors of each of the redundant operations and the missing operations, and a function for calculating the difference between the two sequences in operation type and execution order.
8. A computational graph dynamic topology reconstruction apparatus, the apparatus comprising: a model detection module, used for counting an abnormal index of historical tasks processed by a digital employee; and a flow updating module, used for notifying a user to take over manually and acquiring an operation record made after the user takes over if the abnormal index is greater than a set alarm threshold; wherein the flow updating module is further used for updating a current standard operation flow chart according to the operation record to obtain a new standard operation flow chart.
9. An agent training device, the device comprising: a forward reasoning module, used for acquiring an actual operation sequence in which a model to be trained executes a target task; and a model updating module, used for obtaining reward and punishment information for the actual operation sequence according to the actual operation sequence and a current reference operation sequence of the target task, wherein the reward and punishment information comprises a first penalty term for a redundant operation in the actual operation sequence; wherein the model updating module is further used for updating the model to be trained according to the reward and punishment information to obtain a digital employee capable of completing the target task.
10. An electronic device comprising a processor and a memory, the memory storing a computer program that, when executed by the processor, implements the computational graph dynamic topology reconstruction method of any one of claims 1-5 or the agent training method of any one of claims 6-7.
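
The update-then-verify procedure of claims 2 and 4 can be sketched in Python. The queue-based check in claim 4 matches Kahn's topological-sort algorithm, with the "waiting node quantity" playing the role of a node's in-degree. The flow chart representation (a dict mapping each service node to its successor list) and the names `has_no_cycle` and `try_update` are illustrative assumptions, not taken from the application.

```python
from collections import deque
from copy import deepcopy


def has_no_cycle(graph):
    """Return True if the directed graph has no cyclic path (Kahn's algorithm).

    graph: dict mapping a service node to a list of its successor nodes
    (an assumed representation of the flow chart).
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)

    # Waiting node quantity = number of directed edges pointing at the node.
    waiting = {node: 0 for node in nodes}
    for successors in graph.values():
        for succ in successors:
            waiting[succ] += 1

    # Nodes with nothing to wait for enter the queue to be accessed.
    queue = deque(node for node in nodes if waiting[node] == 0)
    accessed = 0
    while queue:
        target = queue.popleft()
        accessed += 1
        for succ in graph.get(target, []):
            waiting[succ] -= 1
            if waiting[succ] == 0:
                queue.append(succ)

    # Every node was accessed only if no node is stuck on a cycle.
    return accessed == len(nodes)


def try_update(current, added_edges, deleted_nodes):
    """Apply changes to a backup copy; promote it only if it is acyclic.

    added_edges and deleted_nodes are hypothetical inputs standing in for
    the "newly added or deleted service nodes" derived from the operation record.
    """
    deleted = set(deleted_nodes)
    backup = deepcopy(current)
    for node in deleted:
        backup.pop(node, None)
    for src in backup:
        backup[src] = [s for s in backup[src] if s not in deleted]
    for src, dst in added_edges:
        backup.setdefault(src, [])
        backup.setdefault(dst, [])
        if dst not in backup[src]:
            backup[src].append(dst)
    return backup if has_no_cycle(backup) else current
```

Working on a backup copy means a rejected update leaves the current standard operation flow chart untouched, which is the property claim 2 relies on.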

Description

Computational graph dynamic topology reconstruction method, agent training method and related devices

Technical Field

The application relates to the field of enterprise-level process automation, in particular to a computational graph dynamic topology reconstruction method, an agent training method and a related device.

Background

An agent based on a large model can act as a digital employee that simulates a real person in completing operations. For example, in vertical business scenarios such as automobile finance or government affairs approval, the digital employee can interact with a front-end user interface like a real customer service agent, clicking buttons, filling in forms and calling back-end interfaces, and thereby autonomously complete end-to-end business processes such as vehicle deposit, loan approval and material submission. Moreover, the digital employee does not rely on manually written fixed rules; after training in a specific business scenario it can generate action decisions dynamically, which gives it a degree of generalization capability and task migration capability.

In practice, when a current digital employee encounters a change to a business standard operation flow chart, a traditional model cannot make effective use of non-gradient implicit feedback signals. These signals take the form of front-end UI interaction behaviors, such as click operations, keyboard input and mouse hovering, performed during a manual takeover.
These signals carry no gradient information and do not constitute the labels required by standard supervised learning. A traditional model therefore usually either discards a user takeover as a simple failure sample, or requires the interaction behaviors to be manually annotated afterwards and the annotated data to be fed into an offline training process. The whole process then takes considerable time to complete a model update, and real-time adaptation cannot be achieved. Once a business standard operation flow chart is adjusted, for example when a financial risk-control rule is temporarily tightened or the process sequence of an industrial pipeline is changed, the execution success rate of the digital employee drops sharply, the digital employee cannot recover autonomously for a long time, and actual business is eventually interrupted.

Disclosure of Invention

In order to overcome at least one defect in the prior art, the application provides a computational graph dynamic topology reconstruction method, an agent training method and a related device, which can convert real business feedback directly into a new flow chart without manually writing code or retraining a model offline, so that a digital employee can autonomously learn a rule change from a single click or a single takeover by the user, quickly recover its execution capability, and effectively avoid business interruption caused by flow changes.

In a first aspect, the present application provides a computational graph dynamic topology reconstruction method, the method comprising: counting an abnormal index of historical tasks processed by a digital employee; if the abnormal index is greater than a set alarm threshold, notifying a user to take over manually, and acquiring an operation record made after the user takes over; and updating a current standard operation flow chart according to the operation record to obtain a new standard operation flow chart.
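
The first-aspect loop (count an abnormal index, compare it with the alarm threshold, then derive changes from the takeover record) can be illustrated with a minimal Python sketch. This section does not define the abnormal index, so the sketch assumes it is the failure rate over a sliding window of recent tasks; the class `AnomalyMonitor`, the function `diff_nodes`, and the default window and threshold values are all hypothetical.

```python
from collections import deque


class AnomalyMonitor:
    """Tracks recent task outcomes and decides when to request a takeover.

    Assumption: the abnormal index is the fraction of failed tasks
    among the last `window` tasks.
    """

    def __init__(self, window=20, alarm_threshold=0.3):
        self.results = deque(maxlen=window)
        self.alarm_threshold = alarm_threshold

    def record(self, succeeded):
        self.results.append(bool(succeeded))

    def abnormal_index(self):
        if not self.results:
            return 0.0
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results)

    def needs_takeover(self):
        # Mirrors "if the abnormal index is greater than the set alarm threshold".
        return self.abnormal_index() > self.alarm_threshold


def diff_nodes(current_steps, recorded_steps):
    """Compare the user's recorded steps with the current flow's steps.

    Returns (added, deleted): steps only in the record, and steps only
    in the current flow, each in first-seen order without duplicates.
    """
    current, recorded = set(current_steps), set(recorded_steps)
    added = list(dict.fromkeys(s for s in recorded_steps if s not in current))
    deleted = list(dict.fromkeys(s for s in current_steps if s not in recorded))
    return added, deleted
```

A caller would record each task outcome, check `needs_takeover()`, and on takeover pass the recorded steps to `diff_nodes` to obtain candidate service nodes for the flow chart update.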
In a second aspect, the present application provides an agent training method, the method comprising: acquiring an actual operation sequence in which a model to be trained executes a target task; obtaining reward and punishment information for the actual operation sequence according to the actual operation sequence and a current reference operation sequence of the target task, wherein the reward and punishment information comprises a first penalty term for a redundant operation in the actual operation sequence; and updating the model to be trained according to the reward and punishment information to obtain an agent model capable of completing the target task.

In a third aspect, the present application provides a computational graph dynamic topology reconstruction device, the device comprising: a model detection module, used for counting an abnormal index of historical tasks processed by a digital employee; and a flow updating module, used for notifying a user to take over manually and acquiring an operation record made after the user takes over if the abnormal index is greater than a set alarm threshold; wherein the flow updating module is further used for updating a current standard operation flow chart according to the operation record to obtain a new standard operation flow chart.

In a fourth aspect, the present application provides an agent training device, the device comprising: a forward reasoning module, used for acquiring an actual operation sequence of