US-20260127499-A1 - ADAPTIVE WORKFLOW AUGMENTATION FOR IMPROVED TOOL AWARENESS IN AGENTIC TRAINING

Abstract

Systems and methods for optimizing visual reasoning task workflow. The systems and methods include generating an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information and storing sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs). The systems and methods further include refining the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iteratively optimizing includes comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task and training the model to perform the task with the augmented workflow.
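The refinement step summarized above can be illustrated with a minimal sketch. This is not the applicants' implementation: the `SubWorkflow` fields, the `refine` function, the per-position `candidates` map, and the toy `toy_run` and `toy_score` functions (standing in for "environmental information" and the "predetermined criteria") are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for the "environmental information" a workflow produces.
Env = Dict[str, float]


@dataclass(frozen=True)
class SubWorkflow:
    api: str     # name of the API (tool) that performs the action
    cost: float  # toy attribute, used only by the toy criterion below


def refine(
    initial: List[SubWorkflow],
    candidates: Dict[int, List[SubWorkflow]],
    run: Callable[[List[SubWorkflow]], Env],
    score: Callable[[Env], float],
    max_iters: int = 3,
) -> List[SubWorkflow]:
    """Iteratively swap in alternative sub-workflows that score better."""
    augmented = list(initial)
    for _ in range(max_iters):
        changed = False
        for i in range(len(augmented)):
            for alt in candidates.get(i, []):
                trial = augmented[:i] + [alt] + augmented[i + 1:]
                # Compare the environmental information of the trial
                # workflow with that of the current workflow.
                if score(run(trial)) > score(run(augmented)):
                    augmented = trial
                    changed = True
        if not changed:  # no sub-workflow improved in this pass
            break
    return augmented


def toy_run(workflow: List[SubWorkflow]) -> Env:
    # Assumption: executing a workflow yields a total cost as its "environment".
    return {"total_cost": sum(s.cost for s in workflow)}


def toy_score(env: Env) -> float:
    # Assumption: the predetermined criterion is "lower total cost is better".
    return -env["total_cost"]


initial = [SubWorkflow("detect", 3.0), SubWorkflow("crop", 2.0)]
candidates = {0: [SubWorkflow("detect_fast", 1.0)], 1: [SubWorkflow("crop_slow", 5.0)]}
augmented = refine(initial, candidates, toy_run, toy_score)
print([s.api for s in augmented])
```

In this toy setup the cheaper alternative for the first step is accepted and the more expensive alternative for the second step is rejected. A real system would presumably execute the APIs and score the resulting environment against the task.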

Inventors

  • Vijay Kumar Baikampady Gopalkrishna
  • Manmohan Chandraker
  • Fucai Ke

Assignees

  • NEC LABORATORIES AMERICA, INC.

Dates

Publication Date
20260507
Application Date
20251105

Claims (20)

  1. A method comprising: generating an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information; storing sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs); refining the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iteratively optimizing includes comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task; and training the model to perform the task with the augmented workflow.
  2. The method of claim 1, wherein the initial workflow trajectory is generated using an instruction-final answer pair.
  3. The method of claim 1, wherein iteratively comparing further comprises: removing noise from the initial workflow trajectory by optimizing the sub-workflows with a loss function.
  4. The method of claim 1, wherein iteratively comparing further comprises: updating the environmental information based on each iteration.
  5. The method of claim 1, wherein the sub-workflows correspond to tools in a tool library known by a model.
  6. The method of claim 1, wherein training the model further comprises: randomly masking one of the sub-workflows to form a randomly masked sub-workflow; and prompting the model to predict the randomly masked sub-workflow.
  7. The method of claim 6, wherein the randomly masked sub-workflows are classified as positive feedback.
  8. A system for augmenting data for training a model to perform compositional visual reasoning tasks, comprising: a processor; and a memory storing computer-readable instructions that, when executed by the processor, cause the system to: generate an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information; store sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs); refine the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iteratively optimizing includes comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task; and train the model to perform the task with the augmented workflow.
  9. The system of claim 8, wherein the initial workflow trajectory is generated using an instruction-final answer pair.
  10. The system of claim 8, wherein the memory further causes the system to: remove noise from the initial workflow trajectory by optimizing the sub-workflows with a loss function.
  11. The system of claim 8, wherein the memory further causes the system to: update the environmental information based on each iteration.
  12. The system of claim 8, wherein the sub-workflows correspond to tools in a tool library known by a model.
  13. The system of claim 8, wherein the memory further causes the system to: randomly mask one of the sub-workflows to form a randomly masked sub-workflow; and prompt the model to predict the randomly masked sub-workflow.
  14. The system of claim 13, wherein the randomly masked sub-workflows are classified as positive feedback.
  15. A computer program product comprising a non-transitory computer-readable storage medium containing computer program code, the computer program code when executed by one or more processors causes the one or more processors to perform operations, the computer program code comprising instructions to: generate an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information; store sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs); refine the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iteratively optimizing includes comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task; and train the model to perform the task with the augmented workflow.
  16. The computer program code of claim 15, wherein the initial workflow trajectory is generated using an instruction-final answer pair.
  17. The computer program code of claim 15, wherein the computer program code further includes instructions to: remove noise from the initial workflow trajectory by optimizing the sub-workflows with a loss function.
  18. The computer program code of claim 15, wherein the computer program code further includes instructions to: update the environmental information based on each iteration.
  19. The computer program code of claim 15, wherein the sub-workflows correspond to tools in a tool library known by a model.
  20. The computer program code of claim 15, wherein the computer program code further includes instructions to: randomly mask one of the sub-workflows to form a randomly masked sub-workflow; and prompt the model to predict the randomly masked sub-workflow.
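
The masking step recited in claims 6, 13, and 20 can be sketched as follows. This is only an illustration under stated assumptions: the `MASK` token, the `mask_one` and `build_prompt` helpers, the prompt wording, and the example step strings are hypothetical and do not come from the application.

```python
import random
from typing import List, Tuple

MASK = "<MASK>"  # hypothetical placeholder token; not specified in the source


def mask_one(sub_workflows: List[str], rng: random.Random) -> Tuple[List[str], int, str]:
    """Randomly mask one sub-workflow; return (masked list, index, target)."""
    if not sub_workflows:
        raise ValueError("need at least one sub-workflow to mask")
    idx = rng.randrange(len(sub_workflows))
    masked = list(sub_workflows)  # copy so the caller's list is untouched
    target = masked[idx]
    masked[idx] = MASK
    return masked, idx, target


def build_prompt(task: str, masked: List[str]) -> str:
    """Format a prompt asking the model to predict the masked step."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(masked))
    return f"Task: {task}\nWorkflow:\n{steps}\nPredict the step marked {MASK}."


# Hypothetical augmented workflow, written as one string per sub-workflow.
workflow = ["locate(image, 'dog')", "crop(image, box)", "classify(crop)"]
masked, idx, target = mask_one(workflow, random.Random(0))
prompt = build_prompt("What breed is the dog?", masked)
print(prompt)
print("target:", target)
```

The pair `(prompt, target)` would then serve as one training example. Passing an explicit `random.Random` instance keeps the masking reproducible, which is a design choice of this sketch rather than something the claims require.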

Description

RELATED APPLICATION INFORMATION

This application claims priority to U.S. Provisional Patent No. 63/717,369, filed on Nov. 7, 2024, and U.S. Provisional Patent No. 63/719,815, filed on Nov. 13, 2024, incorporated herein by reference in their entirety.

BACKGROUND

Technical Field

The present invention relates to computer vision and more particularly to an improvement to compositional visual reasoning capabilities in generative artificial intelligence models.

Description of the Related Art

Artificial intelligence (AI) models can act as planners and reasoners to perform complex tasks. Often these AI models are frozen (e.g., do not have parameters updated after training). Because their parameters are not updated, frozen AI models cannot be trained to adapt or optimize sub-workflows, leading to significant inefficiencies such as wasted training data. Additionally, frozen AI models do not understand the capabilities of the perception modules they employ, nor do they learn to generate workflows that utilize compositional approaches. In other words, frozen AI models do not fully grasp the capabilities of the tools they choose for a given workflow. This can result in low success rates and inefficiency in the workflow. Even when the workflow is logically coherent, the AI model can still fail due to tool errors (e.g., wrong tools selected, extraneous tools selected, insufficient tools selected, incompatible tools) and inaccuracies in the initial workflow. Moreover, training large language models (LLMs) using full workflows that are incorrect or redundant can limit performance and inhibit future workflow generation optimization.

SUMMARY

According to an aspect of the present invention, a method is provided that includes generating an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information, and storing sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs). The method further includes refining the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iterative optimizing including comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task, and training the model to perform the task with the augmented workflow.

According to another aspect of the present invention, a system is provided that includes a processor and a memory storing computer-readable instructions. The instructions, when executed, cause the processor to generate an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information, and to store sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs). The instructions can also cause the processor to refine the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iterative optimizing including comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task, and to train the model to perform the task with the augmented workflow.

According to yet another aspect of the present invention, a computer program product is provided that comprises a non-transitory computer-readable storage medium containing computer program code that, when executed by one or more processors, causes the one or more processors to perform operations. The computer program code comprises instructions to generate an initial workflow trajectory to train a model to perform a task, the initial workflow trajectory being formed from environmental information, a prompt, and visual information, and to store sub-workflows that form the initial workflow trajectory, the sub-workflows including actions that are performed by Application Programming Interfaces (APIs). The computer program code also includes instructions to refine the initial workflow trajectory to form an augmented workflow by iteratively optimizing the sub-workflows of the initial workflow trajectory, the iterative optimizing including comparing the environmental information of the augmented workflow with the environmental information from the initial workflow trajectory and selecting a sub-workflow that better meets a predetermined criteria to perform the task, and to train the model to perform the task with the augmented workflow.

These and other features and advantages will become apparent from the following detailed descripti