KR-102963765-B1 - Systems and methods for robotic process automation

KR102963765B1

Abstract

A method for training an RPA robot to use a GUI is disclosed. The method includes the steps of: capturing a video of the GUI as an operator performs a process using the GUI; capturing a sequence of events triggered as the operator performs the process using the GUI; and analyzing the video and the sequence of events to generate a workflow. When the workflow is executed by the RPA robot, the RPA robot performs the process using the GUI.

Inventors

  • Cali, Jacques
  • Dubba, Krishna
  • Carr, Ben
  • Cucurull, Guillem
  • Aktas, Umit Rusen

Assignees

  • Blue Prism Limited

Dates

Publication Date
2026-05-13
Application Date
2020-05-01

Claims (20)

  1. A method for training a robotic process automation (RPA) robot to use a graphical user interface (GUI), the method comprising: capturing a video of the GUI as an operator performs a process using the GUI; capturing a sequence of events triggered as the operator performs the process using the GUI; and analyzing the video and the sequence of events to generate a workflow, wherein, when the workflow is executed by the RPA robot, the RPA robot performs the process using the GUI; wherein the analyzing comprises: identifying one or more interactive elements of the GUI from the video; and matching at least one of the events in the sequence of events as corresponding to at least one of the interactive elements; and wherein identifying a given interactive element of the one or more interactive elements comprises: identifying one or more anchor elements in the GUI for the given interactive element; and associating the one or more anchor elements with the given interactive element.
  2. The method of claim 1, wherein a given anchor element of the one or more anchor elements is identified for the given interactive element based on expected concurrent GUI elements.
  3. The method of claim 1 or claim 2, wherein a given anchor element of the one or more anchor elements is identified for the given interactive element based on the proximity of the given anchor element to the given interactive element.
  4. The method of claim 1, wherein a given anchor element of the one or more anchor elements is identified for the given interactive element based on the types of the given anchor element and the given interactive element.
  5. The method of claim 1, wherein the one or more anchor elements are identified for the given interactive element based on at least one of: identifying a predetermined number of nearest GUI elements to the given interactive element as the one or more anchor elements using a k-nearest neighbor approach; identifying a predetermined number of nearest GUI elements in one or more predetermined directions from the given interactive element as the one or more anchor elements; and identifying all GUI elements within a predefined area around the given interactive element as the one or more anchor elements.
  6. The method of claim 1, wherein each of the one or more anchor elements has an associated weight.
  7. The method of claim 1, wherein identifying the one or more interactive elements is performed by applying a trained machine learning algorithm to at least a portion of the video.
  8. The method of claim 1, wherein identifying the given interactive element comprises identifying the locations of the one or more anchor elements in the GUI for the given interactive element.
  9. The method of claim 8, wherein a machine learning algorithm is used to identify the one or more anchor elements based on one or more predetermined feature values.
  10. The method of claim 9, wherein the feature values are determined through training of the machine learning algorithm.
  11. The method of claim 9, wherein the feature values comprise any one or more of: a distance between a first GUI element and a second GUI element; an orientation of the first GUI element with respect to the second GUI element; and whether the first GUI element is in the same application window as the second GUI element.
  12. The method of claim 1, wherein the sequence of events comprises any one or more of: a keypress event; a hover-over event; a click event; a drag event; and a gesture event.
  13. The method of claim 1, further comprising including in the sequence of events one or more inferred events based on the video.
  14. The method of claim 13, wherein hover-over events are inferred based on one or more interface elements visible in the GUI.
  15. The method of claim 1, wherein the analyzing comprises identifying a sequence of subprocesses of the process.
  16. The method of claim 15, wherein a process output of one subprocess of the sequence is used by the RPA robot as a process input to another subprocess of the sequence.
  17. The method of claim 15, further comprising editing the generated workflow to include a portion of a previously generated workflow corresponding to an additional subprocess, wherein, when the edited workflow is executed by the RPA robot, the RPA robot performs a version of the process using the GUI, the version of the process including the additional subprocess.
  18. The method of claim 17, wherein the version of the process includes the additional subprocess in place of an existing subprocess of the process.
  19. The method of claim 1, wherein at least one of the video and the sequence of events is captured using a remote desktop system.
  20. A method of performing a process using a GUI with an RPA robot trained by the method of claim 1.
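As a concrete illustration of the k-nearest neighbor anchor selection recited in claim 5, the following is a minimal sketch. The function name, the reduction of GUI elements to (x, y) center points, and the choice of Euclidean distance are illustrative assumptions, not details fixed by the claims.

```python
import math

# Minimal sketch of the k-nearest neighbor anchor selection of claim 5.
# GUI elements are reduced to (x, y) center points; the use of Euclidean
# distance and the function name are illustrative assumptions.

def nearest_anchors(target, candidates, k=3):
    """Return the k candidate elements nearest to the target element."""
    def dist(point):
        return math.hypot(point[0] - target[0], point[1] - target[1])
    return sorted(candidates, key=dist)[:k]
```

The claims also allow anchors to be chosen in predetermined directions or within a predefined area around the interactive element; those variants would replace the plain distance sort.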

Description

Systems and methods for robotic process automation

The present invention relates to systems and methods for robotic process automation and, in particular, to the automatic training of robotic process automation robots.

Human-guided computer processes are ubiquitous across many technological fields and endeavors. Modern graphical user interfaces (GUIs) have proven invaluable in enabling human operators to use computer systems to perform often complex data processing and/or system control tasks. However, while GUIs often allow human operators to quickly become accustomed to performing new tasks, they present a high barrier to any further automation of those tasks.

Traditional workflow automation aims to take tasks typically performed by operators through a GUI and automate them, so that the computer system can perform the same tasks without significant re-engineering of the underlying software. Initially, this required exposing software APIs (application programming interfaces) so that scripts could be manually designed to invoke the required functions of the software. Robotic process automation (RPA) systems represent an evolution of this approach: they use software agents (referred to as RPA robots) that interact with computer systems through existing GUIs. An RPA robot generates the appropriate input commands for the GUI so that a given process is executed by the computer system. This enables the automation of processes, transforming attended processes into unattended ones. The benefits of such an approach are numerous, including greater repeatability, which reduces or even eliminates the possibility of human error in a given process, and greater scalability, since multiple RPA robots can perform the same task across multiple computer systems.
However, the process of training an RPA robot to perform a specific task can be cumbersome: a human operator must use the RPA system itself to program the process, explicitly identifying each individual step, identifying the specific parts of the GUI to be interacted with, and establishing a workflow for the RPA robot to follow.

[Prior Art Literature] Japanese Patent Publication No. 2018-535459 (November 29, 2018); Japanese Patent Publication No. JP2019-168945 (October 3, 2019)

The present invention provides a method for training an RPA robot to perform a task using a GUI based solely on analyzing and processing a video of an operator using the GUI together with the events (or inputs) triggered by the operator. In this way, the aforementioned problems of the prior art regarding the training of RPA robots can be eliminated.

In a first embodiment, a method for training an RPA robot (or script, or system) to use a GUI is provided. The method comprises the steps of: capturing a video of the GUI as an operator (or user) performs a process (or task) using the GUI; capturing a sequence of events triggered as the operator performs the process using the GUI; and analyzing the video and the sequence of events to generate a workflow. The workflow, when executed by the RPA robot, causes the RPA robot to perform the process using the GUI. The capturing steps may be performed by a remote desktop system.

The analysis step may further include identifying one or more interactive elements of the GUI from the video, and matching at least one of the events in the sequence of events as corresponding to at least one of the interactive elements. The interactive elements may be any typical GUI elements, such as (but not limited to) a text box, a button, a context menu, a tab, a radio button (or an array thereof), or a checkbox (or an array thereof).
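The analysis step just described, which pairs recorded events with the interactive elements detected in the video, can be sketched as follows. This is a hedged illustration: the event tuple shape, the per-frame element boxes, and the workflow record format are all invented for the example and are not taken from the patent.

```python
# Illustrative sketch of the analysis step: a recorded event log is
# combined with GUI elements detected in the screen recording to produce
# an ordered workflow for an RPA robot to replay. All data shapes here
# (event tuples, element boxes, workflow dicts) are assumptions.

def generate_workflow(events, elements_by_frame):
    """Turn (frame, kind, x, y) events into named workflow steps.

    elements_by_frame maps a frame index to a list of
    (name, x, y, w, h) element bounding boxes detected in that frame.
    """
    workflow = []
    for frame, kind, x, y in events:
        target = None
        for name, ex, ey, ew, eh in elements_by_frame.get(frame, []):
            # Match the event to the element whose box contains its position.
            if ex <= x < ex + ew and ey <= y < ey + eh:
                target = name
                break
        workflow.append({"action": kind, "element": target})
    return workflow
```

A real implementation would derive the element boxes from a detector run over the captured video rather than from a hand-built dictionary.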
The step of identifying the interactive elements may be performed by applying a trained machine learning algorithm to at least a portion of the video. Identifying an interactive element may include identifying the locations of one or more anchor elements in the GUI for that interactive element. For example, a machine learning algorithm (such as a graph neural network) may be used to identify the one or more anchor elements based on one or more predetermined feature values. The feature values may themselves be determined through training of the machine learning algorithm, and may include any one or more of: the distance between two elements; the orientation of one element with respect to another; and whether two elements are in the same application window.

The sequence of events may include any one or more of: keypress events; click events (such as a single click or multiple clicks); drag events; and gesture events. Inferred events based on the video (such as hover-over events) may also be included in the sequence of events. Typically, hover-over events may be inferred based on one or more interface elements visible in the GUI. The analysis step may further include a step of identifying a sequence of subprocesses
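The three candidate feature values just listed (distance, orientation, and shared window) could be computed for a pair of GUI elements as sketched below. The element representation as an (x, y, window_id) tuple and the angle-in-degrees encoding are assumptions chosen for illustration; the patent does not fix a particular encoding.

```python
import math

# Sketch of the three candidate feature values for a machine-learning
# anchor identifier: distance between two GUI elements, orientation of
# one with respect to the other, and whether they share a window.
# The (x, y, window_id) element representation is an assumption.

def pairwise_features(a, b):
    """Compute features for an element pair; each element is (x, y, window_id)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return {
        "distance": math.hypot(dx, dy),
        "orientation": math.degrees(math.atan2(dy, dx)),  # angle of b as seen from a
        "same_window": a[2] == b[2],
    }
```

In a trained model such as the graph neural network mentioned above, features like these would label the edges between element nodes rather than be consumed directly.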