KR-20260064859-A - GENERATIVE ADVERSARIAL IMITATION LEARNING (GAIL) DEVICE AND METHOD FOR GAIL AGENT TRAINING BASED ON EXPERT TRAJECTORY DATA
Abstract
The present invention relates to a generative adversarial imitation learning device that trains an agent by imitating expert path data, comprising: a learning initialization unit that receives expert path data composed of a sequence of actions performed in the past and initializes parameters of a global encoder shared by an agent and a discriminator; a sample extraction unit that extracts samples from the expert path; a global encoder processing unit that trains the agent and the discriminator through the samples to update the parameters of the global encoder and fixes the parameters of the global encoder when training is completed; and an agent training unit that performs GAIL learning in which the agent imitates the actions of the expert.
Inventors
- 조성배
- 문형준
Assignees
- 연세대학교 산학협력단
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-10-29
Claims (11)
- A generative adversarial imitation learning device comprising: a learning initialization unit that receives expert path data consisting of sequences of actions performed in the past and initializes parameters of a global encoder shared by an agent and a discriminator; a sample extraction unit that extracts samples from the expert's path; a global encoder processing unit that updates the parameters of the global encoder by training the agent and the discriminator on the samples, and fixes the parameters of the global encoder when the training is complete; and an agent training unit that performs GAIL training in which the agent imitates the behavior of the expert.
- The generative adversarial imitation learning device of claim 1, wherein the learning initialization unit initializes an actor network that determines, for the agent, which action to take in a given state.
- The generative adversarial imitation learning device of claim 2, wherein the learning initialization unit initializes a critic network that evaluates, for the agent, the value of a selected action in a specific state.
- The generative adversarial imitation learning device of claim 1, wherein the learning initialization unit initializes a discriminator network that evaluates how similar the agent's behavior is to the expert's behavior.
- The generative adversarial imitation learning device of claim 1, wherein the sample extraction unit repeats a process of extracting states and actions from the expert's path as samples.
- The generative adversarial imitation learning device of claim 1, wherein the global encoder processing unit performs the training by receiving a state through the agent's actor network, selecting one of the actions selectable in that state (hereinafter, the selected action), and evaluating the selected action through the agent's critic network and the discriminator's discriminator network.
- The generative adversarial imitation learning device of claim 6, wherein the global encoder processing unit shares the global encoder between a global encoding unit of the agent and a global encoding unit of the discriminator.
- The generative adversarial imitation learning device of claim 7, wherein the global encoder processing unit performs the training by converting the state into a feature vector through each of the global encoding unit of the agent and the global encoding unit of the discriminator in order to process actions.
- The generative adversarial imitation learning device of claim 6, wherein the global encoder processing unit performs the training so as to minimize a cost function that evaluates the agent's imitation of the expert.
- The generative adversarial imitation learning device of claim 1, wherein the agent training unit optimizes a policy over the expert's path through the GAIL training so that the agent's behavior is learned to be similar to the expert's behavior.
- A generative adversarial imitation learning method performed by a generative adversarial imitation learning device, comprising: a learning initialization step of receiving expert path data consisting of sequences of actions performed in the past and initializing parameters of a global encoder shared by an agent and a discriminator; a sample extraction step of extracting samples from the expert's path; a global encoder processing step of updating the parameters of the global encoder by training the agent and the discriminator on the samples, and fixing the parameters of the global encoder when the training is complete; and an agent training step of performing GAIL training in which the agent imitates the behavior of the expert.
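The two-phase procedure of claim 1 (and the method of claim 11) — train the agent and discriminator while updating a shared global encoder, then fix the encoder's parameters before GAIL training — can be sketched as follows. This is a minimal illustrative skeleton, not the patented implementation: the `GlobalEncoder` class, the toy linear map, and the random surrogate updates are all assumptions standing in for the real networks and objectives.

```python
import random

class GlobalEncoder:
    """Shared feature extractor for agent and discriminator (claim 1).

    A toy linear map stands in for the real network; the `frozen` flag
    models fixing the encoder's parameters once joint training is done.
    """
    def __init__(self, state_dim, feat_dim, seed=0):
        rnd = random.Random(seed)
        self.w = [[rnd.gauss(0, 0.1) for _ in range(feat_dim)]
                  for _ in range(state_dim)]
        self.frozen = False

    def encode(self, state):
        # state -> feature vector (claims 7-8: both units share this map)
        return [sum(s * wj for s, wj in zip(state, col))
                for col in zip(*self.w)]

    def update(self, grad, lr=0.01):
        if self.frozen:                      # claim 1: parameters fixed
            return
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                self.w[i][j] -= lr * g

# Phase 1: joint agent/discriminator training updates the shared encoder.
enc = GlobalEncoder(state_dim=4, feat_dim=3)
rnd = random.Random(1)
for _ in range(10):
    state = [rnd.gauss(0, 1) for _ in range(4)]     # sampled expert state
    feat = enc.encode(state)
    # surrogate gradient step (placeholder for the real GAIL objective)
    enc.update([[rnd.gauss(0, 1) for _ in range(3)] for _ in range(4)])

# Phase 2: fix the encoder, then run GAIL agent training on frozen features.
enc.frozen = True
before = [row[:] for row in enc.w]
enc.update([[1.0] * 3 for _ in range(4)])           # ignored: encoder frozen
assert enc.w == before
```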
Description
Generative Adversarial Imitation Learning (GAIL) Device and Method for GAIL Agent Training Based on Expert Path Data

The present invention relates to a generative adversarial imitation learning technique and, more specifically, to a generative adversarial imitation learning device and method that perform GAIL training so that an agent imitates the behavior of an expert based on expert path data.

Autonomous agent learning trains agents to make independent decisions in complex environments and is an important technology in fields such as autonomous driving, robotics, and video games. In particular, imitation learning, which trains agents by mimicking human behavior, complements the limitations of reinforcement learning and enables agents to perform complex tasks using expert demonstration data. Imitation learning is also applied effectively in environments where clear reward signals are difficult to define. Generative Adversarial Imitation Learning (GAIL), based on Generative Adversarial Networks (GANs), is a powerful technique for mimicking expert behavior and plays a crucial role in enabling autonomous agents to learn expert paths and solve complex tasks.

However, existing GAIL models cannot adequately handle the complexity of input data and are not effective at suppressing incorrect behaviors during the agent's training. In particular, performance may degrade on high-dimensional inputs such as sequence data, and expert behavior may not be accurately imitated in complex environments. To address these issues, there is a need for technology that enables agents to learn stably even in complex environments by applying more advanced imitation learning techniques and data processing methods.

Korean Published Patent No. 10-2020-0115213 (October 7, 2020) describes a system and method for temporarily switching a character or virtual object controlled by a player in a video game to emulated control when the player's device loses network connectivity or experiences a problem. The system allows the character to mimic the actual player's play style and continue operating until the game session ends or the problem is resolved, providing other players with an uninterrupted gaming experience. In multiplayer games, multiple players can cooperate to progress together; if one player cannot continue due to network connection issues, excessive latency, or a game application crash, the remaining players are put at a disadvantage. To address this problem, the system allows the disconnected player's character to continue participating in the game in a manner similar to that of the other players.

Figure 1 is a diagram illustrating an input image sequence and an agent's output action in a MineRL environment, explaining the process by which an agent determines an action from an input image sequence. Figure 2 is a diagram illustrating the configuration of a generative adversarial imitation learning device according to one embodiment of the present invention. Figure 3 is a flowchart explaining the operation of the generative adversarial imitation learning device of Figure 2. Figure 4a is a graph illustrating various configurations of the reward function used in agent training. Figure 4b is a diagram illustrating the proposed GAIL-based learning algorithm. Figure 4c is a diagram illustrating the DI-GAIL algorithm combined with Directed Information. Figure 5 is a diagram illustrating examples of various experimental environments used for agent training and performance evaluation. Figure 6 is a diagram comparing the learning strategies of global encoders.
Figure 7 is a diagram visualizing the states encoded by the global encoder using the t-SNE technique. Figure 8 is a diagram visually showing the trajectory and the prediction results of each code in a navigation task using the DI-Ours algorithm. Figure 9 is a diagram showing the usage ratio of unsupervised-learned code variables in the DI-Ours algorithm.

The description of the present invention is merely an example for structural or functional explanation, and the scope of the present invention should therefore not be interpreted as limited to the examples described in the text. That is, since the examples are subject to various modifications and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing the technical concept. Furthermore, the objectives or effects presented in the present invention do not imply that a specific embodiment must include all of them or only those effects, so the scope of the present invention should not be understood as limited by them. Meanwhile, the meaning of the terms used in this application should be understood as follows. Terms such as "first," "second," etc., are in