KR-20260064854-A - APPARATUS AND METHOD FOR HUMAN STRATEGY LEARNING-BASED MULTI-AGENT DEEP LEARNING
Abstract
The present invention relates to a multi-agent learning device and method based on human strategy learning, and more specifically, to a multi-agent reinforcement learning device and method based on human strategy learning for online team sports games. According to an embodiment of the present invention, the device and method perform artificial intelligence-based multi-agent reinforcement learning in which human strategy learning guides agents toward clear, well-defined actions in an online sports game environment. Furthermore, because the human strategy learning method can be adapted to other complex game environments, the invention has the advantage of being applicable to games of various genres.
Inventors
- 조경은
- 김준오
- 이성빈
Assignees
- 동국대학교 산학협력단
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2024-10-29
Claims (11)
- A human strategy learning-based multi-agent learning device, comprising: a data collection unit that collects observation data about the surroundings of an agent; a data processing unit that processes the collected observation data by dividing it into predetermined ranges; a data learning unit that performs learning on the processed observation data; and a data execution unit that determines an action using the learned data and applies it to a game.
- The device of claim 1, wherein the data processing unit uses information about the objects observed in each ray region of the data.
- The device of claim 1, wherein the data learning unit comprises: a multi-stage reward design unit that designs multi-stage rewards to be applied to the processed data; and a learning design unit that designs human strategy learning based on the designed multi-stage rewards.
- The device of claim 3, wherein the multi-stage reward design unit comprises: an immediate reward design unit that designs an immediate reward value for behavioral accuracy; a strategy reward design unit that designs a strategy reward value smaller than the immediate reward value for learning human-defined strategies; and a global reward design unit that designs a global reward value greater than the sum of the immediate reward value and the strategy reward value for achieving the final goal.
- The device of claim 3, wherein the learning design unit comprises: a human strategy learning unit that trains a human strategy AI model to derive a human-defined strategy for each situation; a behavior learning unit that trains the trained model for behavioral accuracy so that it organizes various situations into sequential steps; and a self-play learning unit in which the model trained for behavioral accuracy performs additional learning through matches against itself.
- A human strategy learning-based multi-agent learning method, comprising: collecting observation data about the surroundings of an agent; processing the collected observation data by dividing it into predetermined ranges; performing learning on the processed observation data; and determining an action using the learned data and applying it to a game.
- The method of claim 6, wherein the processing of the collected observation data by dividing it into predetermined ranges uses information about the objects observed in each ray region of the data.
- The method of claim 6, wherein the performing of learning on the processed observation data comprises: designing multi-stage rewards to be applied to the processed data; and designing human strategy learning based on the designed multi-stage rewards.
- The method of claim 8, wherein the designing of multi-stage rewards to be applied to the processed data comprises: designing an immediate reward value for behavioral accuracy; designing a strategy reward value smaller than the immediate reward value for learning human-defined strategies; and designing a global reward value greater than the sum of the immediate reward value and the strategy reward value for achieving the final goal.
- The method of claim 8, wherein the designing of human strategy learning based on the designed multi-stage rewards comprises: training a human strategy AI model to derive a human-defined strategy for each situation; training the trained model for behavioral accuracy so that it organizes various situations into sequential steps; and performing additional learning, by the model trained for behavioral accuracy, through matches against itself.
- A computer program, recorded on a computer-readable recording medium, that executes the human strategy learning-based multi-agent learning method of any one of claims 6 to 10.
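The ordering constraints among the three reward tiers in claims 4 and 9 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the constraints (strategy reward smaller than immediate reward; global reward greater than their sum) come from the claims, while the concrete magnitudes, field names, and trigger conditions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MultiStageReward:
    """Three-tier reward per claims 4 and 9. Magnitudes are illustrative only."""
    immediate_reward: float = 0.1   # for behavioral accuracy (e.g. a clean pass)
    strategy_reward: float = 0.05   # for following a human-defined strategy
    global_reward: float = 1.0      # for achieving the final goal (e.g. scoring)

    def __post_init__(self) -> None:
        # Ordering constraints stated in the claims.
        assert self.strategy_reward < self.immediate_reward
        assert self.global_reward > self.immediate_reward + self.strategy_reward

    def step_reward(self, accurate_action: bool, followed_strategy: bool,
                    goal_achieved: bool) -> float:
        """Sum the tiers that apply at this timestep."""
        r = 0.0
        if accurate_action:
            r += self.immediate_reward
        if followed_strategy:
            r += self.strategy_reward
        if goal_achieved:
            r += self.global_reward
        return r
```

Under this scheme an agent that merely acts accurately earns less than one that also follows the defined strategy, and both earn far less than one that reaches the final goal, which is the intended ordering.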
Description
Apparatus and Method for Human Strategy Learning-Based Multi-Agent Deep Learning

The present invention relates to a multi-agent learning device and method based on human strategy learning, and more specifically, to a multi-agent reinforcement learning device and method based on human strategy learning for online team sports games.

Online basketball games that have long been commercialized use FSM (Finite State Machine)-based artificial intelligence. However, FSM-based AI requires manual design work to author behaviors, derived from the many possible game states, that mimic human gameplay. Furthermore, because FSM-based AI produces predefined behaviors, users quickly adapt to its behavioral patterns and easily predict its subsequent actions, which lowers user satisfaction. Redesigning the FSM with more states to address this makes the structure unintuitive and complex due to the sheer number of states, which in turn makes further maintenance difficult. The background technology of the present invention is disclosed in Korean Registered Patent No. 10-2507719.

FIGS. 1 to 4 are drawings for explaining the configuration of a human strategy learning-based multi-agent learning device according to an embodiment of the present invention. FIGS. 5 to 11 are drawings for illustrating a human strategy learning-based multi-agent learning model according to an embodiment of the present invention. FIG. 12 is a diagram illustrating an algorithm used in human strategy learning-based multi-agent learning according to an embodiment of the present invention. FIGS. 13 to 19 are drawings showing performance results according to human strategy learning-based multi-agent learning according to an embodiment of the present invention. FIGS. 20 and 21 are flowcharts of a human strategy learning-based multi-agent learning device according to an embodiment of the present invention.
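The FSM limitation described above can be illustrated with a minimal sketch. The states, events, and actions below are hypothetical examples, not from the patent; the point is that every behavior must be enumerated by hand and the next action is fully determined by the current (state, event) pair, which is why such AI is both predictable to players and costly to extend.

```python
# Hypothetical rule table for an FSM-based basketball AI.
# Each (state, event) pair maps to exactly one predefined action.
TRANSITIONS = {
    ("HAS_BALL", "defender_near"):     "PASS",
    ("HAS_BALL", "open_lane"):         "DRIVE",
    ("HAS_BALL", "open_shot"):         "SHOOT",
    ("NO_BALL",  "ball_loose"):        "CHASE",
    ("NO_BALL",  "teammate_has_ball"): "CUT_TO_BASKET",
}


def next_action(state: str, event: str) -> str:
    """Deterministic lookup: identical situations always yield identical actions.

    Any unhandled (state, event) pair needs yet another hand-written rule,
    so the table grows combinatorially as new game situations are added.
    """
    return TRANSITIONS.get((state, event), "IDLE")
```

Because the mapping is fixed, a player who sees "defender near" always answered with a pass can exploit it every time; the reinforcement-learning approach of the present invention is motivated by exactly this predictability and maintenance burden.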
The present invention is susceptible to various modifications and may have various embodiments; therefore, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to specific embodiments, and it should be understood that the invention includes all modifications, equivalents, and substitutions that fall within its spirit and scope. In describing the present invention, detailed descriptions of related prior art are omitted where it is determined that they would unnecessarily obscure the essence of the invention.

Singular expressions used in this specification and claims should generally be interpreted to mean "one or more" unless otherwise stated. Throughout the specification, when a part is said to be "connected" (connected, in contact with, or coupled) to another part, this includes not only cases where they are "directly connected" but also cases where they are "indirectly connected" with other members interposed between them. Furthermore, when a part is said to "include" a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components.

The terms used herein are merely for describing specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. The present invention will be described below with reference to the attached drawings.
However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description have been omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

FIGS. 1 to 4 are drawings for explaining the configuration of a human strategy learning-based multi-agent learning device according to an embodiment of the present invention. Referring to FIG. 1, a human strategy learning-based multi-agent learning device (100) includes a data collection unit (110), a data processing unit (130), a data learning unit (150), and a data execution unit (170).

The data collection unit (110) collects all observation data about the surroundings of the agent. The data collection unit (110) collects data provided by the environment to support real-time learning in a commercial online basketball game whose environment changes in real time. In particular, since the bask