CN-121998075-A - Agent for complex game decision scenarios and construction method thereof

Abstract

The invention provides an agent for complex game decision scenarios and a construction method thereof, aiming to solve the problem that existing large language model agents reason poorly in complex game decision scenarios, and to improve agent performance. The agent comprises a state adapter, a reasoning decision unit, an action adapter and an autonomous evolution unit. The state adapter extracts raw state information from the complex game decision environment and organizes it into a structured text observation. The reasoning decision unit adopts a three-layer closed-loop structure of planner, executor and verifier, which decouples high-level strategic planning from low-level action execution and uses the verifier for self-correction. The action adapter converts standardized executable actions into operation instructions that the complex game decision environment can recognize, and executes those instructions in the environment. The autonomous evolution unit quantitatively evaluates the standardized executable actions, screens out high-scoring data, and performs supervised fine-tuning of the reasoning decision unit on the high-scoring data.
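The closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: each component is modeled as a plain callable, and the names `decide`, `max_retries` and the feedback-string convention are assumptions introduced only for illustration. In a real system the planner and executor would be large language model calls.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not taken from the patent text):
# a verifier returns None on success or an error message used as feedback.
Planner = Callable[[str, Optional[str]], str]          # (observation, feedback) -> strategy
Executor = Callable[[str, Optional[str]], List[str]]   # (strategy, feedback) -> actions
PlanVerifier = Callable[[str], Optional[str]]
ActionVerifier = Callable[[List[str]], Optional[str]]


def decide(
    observation: str,
    planner: Planner,
    verify_plan: PlanVerifier,
    executor: Executor,
    verify_actions: ActionVerifier,
    max_retries: int = 3,
) -> List[str]:
    """Plan, verify the plan, execute, verify the actions, retrying on failure."""
    # Stage 1: high-level planning with compliance verification.
    feedback: Optional[str] = None
    for _ in range(max_retries):
        strategy = planner(observation, feedback)
        feedback = verify_plan(strategy)
        if feedback is None:
            break
    else:
        raise RuntimeError("planner output never passed verification")

    # Stage 2: low-level action generation with validity verification.
    feedback = None
    for _ in range(max_retries):
        actions = executor(strategy, feedback)
        feedback = verify_actions(actions)
        if feedback is None:
            return actions
    raise RuntimeError("executor output never passed verification")
```

Keeping the two retry loops separate mirrors the decoupling of strategic planning from action execution: a rejected action list is regenerated by the executor alone, without replanning.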

Inventors

SHEN PENGBO
WANG YAQING
YANG YIQIN
XU SHUANG
XU BO

Assignees

Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)

Dates

Publication Date: 20260508
Application Date: 20251112

Claims (10)

1. An agent for a complex game decision scenario, characterized by comprising a state adapter, a reasoning decision unit, an action adapter and an autonomous evolution unit, wherein the reasoning decision unit comprises a planner, a planning verifier, an executor and an execution verifier; the state adapter is configured to extract raw state information from a complex game decision environment, convert the raw state information into a structured text observation that a large language model can understand, and transmit the structured text observation to the planner; the planner is a large language model configured to output a high-level natural language strategic instruction to the planning verifier according to the input structured text observation; the planning verifier is configured to perform compliance verification on the input high-level natural language strategic instruction against a preset rule base, to drive the planner to iteratively correct the high-level natural language strategic instruction if the verification fails, and to input the high-level natural language strategic instruction that passes the verification into the executor; the executor is configured to generate standardized executable actions according to the verified high-level natural language strategic instruction; the execution verifier is configured to perform validity verification on the standardized executable actions, to drive the executor to iteratively correct the standardized executable actions if the verification fails, and to input the standardized executable actions that pass the verification into the action adapter and the autonomous evolution unit; the action adapter is configured to convert the input standardized executable actions into operation instructions that the complex game decision environment can recognize and to execute the operation instructions in the complex game decision environment; and the autonomous evolution unit is configured to quantitatively evaluate the input standardized executable actions, screen out high-scoring data, and perform supervised fine-tuning of the reasoning decision unit according to the high-scoring data.
2. The agent for a complex game decision scenario of claim 1, wherein the state adapter is configured to order units in the complex game decision environment by spatial position using a proximity ranking algorithm and to aggregate units performing similar tasks into a single description using a unit aggregation algorithm.
3. The agent for a complex game decision scenario of claim 1, wherein the planner generates the high-level natural language strategic instruction using a structured prompt template embedded with game rules and expert knowledge.
4. The agent for a complex game decision scenario of claim 1, wherein priority rules are embedded in the executor.
5. The agent for a complex game decision scenario of claim 1, wherein the execution verifier is configured to verify whether the syntax format of a standardized executable action is correct, whether the target unit exists, whether resources are sufficient, whether the action is in a preset skill list, and whether instructions conflict with one another.
6. The agent for a complex game decision scenario of claim 1, wherein the autonomous evolution unit is configured to quantitatively score each action step in the decision trajectory data using a discounted gain function; the discounted gain function is calculated as follows: [formula not reproduced in this text], where the formula's symbols denote, in order, the set of metrics, the metric value, an individual metric, the time t, the metric step size, the discount factor, the observation at time t, and the observation at time t+k.
7. The agent for a complex game decision scenario of claim 1, wherein the autonomous evolution unit uses full-parameter fine-tuning, the optimizer is AdamW, and the learning rate is set to 5e-5.
8. A method for constructing an agent for a complex game decision scenario, characterized by comprising the following steps: a state adapter extracts raw state information from a complex game decision environment and converts the raw state information into a structured text observation that a large language model can understand; a planner outputs a high-level natural language strategic instruction according to the structured text observation; a planning verifier performs compliance verification on the high-level natural language strategic instruction against a preset rule base, and if the verification fails, the planner is driven to iteratively correct the high-level natural language strategic instruction; an executor generates standardized executable actions according to the verified high-level natural language strategic instruction; an execution verifier performs validity verification on the standardized executable actions, and if the verification fails, the executor is driven to iteratively correct the standardized executable actions; an action adapter converts the standardized executable actions that pass the verification into operation instructions that the complex game decision environment can recognize, and executes the operation instructions in the complex game decision environment; and an autonomous evolution unit quantitatively evaluates the standardized executable actions that pass the verification, screens out high-scoring data, and performs supervised fine-tuning of the planner, the planning verifier, the executor and the execution verifier according to the high-scoring data.
9. The method for constructing an agent for a complex game decision scenario of claim 8, wherein converting the raw state information into a structured text observation that a large language model can understand comprises: ordering units in the complex game decision environment by spatial position using a proximity ranking algorithm, and aggregating units performing similar tasks into a single description using a unit aggregation algorithm; and wherein the planner outputting a high-level natural language strategic instruction according to the structured text observation comprises: the planner generating the high-level natural language strategic instruction using a structured prompt template embedded with game rules and expert knowledge.
10. The method for constructing an agent for a complex game decision scenario of claim 8, wherein the autonomous evolution unit is configured to quantitatively score each action step in the decision trajectory data using a discounted gain function; the discounted gain function is calculated as follows: [formula not reproduced in this text], where the formula's symbols denote, in order, the set of metrics, the metric value, an individual metric, the time t, the metric step size, the discount factor, the observation at time t, and the observation at time t+k.
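The claims name the quantities that enter the discounted gain function (a set of metrics, a metric step size, a discount factor, and the observations at times t and t+k), but the formula itself is absent from this text. The Python sketch below therefore shows only one plausible reading, and it is an assumption rather than the claimed formula: the score of step t is the sum, over all metrics and over k = 1..K, of gamma^k times the change in the metric between the observation at t and the observation at t+k. The names `discounted_gain`, `select_high_scoring`, `horizon` and `threshold` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Observation = Dict[str, float]           # hypothetical: an observation as named numeric fields
Metric = Callable[[Observation], float]  # hypothetical: a metric maps an observation to a value


def discounted_gain(
    observations: Sequence[Observation],
    t: int,
    metrics: Sequence[Metric],
    horizon: int,
    gamma: float,
) -> float:
    """Assumed form: sum over metrics m and k=1..horizon of gamma**k * (m(o[t+k]) - m(o[t]))."""
    base = observations[t]
    total = 0.0
    for metric in metrics:
        for k in range(1, horizon + 1):
            if t + k >= len(observations):
                break  # the trajectory ended before the full horizon was reached
            total += gamma ** k * (metric(observations[t + k]) - metric(base))
    return total


def select_high_scoring(
    observations: Sequence[Observation],
    metrics: Sequence[Metric],
    horizon: int,
    gamma: float,
    threshold: float,
) -> List[int]:
    """Return the indices of steps whose assumed discounted gain reaches the threshold."""
    return [
        t
        for t in range(len(observations) - 1)
        if discounted_gain(observations, t, metrics, horizon, gamma) >= threshold
    ]
```

Under this reading, the steps returned by `select_high_scoring` would be the high-scoring data retained for supervised fine-tuning; the threshold-based cut is likewise an illustrative choice, since the text does not state how the screening is performed.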

Description

Agent for complex game decision scenarios and construction method thereof

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to an agent for complex game decision scenarios and a construction method thereof.

Background

Game decision-making is a core technology for handling the mutual influence and competition among multiple decision makers. By constructing a mathematical model comprising participants, strategy spaces and payoff functions, it provides agents with decision support for cooperation and competition, and it is widely applied in fields such as multi-agent systems and resource allocation. With the development of large language models (LLMs), the large language model agent (LLM-Agent) combines the generalization of language understanding with the goal-driven nature of reinforcement learning, has become a key carrier for complex task decisions, and achieves autonomous decision-making through a perception-thinking-action closed loop.
However, existing agents for complex game decisions have several technical defects. First, they lack a hierarchical decision mechanism: high-level and low-level strategies are mixed together, which makes the reasoning burden heavy and the strategy disordered, so that global planning and local operation are hard to reconcile. Second, they lack self-correction and self-evolution capabilities: they rely on static training data or fixed prompt templates and cannot continuously learn and improve from execution results. Third, the quality of the training data is uncontrolled: there is no effective sample screening mechanism and the proportion of high-value samples is low, which leads to poor training efficiency and generalization. Fourth, decision consistency and interpretability are poor: strategy and tactics are not explicitly decoupled, decision drift occurs easily, and there is no verification module to support rationality assessment.

In the prior art, COA-GPT generates military operation plans with a large language model, but its action space is simplified and does not involve complex strategies. TextStarCraft II builds a text-based game environment, but its interface design is non-standardized, highly coupled to a specific method, and constrains the action space. SwarmBrain adopts a layered architecture, but its low-level state machine design is complex and limits the decision space and cross-scenario transferability. None of these techniques resolves the core defects described above, and they struggle to meet the demands of complex game scenarios.
Disclosure of Invention

The invention provides an agent for complex game decision scenarios and a construction method thereof, aiming to solve the problem that existing large language model agents reason poorly in complex game decision scenarios, and to improve agent performance.

The agent for a complex game decision scenario comprises a state adapter, a reasoning decision unit, an action adapter and an autonomous evolution unit, wherein the reasoning decision unit comprises a planner, a planning verifier, an executor and an execution verifier. The state adapter is configured to extract raw state information from a complex game decision environment, convert the raw state information into a structured text observation that a large language model can understand, and transmit the structured text observation to the planner. The planner is a large language model configured to output a high-level natural language strategic instruction to the planning verifier according to the input structured text observation. The planning verifier is configured to perform compliance verification on the input high-level natural language strategic instruction against a preset rule base, to drive the planner to iteratively correct the instruction if the verification fails, and to input the instruction that passes the verification into the executor. The executor is configured to generate standardized executable actions according to the verified high-level natural language strategic instruction. The execution verifier is configured to perform validity verification on the standardized executable actions, to drive the executor to iteratively correct them if the verification fails, and to input the standardized executable actions that pass the verification into the action adapter and the autonomous evolution unit. The action adapter is configured to convert the input standardized executable actions into operation instructions that the complex game decision environment can recognize and to execute the operation instructions in the complex game decision environment. The autonomous evolution unit is configured to quantitatively evaluate the input standardized executable actions, screen out high-scoring data, and perform supervised fine-tuning of the reasoning decision unit according to the high-scoring data.

According to some embodiments of the invention, the state adapter is configured to rank units i