
CN-122021943-A - Unilateral control method and unilateral control device for intelligent agent in random game


Abstract

The invention discloses a unilateral control method and device for an intelligent agent in a random game, relating to the technical fields of artificial intelligence and game theory. The method addresses the problem that conventional payoff-control techniques for random games cannot support multiple types of control targets. The method comprises: constructing a five-tuple random game model; determining a one-step memory strategy on which the controlled agent bases its action decisions; constructing the set of single-round game outcomes of the random game process as the state space of a Markov chain, together with the chain's transition matrix; establishing an identity relation among the stationary distribution of the Markov chain, the state transition rule, and the control agent's strategy; using this identity relation to determine the control conditions for the long-term expected frequency of a target environment state and the general conditions for a zero-determinant strategy in the random game; and determining the control conditions for the long-term expected benefit of the controlled agent.

Inventors

  • WANG ZHEN
  • YUAN ZHENG
  • CHU CHEN
  • LIU CHEN
  • LIU JINZHUO

Assignees

  • Northwestern Polytechnical University (西北工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (9)

  1. A unilateral control method for an agent in a random game, characterized by comprising the following steps: constructing a five-tuple random game model comprising a set of control and controlled agents, an environment state set, a joint action space, a state transition rule, and a benefit rule, and determining, according to the five-tuple random game model, a one-step memory strategy on which the control and controlled agents base their action decisions, wherein the one-step memory strategy depends on the current environment state, the environment state of the previous round, and the joint action of the control and controlled agents in the previous round; constructing the set of single-round game outcomes of the random game process as the state space of a Markov chain, and obtaining the transition matrix of the Markov chain according to the state transition rule and the one-step memory strategy, wherein the transition matrix represents the probability of transitioning from the current game outcome to the next round's game outcome; and determining auxiliary variables through the strategy of the controlled agent and the state transition rule, and determining, according to the identity relation satisfied by the auxiliary variables, the control conditions for the long-term expected frequency of a target environment state, the implementation conditions for a zero-determinant strategy in the random game, and the control conditions for the long-term expected benefit of the controlled agent.
  2. The method of claim 1, wherein in the five-tuple random game model: N denotes the set of agents, comprising the control agent and the controlled agent; the environment state set, the joint action space, the state transition rule, and the benefit rule are as defined in claim 1; C denotes the cooperation action and D denotes the defection action, so that the joint actions comprise CC (both agents cooperate), CD (the control agent cooperates and the controlled agent defects), DC (the control agent defects and the controlled agent cooperates), and DD (both agents defect); the one-step memory strategy vector of the control agent gives, for each combination of the current environment state, the previous environment state, and the previous joint action of the control and controlled agents, the probability that the control agent selects C; and the one-step memory strategy vector of the controlled agent likewise gives the probability that the controlled agent selects C, with subscripts indexing the current environment state, the previous-round environment state, and the joint-action combination of the control and controlled agents.
  3. The method of claim 1, wherein the single-round game outcome set collects the possible outcomes of each round of play, and each entry of the transition matrix of the Markov chain is the probability that, conditioned on the current game outcome, the next round of the game transitions to a given target outcome; each such entry is the product of the state transition rule component for the next environment state, the conditional action probability of the control agent, and the conditional action probability of the controlled agent, where C denotes the cooperation action, D denotes the defection action, and subscripts index the current environment state, the next environment state, and the actions of the control and controlled agents in the current and next rounds.
  4. The method of claim 1, wherein each auxiliary variable is determined from the state transition rule components together with the probabilities that, given the next environment state, the current environment state, and the joint action of the control and controlled agents, the control agent selects C in the next round; and the identity relation links the auxiliary variables, an index variable, and the components of the stationary distribution, each stationary distribution component being the probability of a given environment state together with a given joint-action combination of the control and controlled agents, where C denotes the cooperation action and D denotes the defection action.
  5. The method according to claim 1, wherein the long-term expected benefits of the control and controlled agents are calculated through the stationary distribution, specifically: each agent's long-term expected benefit is the sum, over environment states and joint actions of the control and controlled agents, of the corresponding stationary distribution component multiplied by that agent's single-round return under the given environment state and joint action; and the linear benefit relation between the control agent and the controlled agent realizable by a zero-determinant strategy in the random game is a linear relation between the two agents' long-term expected benefits whose form is governed by adjustable parameters, so that linear relations of different forms can be controlled by regulating the parameter values.
  6. The method of claim 1, wherein the control conditions for the long-term expected frequency of the target environment state comprise an exact-frequency control condition and a frequency-range control condition; the long-term expected frequency of the target environment state is the sum, over the joint actions CC, CD, DC, and DD of the control and controlled agents, of the stationary distribution components for the target environment state, where CC denotes that both agents cooperate, CD that the control agent cooperates and the controlled agent defects, DC that the control agent defects and the controlled agent cooperates, and DD that both agents defect; the exact-frequency control condition requires this frequency to equal a long-term expected frequency target value for the target environment state, and the frequency-range control condition requires it to lie between a minimum constraint value and a maximum constraint value of the long-term expected frequency for the target environment state.
  7. A unilateral control device for an agent in a random game, comprising: a construction unit configured to construct a five-tuple random game model comprising a set of control and controlled agents, an environment state set, a joint action space, a state transition rule, and a benefit rule, and to determine, according to the five-tuple random game model, a one-step memory strategy on which the control and controlled agents base their action decisions, wherein the one-step memory strategy depends on the current environment state, the environment state of the previous round, and the joint action of the control and controlled agents in the previous round; an acquisition unit configured to construct the set of single-round game outcomes of the random game process as the state space of a Markov chain, wherein each state in the state space corresponds to one game outcome, and to obtain the transition matrix of the Markov chain according to the state transition rule and the one-step memory strategy, the transition matrix representing the probability of transitioning from the current game outcome to the next round's game outcome; and a determination unit configured to determine auxiliary variables through the strategy of the controlled agent and the state transition rule, and to determine, according to the identity relation satisfied by the auxiliary variables, the control conditions for the long-term expected frequency of a target environment state, the implementation conditions for a zero-determinant strategy in the random game, and the control conditions for the long-term expected benefit of the controlled agent.
  8. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the unilateral control method for an agent in a random game according to any one of claims 1 to 6.
  9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the unilateral control method for an agent in a random game according to any one of claims 1 to 6.
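The Markov chain over single-round game outcomes described in claims 1-3 can be sketched numerically. The following is an illustrative toy instance only, not the patent's own implementation: a two-state game in which the state transition rule and both one-step memory strategies are random placeholder tables, used to build the outcome-chain transition matrix and its stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy five-tuple instance: 2 environment states, actions C=0 / D=1 per agent.
n_states, actions = 2, [0, 1]
outcomes = [(s, ax, ay) for s in range(n_states) for ax in actions for ay in actions]

# State transition rule q(s' | s, ax, ay): a random stochastic table.
q = rng.random((n_states, 2, 2, n_states))
q /= q.sum(axis=-1, keepdims=True)

# One-step memory strategies: probability of playing C given the previous
# outcome (s, ax, ay) and the current environment state s'.
p_x = rng.random((n_states, 2, 2, n_states))   # control agent
p_y = rng.random((n_states, 2, 2, n_states))   # controlled agent

# Transition matrix of the Markov chain over single-round outcomes: each entry
# is (state transition rule) x (X's action probability) x (Y's action probability).
M = np.zeros((len(outcomes), len(outcomes)))
for i, (s, ax, ay) in enumerate(outcomes):
    for j, (s2, ax2, ay2) in enumerate(outcomes):
        cx, cy = p_x[s, ax, ay, s2], p_y[s, ax, ay, s2]
        px = cx if ax2 == 0 else 1 - cx
        py = cy if ay2 == 0 else 1 - cy
        M[i, j] = q[s, ax, ay, s2] * px * py

assert np.allclose(M.sum(axis=1), 1.0)  # each row is a probability distribution

# Stationary distribution v with v M = v and sum(v) = 1.
A = np.vstack([M.T - np.eye(len(outcomes)), np.ones(len(outcomes))])
b = np.zeros(len(outcomes) + 1)
b[-1] = 1.0
v = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(v @ M, v)
```

All long-run quantities in the later claims (expected benefits, state frequencies) are then linear functionals of `v`.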

Description

Unilateral control method and unilateral control device for intelligent agent in random game

Technical Field

The invention relates to the technical fields of artificial intelligence and game theory, and in particular to a unilateral control method and device for an intelligent agent in random games.

Background

In game theory, artificial intelligence, multi-agent systems, and related fields, the study of an agent's optimal strategy under long-term interaction has long been a core challenge. When no universally optimal strategy exists in a general setting, it is of great significance to study strategies that sustain cooperation in game interactions, or strategies that can win a game. The recently developed payoff control theory provides a new research perspective for this purpose. Conventional wisdom holds that in game interactions the payoffs of the parties are jointly determined by the actions of all participants, so unilateral control of game payoffs by a single agent is often considered impossible. Payoff control theory reveals, however, that a single agent can impose restrictions on the controlled agent's payoff through specific strategies. For example, a zero-determinant (ZD) strategy in the repeated prisoner's dilemma can enforce a linear relationship between the expected payoffs of the participating parties, and this control idea has since been generalized to settings such as multiplayer social dilemma games, repeated games with discount factors, alternating games, and multi-channel games. In addition, from the standpoint of the feasible region of controllable payoffs, theoretical conditions have been studied under which the control agent can confine the payoff combination of the control and controlled agents to a specified range, expanding the capability boundary of unilateral payoff control.
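The ZD phenomenon described above can be checked numerically in the ordinary repeated prisoner's dilemma. A minimal sketch, assuming the standard payoffs (T, R, P, S) = (5, 3, 1, 0) and an equalizer-type vector known from the ZD literature (not a strategy stated in this patent): the equalizer pins the opponent's long-run payoff at a fixed value no matter what memory-one strategy the opponent plays.

```python
import numpy as np

# Prisoner's dilemma payoffs in outcome order (CC, CD, DC, DD), from X's view.
s_x = np.array([3.0, 0.0, 5.0, 1.0])
s_y = np.array([3.0, 5.0, 0.0, 1.0])

# Equalizer zero-determinant strategy for X: cooperation probability after each
# outcome.  This vector satisfies (p - e) = beta*s_y + gamma*1 with
# beta = -1/5, gamma = 2/5 (e = (1,1,0,0)), pinning Y's payoff at -gamma/beta = 2.
p = np.array([4/5, 2/5, 2/5, 1/5])

def stationary_payoff_y(p, q):
    """Y's long-run payoff when X plays p and Y plays memory-one strategy q."""
    swap = [0, 2, 1, 3]          # map X's view of an outcome to Y's view
    M = np.zeros((4, 4))
    for i in range(4):
        px, py = p[i], q[swap[i]]
        M[i] = [px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
    # Stationary distribution of the outcome chain: v M = v, sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.zeros(5)
    b[-1] = 1.0
    v = np.linalg.lstsq(A, b, rcond=None)[0]
    return v @ s_y

rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.uniform(0.05, 0.95, size=4)   # arbitrary interior opponent strategy
    assert abs(stationary_payoff_y(p, q) - 2.0) < 1e-6
```

The assertion passing for arbitrary `q` is exactly the unilateral-control property: X alone fixes Y's expected payoff.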
However, most existing studies are based on repeated normal-form games, which struggle to capture the dynamic features that are ubiquitous in real-world interaction environments. In contrast, a random (stochastic) game couples changes of the environment state to the agents' behavior and can reflect the true interaction characteristics of complex dynamic environments. Extending payoff control to random games therefore not only helps advance payoff control theory in dynamic environments, but is also of great significance for multi-agent learning and control based on random games. Some exploratory studies exist, but a systematic control-theoretic framework is still lacking. In view of the foregoing, an efficient method is needed for achieving payoff and environment-state control in random games. The invention realizes multiple types of game control targets in the dynamic interaction environments characterized by random games. Specifically, the invention not only enables the control agent, through zero-determinant strategies such as the equalizer strategy and the extortion strategy, to unilaterally enforce a linear relation between its own expected payoff and that of the controlled agent, but can also restrict the feasible range of the controlled agent's payoff through a control strategy, and can design a one-step memory strategy to unilaterally fix or constrain the environment state distribution. Through these strategy designs, the invention theoretically builds a systematic random game control framework and provides an effective solution for cooperation and competition problems in multi-agent systems.
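Both control targets mentioned above, expected payoffs and the environment-state distribution, are linear functionals of the stationary distribution over single-round outcomes. A small illustrative sketch (the stationary vector and benefit tables are random placeholders, not the patent's benefit rule) of how the long-term expected benefit and the long-term frequency of a target environment state are read off:

```python
import numpy as np

rng = np.random.default_rng(1)

# Outcomes (s, ax, ay) of a toy 2-state game and an assumed stationary
# distribution v over them (e.g. as computed from the outcome chain).
outcomes = [(s, ax, ay) for s in range(2) for ax in range(2) for ay in range(2)]
v = rng.random(len(outcomes))
v /= v.sum()

# Hypothetical benefit rule: single-round returns r(s, ax, ay) for each agent.
r_x = rng.random((2, 2, 2))   # control agent
r_y = rng.random((2, 2, 2))   # controlled agent

# Long-term expected benefit = stationary-weighted average single-round return.
pay_x = sum(v[i] * r_x[o] for i, o in enumerate(outcomes))
pay_y = sum(v[i] * r_y[o] for i, o in enumerate(outcomes))

# Long-term expected frequency of target environment state s* = 0: sum the
# stationary components over the four joint actions CC, CD, DC, DD.
freq = sum(v[i] for i, (s, ax, ay) in enumerate(outcomes) if s == 0)

# A frequency-range control target would require f_min <= freq <= f_max;
# here we only check basic consistency of the functionals.
assert 0.0 <= freq <= 1.0
assert r_x.min() <= pay_x <= r_x.max()
```

Fixing or bounding `freq` and `pay_y` by choice of the control agent's one-step memory strategy is precisely the unilateral control problem the invention addresses.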
Disclosure of Invention

The embodiment of the invention provides a unilateral control method and device for an intelligent agent in a random game, which are used to solve the problem that conventional payoff-control techniques for random games cannot support multiple types of control targets. The embodiment of the invention provides a unilateral control method for an intelligent agent in a random game, comprising the following steps: constructing a five-tuple random game model comprising a set of control and controlled agents, an environment state set, a joint action space, a state transition rule, and a benefit rule, and determining, according to the five-tuple random game model, a one-step memory strategy on which the control and controlled agents base their action decisions, wherein the one-step memory strategy depends on the current environment state, the environment state of the previous round, and the joint action of the control and controlled agents in the previous round; constructing the set of single-round game outcomes of the random game process as the state space of a Markov chain, and obtaining the transition matrix of the Markov chain according to the state transition rule and the one-step memory strategy, wherein the transition matrix represents the probability of transitioning from the current game outcome to the next round's game result