CN-121997970-A - AI-algorithm-based intelligent agent cultivation iteration and automatic emotion cultivation method and system
Abstract
The invention discloses an AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method and system, suitable for agent emotion cultivation. The method comprises the following steps: defining an agent model, the model comprising a policy network for decision making and an emotion state vector representing the agent's internal state; having the agent interact with its environment, selecting and executing actions through the policy network according to the current environment state and the current emotion state vector, and obtaining the raw task reward fed back by the environment. The beneficial effect of the scheme is that, for the first time, it elevates emotion from an external, attached performance attribute to a quantifiable, iterable, and learnable core state variable inside the agent. It moves beyond the limitation of the traditional agent as a pure 'rational optimizer', endows the agent with an intrinsic emotion dimension, and lays a technical foundation for building a next generation of artificial intelligence entities whose behavior is driven by intrinsic motivation and emotion.
Inventors
- CHEN YANJUN
Assignees
- 徐州三米科技有限公司
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-02-06
Claims (10)
- 1. An AI-algorithm-based intelligent agent cultivation iteration and automatic emotion cultivation method, suitable for intelligent agent emotion cultivation, characterized by comprising the following steps: step S1, defining an agent model, the model comprising a policy network for decision making and an emotion state vector characterizing an internal state; step S2, having the agent interact with the environment, selecting and executing actions through the policy network according to the current environment state and the current emotion state vector, thereby obtaining a raw task reward fed back by the environment; step S3, computing an emotion consistency reward based on the consistency between the current event and the current emotion state, and combining it with the raw task reward into a composite reward for training; step S4, updating the parameters of the policy network with a reinforcement learning algorithm, with the aim of maximizing the composite reward, so as to optimize the agent's behavior policy; step S5, dynamically updating the emotion state vector according to a preset emotion state transition model, combining the current event with the agent's internal goals; and step S6, feeding the updated emotion state vector back into the next decision cycle, and executing the above steps cyclically to realize collaborative iterative cultivation of the agent's behavior policy and internal emotion state.
- 2. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method according to claim 1, wherein step S3 comprises: computing the emotion consistency reward as the negative distance between the current emotion state vector and the expected emotion vector output by a pre-trained emotion assessment function for the current event, i.e. $r_t^{\text{emo}} = -\left\| e_t - E(s_t, a_t, s_{t+1}) \right\|$, wherein $E$ is the pre-trained emotion assessment function, $s_t$ is the current environment state, $a_t$ is the selected action, $e_t$ is the current emotion state vector, and $s_{t+1}$ is the environment state at time t+1; and linearly combining the raw task reward and the emotion consistency reward into a composite reward that drives the agent's comprehensive evolution, $r_t = r_t^{\text{task}} + \beta\, r_t^{\text{emo}}$, wherein $r_t^{\text{task}}$ is the raw task reward and $\beta > 0$ is a hyperparameter balancing the weight between task completion and emotional rationality.
- 3. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method according to claim 1, wherein step S4 comprises: collecting a batch of interaction trajectory data generated by the old policy $\pi_{\theta_{\text{old}}}$; computing an advantage estimate $\hat{A}_t$ for each time step, this value measuring how much better executing action $a_t$ in state $s_t$ is relative to the average level; defining the importance sampling ratio $\rho_t(\theta) = \pi_\theta(a_t \mid s_t, e_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t, e_t)$, wherein $\pi_\theta$ takes the current environment state observation $s_t$ and the current emotion state vector $e_t$ as input and outputs the probability distribution over each possible action $a_t$ in the given state; and updating the policy network parameters $\theta$ with a proximal policy optimization (PPO) algorithm, with the aim of maximizing the expected cumulative discounted composite reward.
- 4. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method as claimed in claim 3, wherein the objective function $L^{\text{CLIP}}(\theta)$ is: $L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\hat{A}_t,\ \operatorname{clip}(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\hat{A}_t\right)\right]$, wherein $\epsilon$ is a small hyperparameter and the $\operatorname{clip}$ function limits the ratio to the interval $[1-\epsilon,\, 1+\epsilon]$.
- 5. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method of claim 4, wherein the update of the policy network is realized by gradient ascent maximizing $L^{\text{CLIP}}(\theta)$; at the same time, a value function network $V_\phi(s_t, e_t)$ is typically trained to estimate state values and is optimized jointly, the total loss function typically containing the policy loss, the value function loss, and an entropy regularization term: $L(\theta, \phi) = \hat{\mathbb{E}}_t\left[-L^{\text{CLIP}}(\theta) + c_1 L^{\text{VF}}(\phi) - c_2 S[\pi_\theta](s_t, e_t)\right]$, wherein $c_1$ and $c_2$ are weighting coefficients and $S[\pi_\theta]$ is the policy entropy used to encourage exploration.
- 6. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method according to claim 1, wherein step S5 comprises: forming the new emotion state as a weighted combination of a decayed part of the emotion at the previous moment, an emotion stimulus part caused by the current event, and an active regulation part based on the internal goal; the emotion state vector update is computed as $e_{t+1} = \lambda\, e_t + f(s_t, a_t, s_{t+1}) + \eta\, g(e_t, \hat{V}_t)$, wherein $e_t$ and $e_{t+1}$ are the emotion state vectors at times t and t+1 respectively, $\lambda$ is the emotional inertia coefficient, $f$ is the emotion stimulus function, $g$ is the internal emotion regulation function, $\eta$ is the regulation intensity coefficient, and $\hat{V}_t$ is an estimate of the long-term goal.
- 7. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method as set forth in claim 6, wherein the emotion stimulus function $f$ is a neural network model whose training objective is to make its output approach the output of a pre-trained emotion assessment function for the same event, the emotion assessment function being used to evaluate the degree of match between an event and an emotion; the emotion stimulus function $f$ learns continuously during cultivation, with the goal of making its output as close as possible to that of the pre-trained emotion assessment function $E$, i.e. minimizing the loss $L_f = \left\| f(s_t, a_t, s_{t+1}) - E(s_t, a_t, s_{t+1}) \right\|^2$.
- 8. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method as set forth in claim 6, wherein the input of the internal emotion regulation function $g$ includes the current emotion state $e_t$ and the long-term goal expectation $\hat{V}_t$ estimated by the value function, and its output is used to guide the emotion state to adjust in a direction favorable to achieving the long-term goal.
- 9. The AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method according to claim 1, wherein step S6 comprises: taking the updated emotion state $e_{t+1}$ as one of the inputs to the policy network at the next time step; the agent makes the next decision $a_{t+1}$ based on the new state $s_{t+1}$ and the new emotion $e_{t+1}$; during this process, the parameters of the policy network $\pi_\theta$ and of the emotion model $f$ are continuously optimized, the emotion state evolves dynamically with the interaction history, and collaborative cultivation of skills and emotions is finally realized.
- 10. An AI-algorithm-based agent cultivation iteration and automatic emotion cultivation system, characterized by comprising: a definition unit for defining an agent model, the model comprising a policy network for decision making and an emotion state vector characterizing an internal state; an execution unit for having the agent interact with the environment, selecting and executing actions through the policy network according to the current environment state and the current emotion state vector, and obtaining a raw task reward fed back by the environment; a combination unit for computing an emotion consistency reward based on the consistency between the current event and the current emotion state, and combining it with the raw task reward into a composite reward for training; a first updating unit for updating the parameters of the policy network with a reinforcement learning algorithm, with the aim of maximizing the composite reward, so as to optimize the agent's behavior policy; a second updating unit for dynamically updating the emotion state vector according to a preset emotion state transition model, combining the current event with the agent's internal goals; and a feedback unit for feeding the updated emotion state vector back into the next decision cycle and executing the above steps cyclically to realize collaborative iterative cultivation of the agent's behavior policy and internal emotion state.
Description
AI-algorithm-based intelligent agent cultivation iteration and automatic emotion cultivation method and system

Technical Field

The invention relates to the field of artificial intelligence and machine learning, and in particular to an AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method and system.

Background

With the development of artificial intelligence technology, intelligent agents have been widely applied in many fields such as games, virtual assistants, robot control, and automated customer service. Traditional agents are typically designed as rational decision makers that accomplish specific tasks; their behavioral logic is based on pre-programmed rules or a fixed model trained on a static data set. Such agents suffer from poor adaptability, a lack of an emotion dimension, and an isolated cultivation process. The prior art fails to deeply fuse skill learning with emotion cultivation into a closed-loop, collaborative iterative cultivation framework.

Disclosure of Invention

The invention aims to solve the above technical problems in the prior art and to provide an AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method and system. The application provides an AI-algorithm-based agent cultivation iteration and automatic emotion cultivation method, suitable for agent emotion cultivation, comprising the following steps: step S1, defining an agent model, the model comprising a policy network for decision making and an emotion state vector characterizing an internal state; step S2, having the agent interact with the environment, selecting and executing actions through the policy network according to the current environment state and the current emotion state vector, thereby obtaining a raw task reward fed back by the environment; step S3, computing an emotion consistency reward based on the consistency between the current event and the current emotion state, and combining it with the raw task reward into a composite reward for training; step S4, updating the parameters of the policy network with a reinforcement learning algorithm, with the aim of maximizing the composite reward, so as to optimize the agent's behavior policy; step S5, dynamically updating the emotion state vector according to a preset emotion state transition model, combining the current event with the agent's internal goals; and step S6, feeding the updated emotion state vector back into the next decision cycle, and executing the above steps cyclically to realize collaborative iterative cultivation of the agent's behavior policy and internal emotion state.
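The S1-S6 cycle can be illustrated with a short sketch. The following Python code is illustrative only and not part of the claims; the names `PolicyNetwork`-style interfaces (`policy.act`, `policy.ppo_update`, `emotion_model.update`, `emotion_model.beta`, `assessor`) are hypothetical placeholders for the modules defined in steps S1 to S6.

```python
import numpy as np

def cultivate(env, policy, emotion_model, assessor, episodes=1000):
    """Minimal sketch of the S1-S6 loop: policy and emotion co-evolve."""
    e = np.zeros(emotion_model.dim)               # S1: initial emotion state vector
    for _ in range(episodes):
        s = env.reset()
        trajectory = []
        done = False
        while not done:
            a = policy.act(s, e)                  # S2: action from state + emotion
            s_next, r_task, done = env.step(a)
            # S3: emotion consistency reward from the pre-trained assessor E
            r_emo = -np.linalg.norm(e - assessor(s, a, s_next))
            r = r_task + emotion_model.beta * r_emo
            trajectory.append((s, e, a, r))
            e = emotion_model.update(e, s, a, s_next)  # S5: emotion transition
            s = s_next                            # S6: feed back into next cycle
        policy.ppo_update(trajectory)             # S4: maximize composite reward
```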
Preferably, step S3 includes: computing the emotion consistency reward as the negative distance between the current emotion state vector and the expected emotion vector output by a pre-trained emotion assessment function for the current event, i.e. $r_t^{\text{emo}} = -\left\| e_t - E(s_t, a_t, s_{t+1}) \right\|$, wherein $E$ is the pre-trained emotion assessment function, $s_t$ is the current environment state, $a_t$ is the selected action, $e_t$ is the current emotion state vector, and $s_{t+1}$ is the environment state at time t+1; the raw task reward and the emotion consistency reward are linearly combined into a composite reward that drives the agent's comprehensive evolution: $r_t = r_t^{\text{task}} + \beta\, r_t^{\text{emo}}$, wherein $r_t^{\text{task}}$ is the raw task reward and $\beta > 0$ is a hyperparameter balancing the weight between task completion and emotional rationality.

Preferably, step S4 includes: collecting a batch of interaction trajectory data generated by the old policy $\pi_{\theta_{\text{old}}}$; computing an advantage estimate $\hat{A}_t$ for each time step, this value measuring how much better executing action $a_t$ in state $s_t$ is relative to the average level; defining the importance sampling ratio $\rho_t(\theta) = \pi_\theta(a_t \mid s_t, e_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t, e_t)$, wherein $\pi_\theta$ takes the current environment state observation $s_t$ and the current emotion state vector $e_t$ as input and outputs the probability distribution over each possible action $a_t$ in the given state; and updating the policy network parameters $\theta$ with a proximal policy optimization algorithm, with the aim of maximizing the expected cumulative discounted composite reward.

Preferably, the objective function $L^{\text{CLIP}}(\theta)$ is: $L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\hat{A}_t,\ \operatorname{clip}(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\hat{A}_t\right)\right]$, wherein $\epsilon$ is a small hyperparameter and the $\operatorname{clip}$ function limits the ratio to the interval $[1-\epsilon,\, 1+\epsilon]$.

Preferably, the update of the policy network is realized by gradient ascent maximizing $L^{\text{CLIP}}(\theta)$; at the same time, a value function network $V_\phi(s_t, e_t)$ is typically trained to estimate state values and is optimized jointly, the total loss function typically containing the policy loss, the value function loss, and an entropy regularization term: $L(\theta, \phi) = \hat{\mathbb{E}}_t\left[-L^{\text{CLIP}}(\theta) + c_1 L^{\text{VF}}(\phi) - c_2 S[\pi_\theta](s_t, e_t)\right]$, wherein $c_1$ and $c_2$ are weighting coefficients and $S[\pi_\theta]$ is the policy entropy used to encourage exploration.

Preferably, step S5 includes: the new emotion state is formed by a weighted combination of a decayed part of the emotion at the previous moment, an emotion stimulus part caused by the current event, and an active regulation part based on the internal goal, i.e. $e_{t+1} = \lambda\, e_t + f(s_t, a_t, s_{t+1}) + \eta\, g(e_t, \hat{V}_t)$.
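As a concrete illustration of the composite reward in step S3, the following sketch computes $r_t = r_t^{\text{task}} + \beta\, r_t^{\text{emo}}$ under the assumption that the pre-trained emotion assessment function $E$ is available as a callable `assessor`; the function name and the default value of $\beta$ are illustrative, not prescribed by the claims.

```python
import numpy as np

def composite_reward(r_task, e_t, s_t, a_t, s_next, assessor, beta=0.1):
    """Composite reward of step S3 (illustrative sketch).

    r_emo is the negative distance between the current emotion state
    vector e_t and the expected emotion E(s_t, a_t, s_{t+1}).
    """
    expected_emotion = assessor(s_t, a_t, s_next)    # output of pre-trained E
    r_emo = -np.linalg.norm(e_t - expected_emotion)  # emotion consistency reward
    return r_task + beta * r_emo                     # beta > 0 balances the two
```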
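The claims leave the advantage estimator $\hat{A}_t$ unspecified. Generalized advantage estimation (GAE) is one common choice in PPO implementations and is used here purely for illustration; the parameters `gamma` and `lam` are assumptions, not values taken from the patent.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """One common way to obtain the per-step advantage A_t (GAE);
    the patent does not prescribe this estimator.
    `values` holds V(s_t, e_t) for t = 0..T (one extra bootstrap value)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```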
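The objective of step S4 follows the standard PPO clipped-surrogate form. The sketch below, written with PyTorch, shows the clipped objective and the combined loss with value and entropy terms; concatenating the emotion vector to the state observation is one plausible reading of $\pi_\theta(a \mid s, e)$, not the only one, and the interface names are assumptions.

```python
import torch

def ppo_loss(policy, value_net, s, e, a, logp_old, advantages, returns,
             eps=0.2, c1=0.5, c2=0.01):
    """Clipped PPO loss with value and entropy terms (illustrative sketch)."""
    x = torch.cat([s, e], dim=-1)            # policy conditions on state + emotion
    dist = policy(x)                         # categorical distribution over actions
    logp = dist.log_prob(a)
    ratio = torch.exp(logp - logp_old)       # importance sampling ratio rho_t
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    l_clip = torch.min(ratio * advantages, clipped * advantages).mean()
    l_vf = ((value_net(x).squeeze(-1) - returns) ** 2).mean()  # value loss
    entropy = dist.entropy().mean()          # policy entropy, encourages exploration
    return -l_clip + c1 * l_vf - c2 * entropy  # total loss to minimize
```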
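The emotion transition of step S5 combines inertia, stimulus, and goal-based regulation. A minimal sketch, assuming `stimulus` implements $f$ and `regulate` implements $g$ (both hypothetical callables standing in for the trained networks):

```python
import numpy as np

def update_emotion(e_t, s_t, a_t, s_next, v_hat, stimulus, regulate,
                   lam=0.9, eta=0.1):
    """Emotion state transition of step S5 (illustrative sketch).

    e_{t+1} = lam * e_t                  (decayed previous emotion)
            + f(s_t, a_t, s_{t+1})       (stimulus from the current event)
            + eta * g(e_t, v_hat)        (regulation toward the long-term goal)
    """
    return lam * e_t + stimulus(s_t, a_t, s_next) + eta * regulate(e_t, v_hat)
```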
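Claim 7 trains the stimulus function $f$ to imitate the pre-trained assessor $E$ on the same events. A minimal PyTorch sketch of that regression loss, where `f_net` and `assessor` are assumed callables for $f$ and $E$:

```python
import torch

def stimulus_loss(f_net, assessor, s, a, s_next):
    """Claim 7 objective (illustrative): L_f = ||f(s,a,s') - E(s,a,s')||^2."""
    target = assessor(s, a, s_next).detach()   # pre-trained E, held fixed
    pred = f_net(s, a, s_next)                 # learnable stimulus function f
    return ((pred - target) ** 2).sum(dim=-1).mean()
```

Keeping $E$ fixed while $f$ learns online mirrors the claim's distinction between the pre-trained assessment function and the continuously cultivated stimulus function.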