
CN-121973183-A - Control method of intelligent agent, controller, intelligent agent and storage medium

CN 121973183 A

Abstract

Embodiments of the present application disclose a control method of an intelligent agent, a controller, an intelligent agent, and a storage medium. The method inputs the current system state and the current control input quantity into a predictive control model to obtain an action description sequence of a plurality of actions the agent will perform in a future time, acquires the actual contact force with which the agent contacts a target object, generates a contact force cost term based on the difference between the predicted contact force sequence and the actual contact force, and performs an optimization operation based on the contact force cost term to obtain a target control input sequence. In the embodiments of the present application, the actual contact force is used as feedback: the discrepancy between the contact force sequence predicted by the predictive control model and the actual contact force is expressed as a cost term, and optimizing this cost term yields a target control input sequence that compensates for errors in the contact force sequence predicted by the predictive control model.

Inventors

  • Request for anonymity
  • Request for anonymity
  • Request for anonymity

Assignees

  • 帕西尼感知科技(深圳)有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-24

Claims (10)

  1. A control method of an intelligent agent, applied to the intelligent agent, characterized by comprising: acquiring a current system state, a current control input quantity, and a preset predictive control model of the intelligent agent, wherein the predictive control model is a control model relating the system state, the control input quantity, and the contact force of the intelligent agent, and the predictive control model is capable of reducing the contact force; inputting the current system state and the current control input quantity into the predictive control model to obtain an action description sequence of a plurality of actions to be performed by the intelligent agent in a future time, wherein the action description sequence comprises a contact force sequence; acquiring an actual contact force with which the intelligent agent contacts a target object; generating a contact force cost term based on a difference between the contact force sequence and the actual contact force; and performing an optimization operation based on the contact force cost term to obtain a target control input sequence for the intelligent agent to perform the plurality of actions in the future time.
  2. The control method according to claim 1, wherein the contact force sequence comprises a plurality of predicted contact forces, and generating the contact force cost term based on the difference between the contact force sequence and the actual contact force comprises: acquiring a first difference between each predicted contact force and the actual contact force; determining a quadratic weighted norm corresponding to the predicted contact force based on the first difference; and generating the contact force cost term based on the quadratic weighted norms of all the predicted contact forces.
  3. The control method according to claim 1 or 2, wherein performing the optimization operation based on the contact force cost term to obtain the target control input sequence for the intelligent agent to perform a plurality of actions in a future time comprises: obtaining a constraint cost term, wherein the constraint cost term is used to constrain the actions performed by the intelligent agent in the future time; and performing an optimization operation based on the constraint cost term and the contact force cost term to obtain the target control input sequence for the intelligent agent to perform the plurality of actions in the future time.
  4. The control method according to claim 3, wherein the action description sequence further comprises a system state sequence, the constraint cost term comprises a task cost term and a control quantity penalty term, and obtaining the constraint cost term comprises: acquiring a preset reference state sequence; generating the task cost term based on the system state sequence and the reference state sequence; constructing a to-be-solved control input sequence; and generating the control quantity penalty term based on the to-be-solved control input sequence.
  5. The control method according to claim 4, wherein the system state sequence comprises a plurality of predicted system states, the reference state sequence comprises a plurality of reference system states, one predicted system state corresponding to one reference system state, and generating the task cost term based on the system state sequence and the reference state sequence comprises: acquiring a second difference between each predicted system state and the corresponding reference system state; determining a quadratic weighted norm corresponding to the predicted system state based on the second difference; and generating the task cost term based on the quadratic weighted norms corresponding to all the predicted system states.
  6. The control method according to claim 4, wherein the to-be-solved control input sequence comprises a plurality of to-be-solved control input quantities, and generating the control quantity penalty term based on the to-be-solved control input sequence comprises: generating a quadratic weighted norm of each to-be-solved control input quantity; accumulating the quadratic weighted norms of all the to-be-solved control input quantities to obtain a first total weighted norm; calculating a third difference between each two adjacent to-be-solved control input quantities; determining a quadratic weighted norm corresponding to the third difference; accumulating the quadratic weighted norms corresponding to all the third differences to obtain a second total weighted norm; and generating the control quantity penalty term based on the first total weighted norm and the second total weighted norm.
  7. The control method according to claim 3, wherein the constraint cost term comprises a task cost term and a control quantity penalty term, and performing the optimization operation based on the constraint cost term and the contact force cost term to obtain the target control input sequence for the intelligent agent to perform a plurality of actions in a future time comprises: multiplying the control quantity penalty term by a preset first weight coefficient to obtain a weighted control quantity penalty term; multiplying the contact force cost term by a preset second weight coefficient to obtain a weighted contact force cost term; adding the task cost term, the weighted control quantity penalty term, and the weighted contact force cost term to obtain a composite cost; and performing an optimization operation on the composite cost with a preset optimizer to obtain the target control input sequence for the intelligent agent to perform the plurality of actions in the future time.
  8. A controller, comprising a memory and a processor, the memory being connected to the processor, wherein the processor is configured to execute one or more computer programs stored in the memory, and the processor, when executing the one or more computer programs, causes the controller to implement the control method of the intelligent agent according to any one of claims 1-7.
  9. An intelligent agent, characterized by comprising the controller according to claim 8.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the control method of the intelligent agent according to any one of claims 1-7.
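
Taken together, claims 1-7 describe a weighted quadratic cost of the kind commonly used in model predictive control. The sketch below is illustrative only and is not the patent's implementation: the function and variable names (`composite_cost`, `Q`, `R`, `S`, `W_f`, `w_u`, `w_f`) are assumptions chosen for this example, with `Q`, `R`, `S`, `W_f` as weight matrices and `w_u`, `w_f` as the first and second weight coefficients of claim 7.

```python
import numpy as np

def quad_norm(v, W):
    """Quadratic weighted norm ||v||^2_W = v^T W v."""
    return float(v @ W @ v)

def composite_cost(x_pred, x_ref, u_seq, f_pred, f_actual,
                   Q, R, S, W_f, w_u, w_f):
    # Task cost term (claim 5): deviation of each predicted system
    # state from its reference state.
    j_task = sum(quad_norm(x - xr, Q) for x, xr in zip(x_pred, x_ref))
    # Control quantity penalty term (claim 6): a magnitude term plus a
    # smoothness term over adjacent control inputs.
    j_ctrl = sum(quad_norm(u, R) for u in u_seq)
    j_ctrl += sum(quad_norm(u_seq[k + 1] - u_seq[k], S)
                  for k in range(len(u_seq) - 1))
    # Contact force cost term (claim 2): deviation of each predicted
    # contact force from the single measured contact force.
    j_force = sum(quad_norm(f - f_actual, W_f) for f in f_pred)
    # Composite cost (claim 7): weighted sum of the three terms.
    return j_task + w_u * j_ctrl + w_f * j_force
```

In a full implementation, this composite cost would be minimized over the to-be-solved control input sequence `u_seq` by a numerical optimizer to obtain the target control input sequence.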

Description

Control Method of Intelligent Agent, Controller, Intelligent Agent, and Storage Medium

Technical Field

Embodiments of the present application relate to the technical field of intelligent agents, and in particular to a control method of an intelligent agent, a controller, an intelligent agent, and a storage medium.

Background

An agent needs to contact objects while performing tasks; for example, the agent may need to hold an object. To ensure that an object can be contacted smoothly and effectively, the related art uses a non-complementary model to calculate the contact force between the agent and the object. However, the contact force output by the non-complementary model unavoidably deviates from the real world: parameters such as the actual stiffness and friction coefficient of the object's surface are unknown to the agent, so the contact force output by the non-complementary model can differ greatly from the contact force that actually occurs, and this difference easily causes the agent's task execution to fail.

Disclosure of Invention

An object of the embodiments of the present application is to provide a control method of an agent, a controller, an agent, and a storage medium, which improve on the related art, in which deviation of the predicted contact force from the actual contact force easily causes the agent to fail to perform its task.
In a first aspect, an embodiment of the present application provides a control method of an agent, applied to the agent, comprising: obtaining a current system state, a current control input quantity, and a preset predictive control model of the agent, where the predictive control model is a control model relating the system state, the control input quantity, and the contact force of the agent, and is capable of reducing the contact force; inputting the current system state and the current control input quantity into the predictive control model to obtain an action description sequence of a plurality of actions to be performed by the agent in a future time, where the action description sequence includes a contact force sequence; obtaining an actual contact force with which the agent contacts a target object; generating a contact force cost term based on the difference between the contact force sequence and the actual contact force; and performing an optimization operation based on the contact force cost term to obtain a target control input sequence for the agent to perform the plurality of actions in the future time.

Optionally, the contact force sequence comprises a plurality of predicted contact forces, and generating the contact force cost term based on the difference between the contact force sequence and the actual contact force comprises: obtaining a first difference between each predicted contact force and the actual contact force; determining a quadratic weighted norm corresponding to the predicted contact force based on the first difference; and generating the contact force cost term based on the quadratic weighted norms of all the predicted contact forces.
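
As an illustration of the contact force cost term described above, the sketch below (not from the patent; the function and variable names are assumed for illustration) takes the first difference between each predicted contact force in the sequence and the single measured contact force, and accumulates the quadratic weighted norms of those differences:

```python
import numpy as np

def contact_force_cost(f_pred_seq, f_actual, W_f):
    """Sum over the prediction horizon of ||f_k^pred - f^act||^2_{W_f}."""
    cost = 0.0
    for f_k in f_pred_seq:
        d = f_k - f_actual          # first difference value (claim 2)
        cost += float(d @ W_f @ d)  # quadratic weighted norm
    return cost
```

Because the measured force enters every term of the sum, any systematic prediction error in the contact force sequence is penalized across the whole horizon.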
Optionally, performing the optimization operation based on the contact force cost term to obtain the target control input sequence for the agent to perform a plurality of actions in a future time includes: obtaining a constraint cost term, where the constraint cost term is used to constrain the actions performed by the agent in the future time; and performing an optimization operation based on the constraint cost term and the contact force cost term to obtain the target control input sequence for the agent to perform the plurality of actions in the future time.

Optionally, the action description sequence further comprises a system state sequence, the constraint cost term comprises a task cost term and a control quantity penalty term, and obtaining the constraint cost term comprises: obtaining a preset reference state sequence; generating the task cost term based on the system state sequence and the reference state sequence; constructing a to-be-solved control input sequence; and generating the control quantity penalty term based on the to-be-solved control input sequence.

Optionally, the system state sequence includes a plurality of predicted system states, the reference state sequence includes a plurality of reference system states, one predicted system state corresponding to one reference system state, and generating the task cost term based on the system state sequence and the reference state sequence includes: obtaining a second difference between each predicted system state and the corresponding reference system state; determining a quadratic weighted norm corresponding to the predicted system state based on the second difference; and generating the task cost term based on the quadratic weighted norms corresponding to all the predicted system states.
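
The task cost term described above has the same quadratic structure as the contact force cost term. A minimal sketch, with an assumed state weight matrix `Q` and illustrative names not taken from the patent:

```python
import numpy as np

def task_cost(x_pred_seq, x_ref_seq, Q):
    """Sum over the horizon of ||x_k - x_k^ref||^2_Q."""
    cost = 0.0
    for x_k, x_ref in zip(x_pred_seq, x_ref_seq):
        d = x_k - x_ref           # second difference value (claim 5)
        cost += float(d @ Q @ d)  # quadratic weighted norm
    return cost
```

Each predicted system state is paired with exactly one reference state, so the horizon lengths of the two sequences must match.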