CN-121973232-A - Humanoid robot hierarchical skill fusion control method and system based on attention mechanism

Abstract

A humanoid robot hierarchical skill fusion control method and system based on an attention mechanism, relating to the field of humanoid robot control. The method addresses the poor adaptability of the skill fusion weights, the unstable control and the high energy consumption of the existing SkillBlender method. The method comprises: pre-training to obtain a plurality of basic skill policies for the humanoid robot, and freezing the parameters of all basic skills after training is completed; constructing a high-level controller; collecting state information and task target information of the humanoid robot at the current moment, and performing feature coding through a state coding module to obtain a high-level state representation; generating a learnable skill embedding vector for each basic skill through a skill embedding module; generating an attention fusion weight corresponding to each basic skill through an attention weight calculation module; generating a corresponding sub-target for each basic skill through a sub-target generation module based on the high-level state representation; and inputting the current state and the sub-targets into the corresponding basic skill policies to obtain the action output of each basic skill.
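
In symbols, the fusion pipeline summarized above can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the source: $s_t$ is the robot state, $g_t$ the task target, $h_t$ the high-level state representation, $e_i$ the learnable embedding of skill $i$, $d$ their shared dimension, $\pi_i$ the frozen basic skill policy, and $N$ the number of basic skills. The scaled dot-product and Softmax form of the weights follows claim 5; whether additional learned projections are applied to $h_t$ or $e_i$ is not stated in this section.

```latex
\begin{aligned}
h_t     &= f_{\mathrm{enc}}\big([\,s_t;\, g_t\,]\big), \\
w_{t,i} &= \frac{\exp\!\big(h_t^{\top} e_i / \sqrt{d}\big)}
               {\sum_{j=1}^{N} \exp\!\big(h_t^{\top} e_j / \sqrt{d}\big)},
          \qquad w_{t,i} \ge 0,\quad \sum_{i=1}^{N} w_{t,i} = 1, \\
g_{t,i} &= f_{\mathrm{sub},i}(h_t), \\
a_t     &= \sum_{i=1}^{N} w_{t,i}\, \pi_i\big(s_t,\, g_{t,i}\big).
\end{aligned}
```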

Inventors

WANG RONGZHAO
CHEN SONGLIN

Assignees

Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date: 20260505
Application Date: 20260327

Claims (10)

1. A humanoid robot hierarchical skill fusion control method based on an attention mechanism, characterized by comprising the following steps:
Step one, pre-training basic skills to obtain a plurality of basic skill policies for the humanoid robot, and freezing the parameters of all basic skills after training is completed;
Step two, constructing a high-level controller based on the parameters of all basic skills frozen after the training in step one is completed, wherein the high-level controller comprises a state coding module, a skill embedding module, an attention weight calculation module, a sub-target generation module and an action fusion module, and training the high-level controller using a PPO algorithm;
Step three, acquiring state information and task target information of the humanoid robot at the current moment, and performing feature coding through the state coding module of step two to obtain a high-level state representation;
Step four, generating a learnable skill embedding vector for each basic skill through the skill embedding module, and, in combination with the high-level state representation obtained in step three, generating an attention fusion weight corresponding to each basic skill through the attention weight calculation module;
Step five, generating a corresponding sub-target for each basic skill based on the high-level state representation through the sub-target generation module of step two, and inputting the current state and the sub-target into the corresponding basic skill policy to obtain the action output of each basic skill;
Step six, performing weighted fusion of the action outputs of the basic skills based on the attention fusion weights through the action fusion module of step two, and generating a final control instruction for the humanoid robot.
2. The humanoid robot hierarchical skill fusion control method based on an attention mechanism according to claim 1, wherein the plurality of basic skill policies obtained by the pre-training in step one comprise a Walking skill, a Reaching skill, a Squatting skill and a Stepping skill, and each basic skill policy is trained independently using goal-conditioned reinforcement learning.
3. The humanoid robot hierarchical skill fusion control method based on an attention mechanism according to claim 1, wherein training the high-level controller using the PPO algorithm in step two comprises optimizing only the parameters of the high-level controller during training, while the parameters of all basic skill policies remain frozen.
4. The humanoid robot hierarchical skill fusion control method based on an attention mechanism according to claim 1, wherein performing feature coding through the state coding module in step three to obtain the high-level state representation comprises: using a coding function formed by a multi-layer perceptron to extract features from the concatenated robot state information and task target information, and outputting a high-level state representation of consistent dimension.
5. The humanoid robot hierarchical skill fusion control method based on an attention mechanism according to claim 1, wherein generating the attention fusion weights corresponding to the basic skills through the attention weight calculation module in step four comprises: calculating the matching degree between the current high-level state representation and each skill embedding vector using a scaled dot-product attention mechanism, and obtaining, after Softmax normalization, attention fusion weights that are non-negative and sum to 1.
6. The method according to claim 1, wherein the sub-target generation module configures an independent sub-target generation network for each basic skill, and all sub-target generation networks share the high-level state representation output by the state coding module as input.
7. The humanoid robot hierarchical skill fusion control method based on an attention mechanism according to claim 1, wherein the final control instruction in step six is a target position instruction for each joint of the humanoid robot, which is converted into joint torques by a proportional-derivative (PD) controller and then executed by the robot.
8. A humanoid robot hierarchical skill fusion control system based on an attention mechanism, the system comprising:
a pre-training module, configured to pre-train basic skills to obtain a plurality of basic skill policies for the humanoid robot, and to freeze the parameters of all basic skills after training is completed;
a high-level controller construction module, configured to construct a high-level controller based on the parameters of all basic skills frozen after the training of the pre-training module is completed, wherein the high-level controller comprises a state coding module, a skill embedding module, an attention weight calculation module, a sub-target generation module and an action fusion module, and to train the high-level controller using a PPO algorithm;
a feature coding module, configured to collect state information and task target information of the humanoid robot at the current moment, and to perform feature coding through the state coding module of the high-level controller construction module to obtain a high-level state representation;
an attention fusion weight module, configured to generate a learnable skill embedding vector for each basic skill through the skill embedding module, and, in combination with the high-level state representation obtained by the feature coding module, to generate an attention fusion weight corresponding to each basic skill through the attention weight calculation module;
a basic skill action output module, configured to generate a corresponding sub-target for each basic skill based on the high-level state representation through the sub-target generation module of the high-level controller construction module, and to input the current state and the sub-target into the corresponding basic skill policy to obtain the action output of each basic skill;
a control instruction generation module, configured to perform weighted fusion of the action outputs of the basic skills based on the attention fusion weights through the action fusion module of the high-level controller construction module, and to generate a final control instruction for the humanoid robot.
9. A computer storage medium having stored thereon a computer program, which when executed by a processor performs the method of any of claims 1-7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of any of claims 1-7.
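
The single control step recited in claims 1 and 4 to 7 can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation: the parameters are random and untrained (PPO training of the high-level controller is not shown), the frozen basic skill policies are replaced by fixed linear stand-ins, and every dimension, gain, and function name below is a hypothetical choice made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the source.
STATE_DIM, GOAL_DIM, HIDDEN_DIM, EMBED_DIM = 12, 3, 32, 16
NUM_SKILLS, NUM_JOINTS, SUBGOAL_DIM = 4, 6, 3


def mlp_encode(state, goal, w1, w2):
    """State coding module: two-layer MLP over the concatenated state and task target."""
    x = np.concatenate([state, goal])
    return np.tanh(w2 @ np.tanh(w1 @ x))


def attention_weights(h, skill_embeddings):
    """Scaled dot-product score of h against each skill embedding, then Softmax."""
    scores = skill_embeddings @ h / np.sqrt(h.shape[0])
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()


def fuse_actions(weights, skill_actions):
    """Action fusion module: attention-weighted sum of the per-skill actions."""
    return weights @ skill_actions


def pd_torque(q_target, q, dq, kp=40.0, kd=1.0):
    """PD law mapping joint position targets to joint torques (gains are assumed)."""
    return kp * (q_target - q) - kd * dq


# Hypothetical high-level controller parameters (would be learned with PPO).
w1 = rng.normal(size=(HIDDEN_DIM, STATE_DIM + GOAL_DIM)) * 0.1
w2 = rng.normal(size=(EMBED_DIM, HIDDEN_DIM)) * 0.1
skill_embeddings = rng.normal(size=(NUM_SKILLS, EMBED_DIM))
# One independent linear sub-target head per skill, all fed the same h.
subgoal_heads = rng.normal(size=(NUM_SKILLS, SUBGOAL_DIM, EMBED_DIM)) * 0.1
# Linear stand-ins for the frozen pre-trained basic skill policies.
skill_mats = rng.normal(size=(NUM_SKILLS, NUM_JOINTS, STATE_DIM + SUBGOAL_DIM)) * 0.1


def control_step(state, goal, q, dq):
    """One pass: encode, weight skills, generate sub-targets, fuse, apply PD."""
    h = mlp_encode(state, goal, w1, w2)
    weights = attention_weights(h, skill_embeddings)
    subgoals = subgoal_heads @ h  # shape (NUM_SKILLS, SUBGOAL_DIM)
    actions = np.stack([
        skill_mats[i] @ np.concatenate([state, subgoals[i]])
        for i in range(NUM_SKILLS)
    ])  # shape (NUM_SKILLS, NUM_JOINTS)
    q_target = fuse_actions(weights, actions)
    return weights, q_target, pd_torque(q_target, q, dq)


weights, q_target, tau = control_step(
    rng.normal(size=STATE_DIM), rng.normal(size=GOAL_DIM),
    np.zeros(NUM_JOINTS), np.zeros(NUM_JOINTS),
)
print("fusion weights:", weights)
```

Because the weights come from a Softmax, the fused joint target is always a convex combination of the individual skill outputs, so it stays within the range spanned by the frozen skills regardless of the high-level controller's parameters.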

Description

Humanoid robot hierarchical skill fusion control method and system based on attention mechanism

Technical Field

The invention relates to the technical field of humanoid robot control and reinforcement learning, and in particular to a humanoid robot hierarchical skill fusion control method and system based on an attention mechanism.

Background

With the development of simulation technology and computing power, reinforcement learning has become a core tool for solving complex robot control problems, and shows great potential in scenarios such as humanoid robot walking and manipulation. However, a humanoid robot has tens to hundreds of degrees of freedom, and its dynamics are highly nonlinear and involve frequent contact, so end-to-end reinforcement learning suffers from low sample efficiency, unstable training and poor policy generalization. Hierarchical reinforcement learning effectively reduces the difficulty of policy learning by decomposing a complex task into reusable sub-skills. Among such methods, SkillBlender adapts to new tasks without retraining the low-level policies by linearly fusing pre-trained basic skills. That method has clear shortcomings. First, it uses static or weakly conditioned weight assignment, which cannot adapt to the dynamic control requirements of multi-stage tasks. Second, a simple weight mapping has difficulty extracting key decision features from a high-dimensional state, so its feature selection capability is limited. Third, the weight changes lack structural constraints, so high-frequency switching easily occurs, which makes control non-smooth and increases energy consumption. As a result, its performance is limited in complex multi-stage humanoid robot tasks.

Disclosure of Invention

The invention aims to solve the problems of the skill fusion weights in the prior-art SkillBlender method, such as poor adaptability, unstable control and high energy consumption.
Therefore, the invention provides a hierarchical skill fusion control method based on an attention mechanism, which aims to achieve state-adaptive dynamic skill fusion and to significantly improve the success rate, stability and energy efficiency of humanoid robot multi-task control without retraining the basic skills.

The invention solves the above technical problems through the following technical scheme:

The invention provides a humanoid robot hierarchical skill fusion control method based on an attention mechanism, comprising the following steps:
Step one, pre-training basic skills to obtain a plurality of basic skill policies for the humanoid robot, and freezing the parameters of all basic skills after training is completed;
Step two, constructing a high-level controller based on the parameters of all basic skills frozen after the training in step one is completed, wherein the high-level controller comprises a state coding module, a skill embedding module, an attention weight calculation module, a sub-target generation module and an action fusion module, and training the high-level controller using a PPO algorithm;
Step three, acquiring state information and task target information of the humanoid robot at the current moment, and performing feature coding through the state coding module of step two to obtain a high-level state representation;
Step four, generating a learnable skill embedding vector for each basic skill through the skill embedding module, and, in combination with the high-level state representation obtained in step three, generating an attention fusion weight corresponding to each basic skill through the attention weight calculation module;
Step five, generating a corresponding sub-target for each basic skill based on the high-level state representation through the sub-target generation module of step two, and inputting the current state and the sub-target into the corresponding basic skill policy to obtain the action output of each basic skill;
Step six, performing weighted fusion of the action outputs of the basic skills based on the attention fusion weights through the action fusion module of step two, and generating a final control instruction for the humanoid robot.

Further, a preferred embodiment is provided, wherein the plurality of basic skill policies obtained by the pre-training in step one comprise a Walking skill, a Reaching skill, a Squatting skill and a Stepping skill, and each basic skill policy is trained independently using goal-conditioned reinforcement learning.

Further, a preferred embodiment is provided, wherein training the high-level controller using the PPO algorithm in step two comprises optimizing only the parameters of the high-level controller during training, while the parameters of all basic skill policies remain frozen.

Further, a preferred embodiment is provided, wherein the state coding module in the third step performs feature coding, and the method for