CN-121589834-B - Humanoid robot whole-body control reinforcement learning method and system based on parallel computing


Abstract

The invention discloses a humanoid robot whole-body control reinforcement learning method and system based on parallel computing. The method comprises: constructing a CasADi expression for calculating the spatial inertia tensors of the joint links; constructing CasADi symbolic expressions for the centroid position and centroid momentum; converting the CasADi expression and the CasADi symbolic expressions into CUDA kernels and compiling the CUDA kernels into executable files; importing the CUDA kernels and allocating GPU memory when the training environments are initialized; after domain randomization of the link mass and centroid, calling the CasADi expression to calculate the joint spatial inertia tensors of each environment in parallel; after each simulation step, calling the CasADi symbolic expressions to calculate the centroid position and centroid momentum of each environment in parallel; calculating a reward function from the centroid position and centroid momentum; and introducing the reward function into reinforcement learning training to obtain a whole-body control strategy for the robot.

Inventors

Qing Ziyi
HAN ZHIMIN
LIAN WENKANG
DENG CHUANQI

Assignees

Hangzhou Dianzi University (杭州电子科技大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-30

Claims (7)

1. A humanoid robot whole-body control reinforcement learning method based on parallel computing, characterized by comprising the following steps:
constructing a CasADi expression for calculating the spatial inertia tensors of the joint links;
constructing CasADi symbolic expressions for the centroid position and centroid momentum;
converting the CasADi expression and the CasADi symbolic expressions into CUDA kernels and compiling the CUDA kernels into executable files;
importing the CUDA kernels of the CasADi expression and the CasADi symbolic expressions and allocating GPU memory when the training environments are initialized;
after domain randomization of the link mass and centroid, calling the CasADi expression to calculate the joint spatial inertia tensors of each environment in parallel;
after each simulation step, calling the CasADi symbolic expressions to calculate the centroid position and centroid momentum of each environment in parallel;
calculating a reward function from the centroid position and centroid momentum, the reward function comprising a linear momentum tracking reward constructed in the walking strategy, an angular momentum reward and a linear momentum change-rate tracking penalty constructed in the standing strategy;
introducing the reward function into reinforcement learning training to obtain a whole-body control strategy for the robot;
wherein constructing the CasADi expression for calculating the joint link spatial inertia tensors comprises: reading, from the robot's unified robot description format (URDF) file, the inertia tensor at the centroid of each link with the robot in its initial zero position; creating symbolic variables representing the new centroid position and the new total mass of each link, and calculating the inertia tensor at the new centroid position; defining the joint frame to coincide with the link frame, so as to obtain the inertia tensor of each link in joint coordinates and the spatial inertia tensor of each joint in its local coordinate system; and finally constructing the CasADi expression from the new centroid position of each link, the new total mass of each link, the spatial inertia tensor of each joint, and the total mass of the robot;
wherein constructing the CasADi symbolic expressions for the centroid position and centroid momentum comprises: reading, from the robot's URDF file, the spatial transform of each joint with the robot in its initial zero position, the spatial transform being the transform of the parent joint expressed in the child joint frame; initializing the composite rigid-body inertia tensor of each joint and updating the spatial transform of each joint; then recursively updating the composite rigid-body inertia tensors through the spatial transforms to obtain the composite rigid-body inertia tensor at the centroid; constructing the spatial transform of the centroid in the world frame from the centroid position, and then obtaining the transform of the centroid relative to each joint by recursion; and finally constructing the CasADi symbolic expressions from the centroid position in the world coordinate system, the centroid momentum, the generalized position comprising the linear displacement and quaternion of the robot body and the motor angle of each joint, the generalized velocity comprising the linear velocity and angular velocity of the robot body and the angular velocity of each joint motor, and the spatial inertia tensor of each joint;
wherein converting into CUDA kernels and compiling into executable files comprises using the structure of the CasADi computation process, which consists of three parts, namely input memory, intermediate memory, and output memory: during a CasADi computation, the input memory receives the inputs given by the user and the input values are stored in the intermediate memory; operations are executed on the intermediate memory according to the memory addresses corresponding to the symbolic operations, and the results are stored back in the intermediate memory; after all symbolic operations have been executed, the results held in the intermediate memory are stored in the output memory, from which the user obtains the computation results; the computation processes of the CasADi expression and the CasADi symbolic expressions are converted into CUDA kernels corresponding to each CUDA thread address and compiled into executable files with CMake; GPU memory is pre-allocated according to the number of parallel environments in the IsaacGym/IsaacLab framework, and the executable files of the CUDA kernels are called to perform the computation, so that the calculation of the joint spatial inertia tensors, centroid position, and centroid momentum is parallelized, with multiple GPU threads executing multiple CUDA kernels to obtain the computation results of multiple environments.
2. The humanoid robot whole-body control reinforcement learning method based on parallel computing according to claim 1, wherein importing the CUDA kernels of the CasADi expression and the CasADi symbolic expressions and allocating GPU memory comprises: during initialization, loading the CasADi expression and the CasADi symbolic expressions, obtaining the ID of each thread according to the number of parallel environments, and pre-allocating GPU memory according to the size of the workspace required to evaluate a single CasADi symbolic expression.
3. The humanoid robot whole-body control reinforcement learning method based on parallel computing according to claim 2, wherein calculating the joint spatial inertia tensors of each environment in parallel comprises: reading, for each environment, the rigid-body properties in the IsaacGym/IsaacLab framework, including the mass and centroid position of each rigid body, and performing domain randomization of the mass and centroid; for each environment, concatenating the centroids of the rigid bodies to construct the new centroid position of each link, and concatenating the masses of the rigid bodies to construct the new total mass of each link; and finally inputting the new centroid position and new total mass of each link of each environment into the CUDA kernel of the CasADi expression, and outputting the spatial inertia tensors of the rigid bodies and the total mass of the robot after domain randomization.
4. The humanoid robot whole-body control reinforcement learning method based on parallel computing according to claim 3, wherein calculating the centroid position and centroid momentum of each environment in parallel comprises: at each simulation step, inputting, for each environment, the generalized position comprising the linear displacement and quaternion of the robot body and the motor angle of each joint, the generalized velocity comprising the linear velocity and angular velocity of the robot body and the angular velocity of each joint motor, and the spatial inertia tensor of each joint into the CUDA kernel of the CasADi symbolic expressions to obtain the centroid position and centroid momentum of each environment, and obtaining the centroid velocity from the linear momentum and the total mass of the robot.
5. A humanoid robot whole-body control reinforcement learning system based on parallel computing, characterized in that the system executes the humanoid robot whole-body control reinforcement learning method based on parallel computing according to any one of claims 1 to 4 and comprises: a joint spatial inertia tensor calculation module, configured to calculate the spatial inertia tensors of the joints and the total mass of the robot after domain randomization of the link centroid position and mass; a centroid position and centroid momentum calculation module, configured to calculate the centroid position and centroid momentum of the robot; a CUDA kernel module, configured to convert the CasADi symbolic expressions of the joint spatial inertia tensors, centroid position, and centroid momentum into CUDA kernels, allocate GPU memory, and perform the parallel calculation of the joint spatial inertia tensors, centroid position, and centroid momentum; a reward module, configured to calculate the reward function and penalties of the robot based on the centroid momentum; and a reinforcement learning module, configured to use the rewards and penalties calculated by the reward module in reinforcement learning training of the robot to obtain a whole-body control strategy for the robot.
6. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the humanoid robot whole-body control reinforcement learning method based on parallel computing according to any one of claims 1 to 4.
7. A storage medium on which a computer program is stored, characterized in that the computer program, when run by a processor, implements the humanoid robot whole-body control reinforcement learning method based on parallel computing according to any one of claims 1 to 4.
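As an illustration of the link inertia step in claims 1 and 3, the following minimal Python sketch assembles the 6x6 spatial inertia of one link about its joint frame from a mass, a centroid offset, and the rotational inertia at the centroid, using the standard parallel-axis construction. It is not the patented CasADi expression: the function names, the angular-first ordering of the 6x6 matrix, and the choice to hold the rotational inertia at the centroid fixed while the mass and centroid are randomized are all assumptions made for illustration.

```python
def skew(c):
    """Return the 3x3 cross-product matrix of a 3-vector c."""
    x, y, z = c
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]


def spatial_inertia(mass, com, inertia_at_com):
    """6x6 spatial inertia about the joint-frame origin (angular rows first).

    mass           -- (randomized) total mass of the link
    com            -- (randomized) centroid position in the joint frame
    inertia_at_com -- 3x3 rotational inertia at the centroid, joint-frame axes
    """
    cx = skew(com)
    c_dot_c = sum(v * v for v in com)
    S = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            # Parallel-axis theorem: I_o = I_c + m * ((c.c) * 1 - c c^T)
            shift = (c_dot_c if i == j else 0.0) - com[i] * com[j]
            S[i][j] = inertia_at_com[i][j] + mass * shift
            S[i][j + 3] = mass * cx[i][j]  # upper-right block: m * [c]x
            S[i + 3][j] = mass * cx[j][i]  # lower-left block: m * [c]x^T
        S[i + 3][i + 3] = mass             # lower-right block: m * identity
    return S


def total_mass(link_masses):
    """Total robot mass after randomization: the sum of the link masses."""
    return sum(link_masses)
```

In the claimed method the mass and centroid would be symbolic CasADi variables and the expression would be evaluated once per environment on the GPU; here they are plain floats so the arithmetic can be checked by hand.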
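The three-part evaluation scheme in claims 1 and 2 (input memory, intermediate memory, output memory, with one pre-allocated workspace per environment) can be pictured with the plain-Python toy model below. It is a sketch under stated assumptions, not CasADi's generated code or a real CUDA kernel: the instruction tape, slot layout, and function names are hypothetical, the Python loop stands in for one GPU thread per environment, and the example expression is the centroid velocity of claim 4 (linear momentum divided by total mass).

```python
# Hypothetical expression: v = l / m, inputs [l_x, l_y, l_z, m], outputs [v_x, v_y, v_z].
N_IN, N_OUT, WORK_SIZE = 4, 3, 7

# Each instruction: (operation, destination slot, source slot a, source slot b).
TAPE = [
    ("div", 4, 0, 3),  # w4 = l_x / m
    ("div", 5, 1, 3),  # w5 = l_y / m
    ("div", 6, 2, 3),  # w6 = l_z / m
]
OUT_SLOTS = [4, 5, 6]

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def allocate(n_envs):
    """Pre-allocate flat input, intermediate (work) and output buffers."""
    return ([0.0] * (n_envs * N_IN),
            [0.0] * (n_envs * WORK_SIZE),
            [0.0] * (n_envs * N_OUT))


def kernel(env_id, inputs, work, outputs):
    """Body that one thread would run for environment `env_id`."""
    w0 = env_id * WORK_SIZE  # start of this thread's private work slice
    # 1. Copy the user inputs into intermediate memory.
    for i in range(N_IN):
        work[w0 + i] = inputs[env_id * N_IN + i]
    # 2. Execute each symbolic operation on intermediate-memory addresses.
    for op, dst, a, b in TAPE:
        work[w0 + dst] = OPS[op](work[w0 + a], work[w0 + b])
    # 3. Copy the results from intermediate memory to output memory.
    for k, slot in enumerate(OUT_SLOTS):
        outputs[env_id * N_OUT + k] = work[w0 + slot]


def evaluate_all(rows):
    """Sequential stand-in for launching one thread per environment."""
    n_envs = len(rows)
    inputs, work, outputs = allocate(n_envs)
    for e, row in enumerate(rows):
        for i in range(N_IN):
            inputs[e * N_IN + i] = row[i]
    for env_id in range(n_envs):  # on a GPU these iterations run concurrently
        kernel(env_id, inputs, work, outputs)
    return [outputs[e * N_OUT:(e + 1) * N_OUT] for e in range(n_envs)]
```

For example, `evaluate_all([[2.0, 4.0, 6.0, 2.0]])` returns `[[1.0, 2.0, 3.0]]`. Because each environment writes only to its own slice of the work buffer, the per-environment evaluations are independent, which is what allows them to be assigned to separate threads.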

Description

Humanoid robot whole-body control reinforcement learning method and system based on parallel computing

Technical Field

The invention relates to the technical field of robot control, and in particular to a humanoid robot whole-body control reinforcement learning method and system based on parallel computing, applied to reinforcement-learning-based whole-body control of humanoid robots.

Background

Applying reinforcement learning (RL) to robot control can effectively avoid limitations that traditional model-based control methods encounter in high-dimensional systems, such as inaccurate modeling and difficulty coping with dynamic changes in complex environments. By introducing reinforcement learning into robot control systems, researchers can build more flexible, efficient, and robust control frameworks that give robots stronger environmental understanding and adaptability. Reinforcement learning has thus gradually become an important direction in modern robot control and lays a foundation for deploying autonomous robots in complex real-world scenarios.

In practical whole-body control applications, however, the centroid motion and momentum distribution of the whole robot often need to be adjusted precisely in a high-dimensional space. Conventional reinforcement learning control frameworks mostly apply rewards or penalties to quantities such as the positions and velocities of the robot's links and lack a compact dynamic description such as the centroid momentum, which makes coordinated whole-body control difficult to achieve.
Secondly, current mainstream robot reinforcement learning platforms such as IsaacGym and IsaacLab do not provide an interface for obtaining the centroid momentum of rigid bodies. Although the centroid momentum can be calculated by calling current mainstream dynamics libraries such as Pinocchio and RBDL, these libraries run on the CPU while reinforcement learning training runs on the GPU, so a large amount of time is spent exchanging data between the CPU and the GPU, which greatly increases training time.

Disclosure of Invention

The invention aims to provide a humanoid robot whole-body control reinforcement learning method and system based on parallel computing, so as to solve the problems described in the Background.

The invention provides a humanoid robot whole-body control reinforcement learning method based on parallel computing, comprising the following steps:

Step 1, constructing a CasADi expression for calculating the spatial inertia tensors of the joint links.

Preferably, the inertia tensor at the centroid of each link, with the robot in its initial zero position, is read from the robot's unified robot description format (URDF) file; symbolic variables representing the new centroid position and the new total mass of each link are created, and the inertia tensor at the new centroid position is calculated; the joint frame is defined to coincide with the link frame, giving the inertia tensor of each link in joint coordinates and the spatial inertia tensor of each joint in its local coordinate system; and the CasADi expression is constructed from the new centroid position of each link, the new total mass of each link, the spatial inertia tensor of each joint, and the total mass of the robot.

Step 2, constructing CasADi symbolic expressions for the centroid position and the centroid momentum.
Preferably, the spatial transform of each joint, with the robot in its initial zero position, is read from the robot's unified robot description format (URDF) file; the composite rigid-body inertia tensor of each joint is initialized and the spatial transform of each joint is updated; the composite rigid-body inertia tensors are then updated recursively through the spatial transforms to obtain the composite rigid-body inertia tensor at the centroid; the spatial transform of the centroid in the world frame is constructed from the centroid position, and the transform of the centroid relative to each joint is obtained by recursion, which also yields the centroid momentum matrix; and the CasADi symbolic expressions are constructed from the centroid position in the world coordinate system, the centroid momentum, the generalized position comprising the linear displacement and quaternion of the robot body and the motor angle of each joint, the generalized velocity comprising the linear velocity and angular velocity of the robot body and the angular velocity of each joint motor, and the spatial inertia tensor of each joint.

Step 3, converting the CasADi expression and the CasADi symbolic expressions into CUDA kernels and compiling the CUDA kernels into executable files.

Preferably, the structure of CasADi calculation process comprising three parts of input memory, intermediate