CN-122008238-A - Humanoid robot control method and system based on reinforcement learning
Abstract
The invention provides a humanoid robot control method and a humanoid robot control system based on reinforcement learning. The method comprises: planning a path for the humanoid robot by adopting a robot control model according to the current position, stacking task and residual energy of the humanoid robot to obtain an optimal path, and executing control processing of the humanoid robot in a target stacking area according to the optimal path, wherein the robot control model is obtained by training through a GNN (graph neural network), reinforcement learning and a Monte Carlo tree search method. The invention has the beneficial effects of improving the carrying efficiency of the humanoid robot, reducing the energy consumption and enhancing the flexibility of path planning.
Inventors
- QI GUANGTU
- ZHOU CHUNXIANG
- CHEN DUNJIAN
Assignees
- 华盛控智能科技(广东)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260401
Claims (8)
- 1. A humanoid robot control method based on reinforcement learning, characterized by comprising the following steps: planning a path of the humanoid robot by adopting a robot control model according to the current position, the stacking task and the residual energy of the humanoid robot to obtain an optimal path, and executing control processing of the humanoid robot in a target stacking area according to the optimal path; wherein the robot control model is obtained through the following steps: determining a carrying constraint of the robot according to the height constraint of the classified stacking points, the task completion time constraint and the path conflict constraint, and determining an objective function according to the carrying constraint; acquiring training data, constructing node features and edge features of a GNN graph neural network from the training data, and performing graph convolution processing on the node features and the edge features through the GNN graph neural network to obtain an embedded representation; performing reinforcement learning training according to the embedded representation and the objective function to obtain a first optimal predicted path; performing optimal carrying path reasoning on the first predicted path by adopting a Monte Carlo tree search method to obtain a second optimal predicted path; and executing training for a preset number of times to obtain the robot control model.
- 2. The reinforcement learning-based humanoid robot control method of claim 1, wherein determining the carrying constraint of the robot according to the height constraint of the classified stacking points, the task completion time constraint and the path conflict constraint, and determining the objective function according to the carrying constraint, comprises: the height constraint is the maximum allowable stacking height of the classified stacking points, the time constraint is the maximum allowable carrying time of the material pile to be carried, and the path conflict constraint is the minimum allowable safety distance between humanoid robots; and determining an objective function according to the carrying constraint, wherein the objective function minimizes the energy consumption and the conveying time for the humanoid robot to convey and classify-place the material piles to be conveyed, and the objective function J is expressed as: J = α·E + β·T, wherein E = Epath + Eg + El is the total energy consumption and T is the total conveying time; k1, k2 and k3 are the energy consumption coefficients during movement, d1 is the path length for the humanoid robot to reach the material pile to be carried from its current position, d2 is the path length for the humanoid robot from the material pile to be conveyed to the classified stacking point, Epath is the path energy consumption determined by the coefficients k1, k2, k3 and the path lengths d1 and d2, α is the energy weight coefficient, β is the time weight coefficient, P is the power of the humanoid robot, Eg is the energy consumption for the humanoid robot to grasp the material pile to be carried, and El is the energy consumption for lifting the material.
- 3. The reinforcement learning-based humanoid robot control method of claim 2, wherein acquiring training data, constructing node features and edge features of the GNN graph neural network from the training data, and performing graph convolution processing on the node features and the edge features through the GNN graph neural network to obtain embedded representations, includes: acquiring a training data sample, wherein the training data sample comprises a carrying task graph, the carrying task graph comprises nodes and edges, the nodes comprise node features and the edges comprise edge features, and the node features comprise the coordinates of the material pile to be carried, the maximum allowable carrying time of the material pile to be carried, the type of the classified stacking point, the coordinates of the classified stacking point, the height constraint and the current stacking height; and performing graph convolution processing through a plurality of graph convolution layers of the GNN graph neural network to obtain an embedded representation, wherein the graph convolution processing is: H(l+1) = ReLU(A · H(l) · W(l)), wherein H(l) and H(l+1) are the node representations of the l-th layer and the (l+1)-th layer, A is the adjacency matrix whose entries represent the weights of the edges between the nodes, ReLU is the activation function, and W(l) is a trainable weight matrix.
- 4. The reinforcement learning based humanoid robot control method of claim 3, wherein acquiring the training data includes: constructing the carrying task graph by adopting at least one of digital simulation, historical task data and logistics simulation data, and determining the training data through the carrying task graph.
- 5. The reinforcement learning based humanoid robot control method of claim 3, wherein performing reinforcement learning training according to the embedded representation and the objective function to obtain a first optimal predicted path includes: taking the embedded representation as the environmental state and selecting adjacent nodes or edges by adopting a greedy strategy; calculating a reward value through the objective function and the objective constraint, updating the Q value or the strategy by adopting TD learning according to the reward value, and repeatedly executing the reinforcement learning training to obtain the first optimal predicted path, wherein the Q value represents the value of an action selected by the humanoid robot at each node, and the actions comprise material grabbing, material lifting and carrying, and traveling.
- 6. The reinforcement learning-based humanoid robot control method of claim 5, wherein performing optimal carrying path reasoning on the first predicted path by using a Monte Carlo tree search method to obtain a second optimal predicted path includes: acquiring the environmental state and action of each node of the first predicted path, and calculating the maximum upper confidence bound of the child nodes of each node by adopting the Monte Carlo tree search method; determining the optimal child node according to the maximum upper confidence bound, expanding the node according to the optimal child node, and randomly generating candidate predicted paths from the expanded optimal child node; calculating statistical information for the candidate predicted paths through the objective function and the objective constraint, wherein the statistical information comprises the total reward value and the number of visits of each node in the candidate predicted path; backpropagating the reward estimate along the candidate predicted path, and updating the reward estimates of the Monte Carlo tree; and updating the Q value of the nodes in the first optimal predicted path according to the reward estimates to obtain the second optimal predicted path, wherein the Q value of a node i is updated as: Q(i) = W(i) / N(i), wherein Q(i) is the updated Q value of node i, W(i) is the total reward value accumulated by node i over the simulations, and N(i) is the number of visits to node i.
- 7. The reinforcement learning based humanoid robot control method of claim 1, further comprising: acquiring real-time states of the humanoid robot, the material pile to be conveyed and the classified stacking point, and, if any one of the following real-time states occurs: the current energy of the humanoid robot is insufficient to finish one conveying trip, the stacking height of the material pile to be conveyed has changed, or the stacking height of the classified stacking point has changed, updating the optimal conveying path of the humanoid robot in the target stacking area through the robot control model.
- 8. A reinforcement learning-based humanoid robot control system, comprising: a first module, configured to plan a path of the humanoid robot by adopting a robot control model according to the current position, the stacking task and the residual energy of the humanoid robot to obtain an optimal path, and to execute control processing of the humanoid robot in the target stacking area according to the optimal path; wherein the robot control model is obtained through the following modules: a second module, configured to determine the carrying constraint of the robot according to the height constraint of the classified stacking points, the task completion time constraint and the path conflict constraint, and to determine an objective function according to the carrying constraint; a third module, configured to acquire training data, construct node features and edge features of the GNN graph neural network from the training data, and obtain an embedded representation after graph convolution processing is performed on the node features and the edge features through the GNN graph neural network; a fourth module, configured to perform reinforcement learning training according to the embedded representation and the objective function, to obtain a first optimal predicted path; a fifth module, configured to perform optimal carrying path reasoning on the first predicted path by using a Monte Carlo tree search method, to obtain a second optimal predicted path; and a sixth module, configured to perform training for a preset number of times to obtain the robot control model.
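The Monte Carlo tree search steps of claims 5 and 6 (selecting the child with the maximum upper confidence bound, then backpropagating rewards and setting Q = W / N) can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names and the exploration constant `c` are assumptions.

```python
import math

def ucb(total_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Upper confidence bound of a child node (selection step).

    Unvisited children score +inf so they are always tried first.
    """
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def updated_q(total_reward: float, visits: int) -> float:
    """Q value of a node after backpropagation: Q = W / N (claim 6)."""
    return total_reward / visits
```

Picking the child with the highest `ucb` score realizes the "maximum upper confidence bound" selection, and `updated_q` is the per-node average reward used to refine the first predicted path into the second.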
Description
Humanoid robot control method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of robots, and in particular to a humanoid robot control method and system based on reinforcement learning.

Background

In modern logistics and warehouse management, humanoid robots play an increasingly important role. In one scenario, when a plurality of palletizing robots classify materials to be palletized, the carrying behavior mainly comprises grabbing the materials to be palletized, moving along a path, and placing the objects at the stacking points of the target palletizing positions; the optimal path and the optimal placement action are predicted mainly through simulation, reinforcement learning and similar means. However, the prior art has the following defects: reinforcement learning has high computational complexity when the state space and the action space are large, and its real-time performance is insufficient, especially in a dynamic environment; the training time is long, since reinforcement learning usually needs a large amount of training data and time to converge, especially for complex tasks, which prolongs the deployment period; the trained model is sensitive to environmental change, because it may be optimized for a specific environment, and if the environment changes, the experience accumulated by a single robot in its experience pool must be accumulated anew, so the adaptability is poor; and the placement heights of different materials to be classified and of the stacking points are not considered, so the humanoid robot incurs additional energy consumption.

Disclosure of Invention

The invention aims to solve at least one of the technical problems in the prior art, and provides a humanoid robot control method and a humanoid robot control system based on reinforcement learning, which improve the carrying efficiency of the humanoid robot and reduce the carrying energy consumption.
One aspect of the present invention provides a reinforcement learning-based humanoid robot control method, including: planning a path of the humanoid robot by adopting a robot control model according to the current position, the stacking task and the residual energy of the humanoid robot to obtain an optimal path, and executing control processing of the humanoid robot in a target stacking area according to the optimal path; wherein the robot control model is obtained through the following steps: determining a carrying constraint of the robot according to the height constraint of the classified stacking points, the task completion time constraint and the path conflict constraint, and determining an objective function according to the carrying constraint; acquiring training data, constructing node features and edge features of a GNN graph neural network from the training data, and performing graph convolution processing on the node features and the edge features through the GNN graph neural network to obtain an embedded representation; performing reinforcement learning training according to the embedded representation and the objective function to obtain a first optimal predicted path; performing optimal carrying path reasoning on the first predicted path by adopting a Monte Carlo tree search method to obtain a second optimal predicted path; and executing training for a preset number of times to obtain the robot control model.
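The graph convolution step named above, standard for a GNN, computes H(l+1) = ReLU(A · H(l) · W(l)) per layer. A minimal NumPy sketch follows; the toy 3-node carrying-task graph, its feature values and the weight matrix are illustrative assumptions, not data from the patent.

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: H_next = ReLU(A @ H @ W).

    A: (n, n) weighted adjacency matrix (edge weights between task nodes)
    H: (n, d_in) node representations of the current layer
    W: (d_in, d_out) trainable weight matrix
    """
    return np.maximum(A @ H @ W, 0.0)

# Toy carrying-task graph: 3 nodes, 2 input features, 2 output features
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.array([[0.3, -0.1],
              [0.2, 0.4]])
H_next = gcn_layer(A, H, W)  # embedded representation after one layer
```

Stacking several such layers, as the claims describe, yields the embedded representation consumed by the reinforcement learning stage.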
According to the reinforcement learning-based humanoid robot control method, the carrying constraint of the robot is determined according to the height constraint of the classified stacking points, the task completion time constraint and the path conflict constraint, and the objective function is determined according to the carrying constraint, which comprises the following steps: the height constraint is the maximum allowable stacking height of the classified stacking points, the time constraint is the maximum allowable carrying time of the material pile to be carried, and the path conflict constraint is the minimum allowable safety distance between humanoid robots; and determining an objective function according to the carrying constraint, wherein the objective function minimizes the energy consumption and the conveying time for the humanoid robot to convey and classify-place the material piles to be conveyed, and the objective function J is expressed as: J = α·E + β·T, wherein E = Epath + Eg + El is the total energy consumption and T is the total conveying time; k1, k2 and k3 are the energy consumption coefficients during movement, d1 is the path length for the humanoid robot to reach the material pile to be carried from its current position, d2 is the path length for the humanoid robot from the material pile to be conveyed to the classified stacking point, Epath is the path energy consumption determined by the coefficients k1, k2, k3 and the path lengths d1 and d2, α is the energy weight coefficient, β is the time weight coefficient, P is the power of the humanoid robot, Eg is the energy consumption for the humanoid robot to grasp the material pile to be carried, and El is the energy consumption for lifting the material.
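A weighted energy-plus-time objective of this kind can be sketched as below. The exact form of the path-energy term is not recoverable from the translation, so a linear model Epath = k1·d1 + k2·d2 and a constant moving speed are assumed; the parameter names (`k1`, `k2`, `speed`) are illustrative.

```python
def objective(d1: float, d2: float, k1: float, k2: float,
              e_grasp: float, e_lift: float,
              alpha: float, beta: float, speed: float) -> float:
    """Weighted energy/time cost J = alpha * E + beta * T.

    E = k1*d1 + k2*d2 + e_grasp + e_lift  (assumed linear path-energy model)
    T = (d1 + d2) / speed                 (assumed constant moving speed)
    """
    e_path = k1 * d1 + k2 * d2
    energy = e_path + e_grasp + e_lift
    time = (d1 + d2) / speed
    return alpha * energy + beta * time

# A shorter route to and from the material pile yields a lower cost J:
j_short = objective(d1=2.0, d2=3.0, k1=1.0, k2=1.0,
                    e_grasp=5.0, e_lift=4.0, alpha=0.6, beta=0.4, speed=1.0)
j_long = objective(d1=4.0, d2=6.0, k1=1.0, k2=1.0,
                   e_grasp=5.0, e_lift=4.0, alpha=0.6, beta=0.4, speed=1.0)
```

During reinforcement learning training, minimizing J (equivalently, using -J as the reward) steers the policy toward short, low-energy carrying paths.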