CN-121995943-A - Intelligent navigation method for humanoid robot

Abstract

The invention relates to the technical field of navigation and discloses an intelligent navigation method for a humanoid robot, comprising the following steps: S100, receiving a multimodal input instruction from a user, the multimodal input instruction comprising a voice instruction or a text instruction; S200, processing the multimodal input instruction to generate a structured navigation instruction; S300, acquiring environment information through an environment sensing module to construct a three-dimensional semantic map; S400, planning a motion path of the humanoid robot through a path planning module, based on the structured navigation instruction and the three-dimensional semantic map, to generate a path trajectory; and S500, executing motion control through a motion control module according to the path trajectory, converting the trajectory into control signals for the joint motors of the humanoid robot, and maintaining bipedal balance in real time.

Inventors

HUANG WANZHONG
YANG ZONGFENG
WANG XIANSHI
HUANG CHAOWEI
LUO MINGYANG
ZHOU ZIHONG

Assignees

昆仑之数(成都)科技有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-28

Claims (9)

1. An intelligent navigation method of a humanoid robot, characterized by comprising the following steps: Step S100, receiving a multimodal input instruction from a user, wherein the multimodal input instruction comprises a voice instruction or a text instruction; Step S200, processing the multimodal input instruction to generate a structured navigation instruction; Step S300, acquiring environment information through an environment sensing module and constructing a three-dimensional semantic map; Step S400, planning a motion path of the humanoid robot through a path planning module, based on the structured navigation instruction and the three-dimensional semantic map, to generate a path trajectory; and Step S500, executing motion control through a motion control module according to the path trajectory, converting the trajectory into control signals for the joint motors of the humanoid robot, and maintaining bipedal balance in real time.
2. The intelligent navigation method of a humanoid robot according to claim 1, wherein in step S200, processing the multimodal input instruction includes: if the input is a voice instruction, converting the voice instruction into a text instruction through a speech recognition module, wherein the speech recognition module adopts an end-to-end model based on a recurrent neural network and outputs the text instruction as a text sequence; if the input is a text instruction, using the text instruction directly; and then performing instruction structuring processing on the text instruction, wherein the instruction structuring processing comprises parsing the action types, target objects, and spatial relations in the instruction, and applying a spatial relation fuzzy matching algorithm and a coreference resolution module, wherein the spatial relation fuzzy matching algorithm uses fuzzy logic to calculate the membership degree of each spatial relation, and the coreference resolution module analyzes the context based on an attention mechanism.
3. The intelligent navigation method of a humanoid robot according to claim 2, wherein in step S200, the instruction structuring processing is implemented using a large language model based on a Transformer architecture, the large language model being trained on a self-built dataset, the model structure comprising an encoder and a decoder, wherein: the encoder consists of a plurality of self-attention sublayers and feedforward neural network sublayers, the self-attention sublayer being calculated as $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$, where $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $d_k$ is the dimension of the key vectors, and $\sqrt{d_k}$ scales the dot product; the decoder generates the structured output of the instruction, including an intent classification tag, an entity sequence, and relation triplets $(e_i, r, e_j)$, where $e_i$ is the $i$-th entity, $r$ is the relation between the two entities, and $e_j$ is the $j$-th entity; the training loss function of the large language model is the cross-entropy loss $L_{CE} = -\sum_{c=1}^{C} y_c \log p_c$, where $C$ is the number of categories, $y_c$ is the true label, and $p_c$ is the predicted probability.
4. The intelligent navigation method of a humanoid robot according to claim 3, wherein in step S300, the three-dimensional semantic map is constructed by fusing lidar and camera sensor data, and includes item information in the environment obtained by a semantic segmentation neural network.
5. The intelligent navigation method of a humanoid robot according to claim 4, wherein in step S300, constructing the three-dimensional semantic map specifically includes the steps of: Step S310, fusing lidar point cloud data $P_L = \{(x_i, y_i, z_i)\}$ with camera image data $I_C$, and generating an environment point cloud map $M$ using an extended Kalman filter, where $(x_i, y_i, z_i)$ are the three-dimensional coordinates of a point in the cloud and $I_C$ is a two-dimensional image pixel matrix; the prediction and update steps of the extended Kalman filter are as follows: the prediction step is $\hat{x}_k^{-} = f(\hat{x}_{k-1}, u_k)$, $P_k^{-} = F_k P_{k-1} F_k^{T} + Q_k$, where $\hat{x}_k^{-}$ is the a priori state estimate, $f$ is the state transition function, $\hat{x}_{k-1}$ is the state estimate at the previous time step, $u_k$ is the control input, $P_k^{-}$ is the a priori covariance, $F_k$ is the state transition Jacobian matrix, $P_{k-1}$ is the covariance at the previous time step, and $Q_k$ is the process noise covariance; the update step is $K_k = P_k^{-} H_k^{T} (H_k P_k^{-} H_k^{T} + R_k)^{-1}$, $\hat{x}_k = \hat{x}_k^{-} + K_k (z_k - h(\hat{x}_k^{-}))$, $P_k = (I - K_k H_k) P_k^{-}$, where $K_k$ is the Kalman gain, $H_k$ is the observation Jacobian matrix, $R_k$ is the observation noise covariance, $\hat{x}_k$ is the state estimate at the current time step, $z_k$ is the observation, $h$ is the observation function, and $I$ is the identity matrix; Step S320, processing the point cloud map with a semantic segmentation neural network, wherein the semantic segmentation neural network has a U-Net architecture and its loss function is a weighted sum of the cross-entropy loss and the Dice loss, $L = \alpha L_{CE} + \beta L_{Dice}$, with $L_{CE} = -\sum y \log p$ and $L_{Dice} = 1 - \frac{2\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}$, where $L_{CE}$ is the cross-entropy loss, $L_{Dice}$ is the Dice loss, $\alpha$ and $\beta$ are weight coefficients, $y$ is the true label, $p$ is the predicted probability, $X$ and $Y$ are the predicted and true segmentation regions, and $\lvert X \cap Y \rvert$, $\lvert X \rvert$, $\lvert Y \rvert$ denote the numbers of region pixels; Step S330, mapping the semantic labels into three-dimensional space to generate the semantic map, and storing the item information as structured data including position coordinates $(x, y, z)$ and a bounding box $\{v_k\}$, where $(x, y, z)$ are three-dimensional coordinates in the world coordinate system and $v_k$ are the coordinates of each vertex of the bounding box.
6. The intelligent navigation method of a humanoid robot of claim 5, wherein in step S400, the path planning module generates a path trajectory taking into consideration kinematic constraints and semantic map constraints of the humanoid robot.
7. The intelligent navigation method of a humanoid robot according to claim 6, wherein in step S400, the path planning adopts a hierarchical planning architecture including a global planning layer and a local planning layer, wherein: the global planning layer uses the A* algorithm with the cost function $f(n) = g(n) + h(n)$, where $g(n)$ is the cost from the start point to node $n$, calculated as the sum of Euclidean distances $g(n) = \sum_{i} \lVert p_{i+1} - p_i \rVert$, and $h(n)$ is the heuristic function, for which the Manhattan distance $h(n) = \lvert x_n - x_g \rvert + \lvert y_n - y_g \rvert$ is used, where $p_i$ are the coordinates of the path points, $\lVert p_{i+1} - p_i \rVert$ is the Euclidean distance between a path point and the next path point, $(x_n, y_n)$ are the coordinates of node $n$, and $(x_g, y_g)$ are the coordinates of the target point; the local planning layer uses a dynamic window algorithm, the process comprising: generating a velocity space $V_s = \{(v, \omega)\}$, where $v$ is the linear velocity and $\omega$ is the angular velocity; and simulating a plurality of trajectories and evaluating them with the function $G(v, \omega) = \alpha \cdot \mathrm{heading}(v, \omega) + \beta \cdot \mathrm{dist}(v, \omega) + \gamma \cdot \mathrm{velocity}(v, \omega)$, where $\mathrm{heading}$ is the alignment of the trajectory direction with the target, $\mathrm{dist}$ is the distance from the trajectory to the nearest obstacle, $\mathrm{velocity}$ is the speed term, and $\alpha$, $\beta$, $\gamma$ are weight coefficients.
8. The intelligent navigation method of a humanoid robot of claim 7, wherein in step S400, the path planning module further includes a dynamic re-planning trigger mechanism that triggers re-planning when the following conditions are detected: condition one, a new obstacle enters the safe distance, the safe distance threshold being set as $d_{safe}$ and the detection formula being $\lVert p_r - p_o \rVert < d_{safe}$, where $p_r$ is the current position of the robot, $p_o$ is the position of the obstacle, and $\lVert p_r - p_o \rVert$ is the Euclidean distance between them; condition two, the trajectory tracking deviation exceeds a deviation threshold $\epsilon$ for a duration $T$, the detection formula being $\lVert p_{act}(t) - p_{plan}(t) \rVert > \epsilon$ for all $t \in [t_0, t_0 + T]$, where $p_{act}$ is the actual position, $p_{plan}$ is the planned position, $\lVert p_{act} - p_{plan} \rVert$ is the Euclidean distance between the actual position and the planned position, and $t_0$ is the start time; condition three, the zero moment point stability margin of the gait falls below a threshold, the zero moment point coordinate being calculated from the mass, acceleration, and position vector of each link, the gravitational acceleration vector, and the number of links, using the cross product [formula missing in the source text]; the stability margin is the minimum distance between the zero moment point and the boundary of the support polygon, and the threshold is set according to the robot design.
9. The intelligent navigation method of a humanoid robot of claim 8, wherein in step S500, the execution process of the motion control module includes: Step S510, performing task analysis on the path trajectory to generate a gait sequence; Step S520, generating joint motor control signals from robot state information using a motion control policy network based on reinforcement learning, wherein the state information comprises a robot body state and an environment state; the body state includes the center-of-mass position, the center-of-mass velocity, the angle of each joint, the angular velocity of each joint, and the zero moment point coordinates, the joints being identified by a joint index; the environment state comprises underfoot terrain features extracted from the three-dimensional semantic map and obstacle information on the predicted path; Step S530, in the training stage, training the reinforcement learning policy network using a simulator or a physical robot with the aim of maximizing the cumulative reward, the reward function comprising: a tracking reward, which encourages minimizing the error between the actual trajectory and the planned trajectory of the robot and is calculated from the Euclidean distance between the actual position and the desired position, scaled by a weight coefficient [formula missing in the source text]; a balance reward, which penalizes deviation of the zero moment point from the safe area of the support polygon and is calculated with respect to a safe point within the support polygon, scaled by a weight coefficient [formula missing in the source text]; an energy reward, which encourages low energy consumption and smooth motion and is calculated from the torque of each joint over the total number of joints, scaled by a weight coefficient [formula missing in the source text]; and a survival reward, which grants a positive reward for each successful forward step, and gives a large negative reward upon falling and terminates the current training episode; Step S540, in the deployment stage, fixing the trained policy network parameters and using them for real-time control.
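The self-attention sublayer and cross-entropy loss named in claim 3 can be illustrated with a minimal sketch. It assumes the standard scaled dot-product form, softmax(QK^T / sqrt(d_k))V, which matches the symbols the claim defines; the function names `softmax`, `attention`, and `cross_entropy` are illustrative and do not come from the patent.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on plain lists of row vectors.

    Q: queries (n_q x d_k), K: keys (n_k x d_k), V: values (n_k x d_v).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


if __name__ == "__main__":
    # A zero query attends equally to both keys, so the output is the mean of V.
    print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

A production model would batch these operations on tensors and add multiple heads; the list-based version is only meant to make each term of the formula visible.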
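The prediction and update steps of the extended Kalman filter in claim 5 (step S310) can be sketched as a single function. This is a generic textbook EKF step under stated assumptions, not the patent's fusion pipeline: the caller supplies the state transition function `f`, the observation function `h`, and their Jacobians, and all names (`ekf_step`, `F_jac`, `H_jac`) are hypothetical.

```python
import numpy as np


def ekf_step(x_prev, P_prev, u, z, f, h, F_jac, H_jac, Q, R):
    """One extended Kalman filter cycle: predict, then update.

    x_prev, P_prev: previous state estimate and covariance
    u, z:           control input and current observation
    f, h:           state transition and observation functions
    F_jac, H_jac:   functions returning the Jacobians of f and h
    Q, R:           process and observation noise covariances
    """
    # Prediction: a priori state and covariance.
    x_pred = f(x_prev, u)
    F = F_jac(x_prev, u)
    P_pred = F @ P_prev @ F.T + Q

    # Update: Kalman gain, a posteriori state and covariance.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x_new)) - K @ H) @ P_pred
    return x_new, P_new


if __name__ == "__main__":
    # Toy 1-D example with an identity motion and observation model.
    x, P = ekf_step(
        np.array([0.0]), np.array([[1.0]]), np.array([0.0]), np.array([2.0]),
        f=lambda x, u: x + u, h=lambda x: x,
        F_jac=lambda x, u: np.eye(1), H_jac=lambda x: np.eye(1),
        Q=np.zeros((1, 1)), R=np.eye(1),
    )
    print(x, P)
```

With equal prior and observation uncertainty, as in the toy example, the gain is 0.5 and the estimate lands halfway between the prediction and the measurement.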
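The global planning layer of claim 7 can be sketched as A* on an occupancy grid, with the path cost accumulated as a sum of Euclidean step lengths and the Manhattan distance as heuristic. The 4-connected grid, the `grid[r][c] == 1` obstacle encoding, and the function name `a_star` are assumptions made for illustration; the patent does not specify the map discretization. On a 4-connected grid every step has Euclidean length 1, so the Manhattan heuristic never overestimates; with diagonal moves it could.

```python
import heapq
import math


def a_star(grid, start, goal):
    """A* search on a 4-connected grid. Returns a list of cells or None."""

    def h(node):
        # Manhattan distance to the goal.
        return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]  # entries are (f, g, node)
    g_cost = {start: 0.0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue  # obstacle cell
            # g accumulates Euclidean distances between consecutive path points.
            new_g = g + math.dist(node, nxt)
            if new_g < g_cost.get(nxt, math.inf):
                g_cost[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None


if __name__ == "__main__":
    demo = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(a_star(demo, (0, 0), (2, 0)))
```

In the full method the resulting waypoints would be handed to the local dynamic window layer, which is not shown here.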
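The re-planning triggers of claim 8 can be sketched as three predicate checks. Several points are assumptions: the conditions are treated as alternatives (any one triggers re-planning), the tracking deviation is sampled at discrete times, and the zero moment point stability margin is passed in as a precomputed number because the patent's formula for it is not available here. All function and parameter names are hypothetical.

```python
import math


def obstacle_too_close(robot_pos, obstacle_pos, d_safe):
    """Condition one: Euclidean distance to the obstacle is below d_safe."""
    return math.dist(robot_pos, obstacle_pos) < d_safe


def deviation_persists(samples, eps, duration):
    """Condition two: tracking error stays above eps for at least `duration`.

    samples: time-ordered list of (t, actual_pos, planned_pos).
    """
    start = None
    for t, actual, planned in samples:
        if math.dist(actual, planned) > eps:
            if start is None:
                start = t  # deviation interval begins
            if t - start >= duration:
                return True
        else:
            start = None  # error recovered, reset the interval
    return False


def should_replan(robot_pos, obstacles, samples, zmp_margin,
                  d_safe, eps, duration, margin_min):
    """Trigger re-planning if any of the three conditions holds (assumed OR)."""
    return (
        any(obstacle_too_close(robot_pos, o, d_safe) for o in obstacles)
        or deviation_persists(samples, eps, duration)
        or zmp_margin < margin_min  # condition three
    )


if __name__ == "__main__":
    print(should_replan((0.0, 0.0), [(0.3, 0.4)], [], 0.05,
                        d_safe=0.6, eps=0.2, duration=1.0, margin_min=0.02))
```

Keeping each condition in its own function makes the thresholds easy to tune separately for a given robot design.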

Description

Intelligent navigation method for humanoid robot

Technical Field

The invention relates to the technical field of navigation, and in particular to an intelligent navigation method of a humanoid robot.

Background

In recent years, demand for humanoid robots in service, medical, home, and other scenarios has been growing, and autonomous navigation capability has become a key technical bottleneck. Existing robot navigation schemes are mainly based on traditional SLAM technology and predefined instruction sets, and realize environment sensing and path planning through lidar, vision sensors, and the like. However, these schemes have significant limitations in natural interaction, semantic understanding, and motion adaptability. For example, mainstream systems rely on keyword recognition or graphical interface operation and cannot deeply parse complex natural language instructions; environment modeling generally contains only geometric information and does not incorporate semantic-level understanding; and the bipedal motion constraints of the humanoid robot are poorly coordinated with navigation planning, so that execution stability in dynamic environments is poor.

Disclosure of Invention

The invention provides an intelligent navigation method of a humanoid robot, which aims to solve the problem that humanoid robot navigation in the prior art has significant limitations in natural interaction, semantic understanding, and motion adaptability.
The technical scheme adopted by the invention is as follows: An intelligent navigation method of a humanoid robot comprises the following steps: Step S100, receiving a multimodal input instruction from a user, wherein the multimodal input instruction comprises a voice instruction or a text instruction; Step S200, processing the multimodal input instruction to generate a structured navigation instruction; Step S300, acquiring environment information through an environment sensing module and constructing a three-dimensional semantic map; Step S400, planning a motion path of the humanoid robot through a path planning module, based on the structured navigation instruction and the three-dimensional semantic map, to generate a path trajectory; and Step S500, executing motion control through a motion control module according to the path trajectory, converting the trajectory into control signals for the joint motors of the humanoid robot, and maintaining bipedal balance in real time.
Preferably, in step S200, processing the multimodal input instruction includes: if the input is a voice instruction, converting the voice instruction into a text instruction through a speech recognition module, wherein the speech recognition module adopts an end-to-end model based on a recurrent neural network and outputs the text instruction as a text sequence; if the input is a text instruction, using the text instruction directly; and then performing instruction structuring processing on the text instruction, wherein the instruction structuring processing comprises parsing the action types, target objects, and spatial relations in the instruction, and applying a spatial relation fuzzy matching algorithm and a coreference resolution module, wherein the spatial relation fuzzy matching algorithm uses fuzzy logic to calculate the membership degree of each spatial relation, and the coreference resolution module analyzes the context based on an attention mechanism.
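The fuzzy-logic membership calculation for spatial relations can be illustrated with a minimal sketch. The text does not give the membership functions, so the piecewise-linear shapes, the distance breakpoints in metres, the two relation names, and the helpers `falling`, `rising`, and `match_relation` are all illustrative assumptions.

```python
def falling(x, full, zero):
    """Membership 1 for x <= full, 0 for x >= zero, linear in between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Membership 0 for x <= zero, 1 for x >= full, linear in between."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


# Hypothetical relation vocabulary keyed on the distance between two objects.
RELATIONS = {
    "near": lambda d: falling(d, 0.5, 2.0),
    "far": lambda d: rising(d, 1.5, 4.0),
}


def match_relation(distance_m):
    """Return the best-matching relation and all membership degrees."""
    scores = {name: mu(distance_m) for name, mu in RELATIONS.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(match_relation(1.0))
```

Because memberships overlap between 1.5 m and 2.0 m, an object in that band is partly "near" and partly "far", which is the behaviour fuzzy matching is meant to capture for vague instructions such as "next to the table".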
Preferably, in step S200, the instruction structuring processing is implemented using a large language model based on a Transformer architecture, the large language model being trained on a self-built dataset, the model structure comprising an encoder and a decoder, wherein: the encoder consists of a plurality of self-attention sublayers and feedforward neural network sublayers, the self-attention sublayer being calculated as $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$, where $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $d_k$ is the dimension of the key vectors, and $\sqrt{d_k}$ scales the dot product; the decoder generates the structured output of the instruction, including an intent classification tag, an entity sequence, and relation triplets $(e_i, r, e_j)$, where $e_i$ is the $i$-th entity, $r$ is the relation between the two entities, and $e_j$ is the $j$-th entity; the training loss function of the large language model is the cross-entropy loss $L_{CE} = -\sum_{c=1}^{C} y_c \log p_c$, where $C$ is the number of categories, $y_c$ is the true label, and $p_c$ is the predicted probability.
Preferably, in step S300, the three-dimensional semantic map is constructed by fusing lidar and camera sensor data, and includes item information in the environment obtained by a semantic segmentation neural network.
Preferably, in step S300, the construction of the three-dimensional semantic map specifically incl