CN-120697023-B - Model reasoning-based humanoid robot control method and related equipment
Abstract
The application discloses a model-inference-based humanoid robot control method and related equipment. The method comprises: dynamically collecting multi-modal data through a target robot end, and transforming the multi-modal data to generate multi-modal transformed data; dynamically monitoring the network communication state of the target robot end, performing inference on the multi-modal transformed data through a lightweight inference service model according to the network communication state and the data sensitivity of the multi-modal transformed data, and outputting an inference result, wherein the lightweight inference service model is obtained through full-layer parameter transformation and collaborative federated-learning training; and inversely transforming the inference result through the target robot end to obtain an action control instruction, and controlling the target humanoid robot according to the action control instruction. Embodiments of the application can realize automatic robot control based on multi-modal data input and improve the control efficiency and accuracy of the robot. The application can be widely applied in the technical field of robot control.
Inventors
- LI WEICHONG
- LI WEISHEN
Assignees
- 广州里工实业有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-07-16
Claims (8)
- 1. A model-inference-based humanoid robot control method, characterized in that the method comprises the following steps: dynamically collecting multi-modal data through a target robot end, and transforming the multi-modal data to generate multi-modal transformed data; dynamically monitoring the network communication state of the target robot end, performing inference on the multi-modal transformed data through a lightweight inference service model according to the network communication state and the data sensitivity of the multi-modal transformed data, and outputting an inference result; and inversely transforming the inference result through the target robot end to obtain an action control instruction, and controlling the target humanoid robot according to the action control instruction; wherein dynamically monitoring the network communication state of the target robot end, performing inference on the multi-modal transformed data through the lightweight inference service model according to the network communication state and the data sensitivity of the multi-modal transformed data, and outputting the inference result comprises: when the network communication state is a networked state, performing collaborative inference analysis on the multi-modal transformed data according to the data sensitivity of the multi-modal transformed data through a public lightweight inference service model and a local inference service model of the target robot end to generate the inference result; and when the network communication state is an off-network state, performing inference on the multi-modal transformed data through the local inference service model of the target robot end to generate the inference result; and wherein performing the collaborative inference analysis comprises: performing data sensitivity detection on the multi-modal transformed data, and splitting the multi-modal transformed data into sensitive data and non-sensitive data; performing inference on the sensitive data through the local inference service model of the target robot end to generate a first inference result; encrypting and uploading the non-sensitive data to a public cloud node through the target robot end, decrypting the encrypted non-sensitive data through the public cloud node, and performing inference on the decrypted non-sensitive data by using the public lightweight inference service model to generate a second inference result; encrypting and issuing the second inference result to the target robot end through the public cloud node, and decrypting the encrypted second inference result through the target robot end to obtain a decrypted second inference result; and determining the inference result according to the first inference result and the decrypted second inference result.
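The routing described in claim 1 (off-network → local model only; networked → sensitive data stays local while non-sensitive data goes to the public cloud model) can be sketched as a small dispatcher. This is a minimal illustrative sketch, not from the patent: the names `NetState`, `route_inference`, and the sensitivity predicate are all assumptions, and the encryption/decryption steps are elided.

```python
from enum import Enum

class NetState(Enum):
    ONLINE = "online"    # networked state
    OFFLINE = "offline"  # off-network state

def route_inference(items, net_state, is_sensitive, local_infer, cloud_infer):
    """Route each data item to local or cloud inference.

    Sensitive items (or all items when off-network) stay on the
    robot-side local model; non-sensitive items go to the public
    cloud model only when the network is up.
    """
    results = []
    for item in items:
        if net_state is NetState.OFFLINE or is_sensitive(item):
            results.append(local_infer(item))   # first inference result
        else:
            results.append(cloud_infer(item))   # second inference result
    return results
```

For example, with `is_sensitive = lambda x: x < 0`, a batch `[1, -2, 3]` in the networked state sends `1` and `3` to the cloud model and keeps `-2` local; in the off-network state everything runs locally.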
- 2. The method according to claim 1, wherein the method further comprises: performing full-layer parameter transformation on the public lightweight inference service model through a public cloud node, and then issuing the transformed public lightweight inference service model and a permutation matrix to each robot end; receiving the issued model through each robot end, locally storing it as a corresponding local inference service model, and training the local inference service model with a local data set to generate corresponding model update parameters; performing permutation alignment on the corresponding model update parameters according to the permutation matrix through each robot end, and encrypting and uploading the permutation-aligned model update parameters to the public cloud node; performing global model aggregation through the public cloud node according to the model update parameters encrypted and uploaded by each robot end, determining an aggregation result, and updating the public lightweight inference service model according to the aggregation result; and encrypting the updated public lightweight inference service model through the public cloud node, issuing it to each robot end, and updating the corresponding local inference service model through each robot end based on the decrypted public lightweight inference service model.
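The permutation alignment and aggregation steps of claim 2 can be sketched numerically. This is an illustrative sketch under assumptions not stated in the patent: updates are modeled as plain weight matrices, alignment as the conjugation `P^T Δ P`, and aggregation as simple FedAvg-style averaging; encryption is elided, and the function names are invented for illustration.

```python
import numpy as np

def align_update(delta, P):
    """Permutation-align one robot's weight update so all clients
    report in the same transformed coordinate system: P^T @ delta @ P."""
    return P.T @ delta @ P

def aggregate_updates(aligned):
    """Global model aggregation, sketched as plain averaging
    (FedAvg-style) over the permutation-aligned updates."""
    return np.mean(aligned, axis=0)
```

Because the alignment map is linear, averaging the aligned updates gives the same result as aligning the average update, which is what lets the cloud node aggregate without undoing the permutation first.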
- 3. The method according to claim 2, wherein dynamically collecting multi-modal data through the target robot end and transforming the multi-modal data to generate the multi-modal transformed data comprises: dynamically acquiring the multi-modal data through a multi-modal sensor at the target robot end, wherein the multi-modal data at least comprises visual image data, tactile signal data, and joint sequence data; and performing matrix transformation on the multi-modal data according to the permutation matrix through the target robot end to generate the multi-modal transformed data.
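The robot-side matrix transformation of claim 3 can be sketched as right-multiplying each feature row by the permutation matrix. A minimal sketch, assuming sensor features are already flattened into row vectors; the function name `permute_features` is illustrative, not from the patent.

```python
import numpy as np

def permute_features(x, P):
    """Apply the robot-side matrix transformation: each feature row
    of x is permuted (x @ P), so raw feature ordering is obfuscated
    before the data leaves the robot end."""
    return x @ P
```

Since a permutation matrix is orthogonal, the transformation is exactly invertible with `P.T`, which is what the inverse transformation in claim 5 relies on.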
- 4. The method according to claim 2, wherein performing full-layer parameter transformation on the public lightweight inference service model through the public cloud node and then issuing the transformed model and the permutation matrix to each robot end comprises: obtaining a model parameter set corresponding to the public lightweight inference service model, wherein the model parameter set at least comprises multi-head attention layer parameters, feed-forward layer parameters, and layer normalization parameters; randomly generating the permutation matrix, wherein the dimension of the permutation matrix is consistent with the hidden-layer dimension of the public lightweight inference service model; performing linkage parameter transformation on the model parameter set by using the permutation matrix to obtain the public lightweight inference service model after full-layer parameter transformation; and transmitting the transformed public lightweight inference service model and the permutation matrix to each robot end through the public cloud node.
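One consistent way to realize the "linkage parameter transformation" of claim 4 is a hidden-dimension permutation applied to every layer in a coordinated way. The sketch below is an assumption about the mechanism, not the patent's definitive scheme: hidden-to-hidden weight matrices become `P^T W P` and per-channel vectors (layer-norm gain/bias, biases) become `v @ P`, so the transformed layer reproduces the original computation on permuted activations `h' = h @ P`.

```python
import numpy as np

def linkage_transform(params, P):
    """Full-layer parameter transformation under a hidden-dimension
    permutation P. Matrices are conjugated (P^T @ W @ P); vectors are
    permuted (v @ P). The transformed layer then satisfies
    (h @ P) @ W' + b' == (h @ W + b) @ P for every input h."""
    transformed = {}
    for name, w in params.items():
        if w.ndim == 2:          # attention / feed-forward weight matrix
            transformed[name] = P.T @ w @ P
        else:                    # layer-norm gain/bias or bias vector
            transformed[name] = w @ P
    return transformed
```

Layer normalization is compatible with this because its per-token mean and variance are invariant under a permutation of the hidden channels; only the elementwise gain and bias need to be permuted along with them.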
- 5. The method according to claim 2, wherein inversely transforming the inference result through the target robot end to obtain the action control instruction and controlling the target humanoid robot according to the action control instruction comprises: performing inverse transformation on the inference result through the permutation matrix locally stored at the target robot end to obtain the action control instruction; and driving the target humanoid robot to execute corresponding actions according to the action control instruction through the target robot end.
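The inverse transformation of claim 5 follows directly from the permutation matrix being orthogonal: its inverse is its transpose, so the robot end can decode the returned inference result without a matrix inversion. A minimal sketch; `decode_action` and the vector interpretation are illustrative assumptions.

```python
import numpy as np

def decode_action(permuted_result, P):
    """Undo the permutation on the inference result using the locally
    stored permutation matrix P. P is orthogonal, so P^{-1} = P^T,
    and the plain action-control vector is result @ P.T."""
    return permuted_result @ P.T
```

A round trip (`encode with P, decode with P.T`) recovers the original action vector exactly, so no precision is lost in the transform/inverse-transform pair.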
- 6. A model-inference-based humanoid robot control device, the device comprising: a first module, configured to dynamically collect multi-modal data through a target robot end, and transform the multi-modal data to generate multi-modal transformed data; a second module, configured to dynamically monitor the network communication state of the target robot end, perform inference on the multi-modal transformed data through a lightweight inference service model according to the network communication state and the data sensitivity of the multi-modal transformed data, and output an inference result; and a third module, configured to inversely transform the inference result through the target robot end to obtain an action control instruction, and control the target humanoid robot according to the action control instruction; wherein dynamically monitoring the network communication state of the target robot end, performing inference on the multi-modal transformed data through the lightweight inference service model according to the network communication state and the data sensitivity of the multi-modal transformed data, and outputting the inference result comprises: when the network communication state is a networked state, performing collaborative inference analysis on the multi-modal transformed data according to the data sensitivity of the multi-modal transformed data through a public lightweight inference service model and a local inference service model of the target robot end to generate the inference result; and when the network communication state is an off-network state, performing inference on the multi-modal transformed data through the local inference service model of the target robot end to generate the inference result; and wherein performing the collaborative inference analysis comprises: performing data sensitivity detection on the multi-modal transformed data, and splitting the multi-modal transformed data into sensitive data and non-sensitive data; performing inference on the sensitive data through the local inference service model of the target robot end to generate a first inference result; encrypting and uploading the non-sensitive data to a public cloud node through the target robot end, decrypting the encrypted non-sensitive data through the public cloud node, and performing inference on the decrypted non-sensitive data by using the public lightweight inference service model to generate a second inference result; encrypting and issuing the second inference result to the target robot end through the public cloud node, and decrypting the encrypted second inference result through the target robot end to obtain a decrypted second inference result; and determining the inference result according to the first inference result and the decrypted second inference result.
- 7. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
- 8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 5.
Description
Model reasoning-based humanoid robot control method and related equipment

Technical Field

The application relates to the technical field of robot control, and in particular to a humanoid robot control method and related equipment based on model inference.

Background

Currently, humanoid robots are increasingly widely applied in fields such as industrial automation, home service, and medical care, but their technical implementation faces the dual challenges of real-time performance and data security. Existing solutions in the related art have obvious limitations: 1) Motion control of the target robot requires extremely high timeliness, with a typical motion control period required to stay within 10-20 ms. However, the conventional cloud-inference-based AI model deployment mode has the following problems: network delay is uncontrollable, i.e., even if the theoretical delay of a 5G network is as low as 1 ms, the delay in an actual industrial environment can exceed 50 ms due to signal interference and bandwidth fluctuation, causing robot action delay or even loss of control; and network interruption can cause the robot system to stall in key scenes such as an industrial production line, resulting in production accidents or economic losses. 2) In conventional schemes, a lightweight model (such as MobileNet) can hardly handle the complex multi-modal fusion tasks of a robot, and the risk of model stealing attacks (Model Stealing) still exists. In summary, the technical problems in the related art remain to be improved.

Disclosure of Invention

The embodiments of the application mainly aim to provide a humanoid robot control method and related equipment based on model inference.
In order to achieve the above object, one aspect of the embodiments of the present application provides a humanoid robot control method based on model inference, the method comprising the steps of: dynamically collecting multi-modal data through a target robot end, and transforming the multi-modal data to generate multi-modal transformed data; dynamically monitoring the network communication state of the target robot end, performing inference on the multi-modal transformed data through a lightweight inference service model according to the network communication state and the data sensitivity of the multi-modal transformed data, and outputting an inference result; and inversely transforming the inference result through the target robot end to obtain an action control instruction, and controlling the target humanoid robot according to the action control instruction. In some embodiments, the method further comprises: performing full-layer parameter transformation on the public lightweight inference service model through a public cloud node, and then issuing the transformed model and a permutation matrix to each robot end; receiving the issued model through each robot end, locally storing it as a corresponding local inference service model, and training the local inference service model with a local data set to generate corresponding model update parameters; performing permutation alignment on the corresponding model update parameters according to the permutation matrix through each robot end, encrypting and uploading the permutation-aligned model update parameters to the public cloud node, performing global model aggregation through the public cloud node according to the model update parameters encrypted and uploaded by each robot end, determining an aggregation result, and updating the public lightweight inference service model according to the aggregation result; and encrypting the updated public lightweight inference service model through the public cloud node, issuing it to each robot end, and updating the corresponding local inference service model through each robot end based on the decrypted public lightweight inference service model. In some embodiments, dynamically collecting multi-modal data through the target robot end and transforming the multi-modal data to generate the multi-modal transformed data includes: dynamically acquiring the multi-modal data through a multi-modal sensor at the target robot end, wherein the multi-modal data at least comprises visual image data, tactile signal data, and joint sequence data; and performing matrix transformation on the multi-modal data according to the permutation matrix through the target robot end to generate the multi-modal transformed data. In s