CN-122021710-A - Bidirectional human-computer behavior analysis method and system based on meta-learning social heterogeneous multi-robot system


Abstract

The invention discloses a bidirectional human-computer behavior analysis method and system based on a meta-learning social heterogeneous multi-robot system. The method comprises: acquiring environment state data, robot internal state data and human-computer interaction data; based on the acquired data, performing intent analysis by using a double-layer grid observation model, a state encoding and a large language model to generate a feature representation suitable for multi-agent reinforcement learning and meta-learning; based on the feature representation, learning a path planning strategy, an energy management strategy, an obstacle avoidance strategy and a bidirectional interaction strategy for scenes with high-density dynamic obstacles and complex crowd interaction by combining a three-stage curriculum learning method, an intrinsic curiosity reward and a MAML-based meta-learning framework; and, based on these four strategies, performing human-computer interaction in a variety of dynamic environments and responding to corresponding action instructions. The invention improves the environmental adaptability, the resource scheduling efficiency and the human-computer cooperation efficiency of the multi-robot system.

Inventors

GAO YUAN
WANG LIN
WANG HAOCHENG
LIN TIANLIN

Assignees

Shenzhen Institute of Artificial Intelligence and Robotics for Society (深圳市人工智能与机器人研究院)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-16

Claims (10)

1. A bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system, characterized by comprising the following steps: acquiring environment state data, robot internal state data and human-computer interaction data; based on the acquired data, performing intent analysis by using a double-layer grid observation model, a state encoding and a large language model to generate a feature representation suitable for multi-agent reinforcement learning and meta-learning; based on the feature representation, learning a path planning strategy, an energy management strategy, an obstacle avoidance strategy and a bidirectional interaction strategy for scenes with high-density dynamic obstacles and complex crowd interaction by combining a three-stage curriculum learning method, an intrinsic curiosity reward and a MAML-based meta-learning framework; and, based on the path planning strategy, the energy management strategy, the obstacle avoidance strategy and the bidirectional interaction strategy, performing human-computer interaction in a variety of dynamic environments and responding to corresponding action instructions.
2. The bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to claim 1, wherein the acquiring of environment state data, robot internal state data and human-computer interaction data comprises: acquiring, through a near-field grid, the local occupancy state around the robot and the positions and moving directions of dynamic obstacles, and acquiring, through a far-field grid, the global coverage state and the spatial distribution of uncovered areas, to obtain the environment state data; acquiring the position and posture of the robot, joint data, motion parameters, I/O signals and system states, to obtain the robot internal state data; and acquiring pedestrian trajectories, interaction response states, walking speeds and approach/avoidance behaviors, to obtain the human-computer interaction data.
3. The bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to claim 1, wherein the performing of intent analysis by using a double-layer grid observation model, a state encoding and a large language model based on the acquired data to generate a feature representation suitable for multi-agent reinforcement learning and meta-learning comprises: inputting the environment state data into the double-layer grid observation model for analysis to generate coverage difference features; inputting the robot internal state data and the human-computer interaction data into the state encoding and the large language model for intent analysis to generate dynamic obstacle features and social features; and taking the coverage difference features, the dynamic obstacle features and the social features as the feature representation for multi-agent reinforcement learning and meta-learning.
4. The bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to claim 3, wherein the inputting of the robot internal state data and the human-computer interaction data into the state encoding and the large language model for intent analysis to generate dynamic obstacle features and social features comprises: inputting the robot internal state data into the state encoding and the large language model for intent analysis to generate a relative speed and a relative direction, thereby obtaining the dynamic obstacle features; and inputting the human-computer interaction data into the state encoding and the large language model for intent analysis to generate an interaction probability, a guidance success rate and emotion-related indicators, thereby obtaining the social features.
5. The bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to claim 1, wherein the learning of a path planning strategy, an energy management strategy, an obstacle avoidance strategy and a bidirectional interaction strategy for scenes with high-density dynamic obstacles and complex crowd interaction, based on the feature representation and in combination with a three-stage curriculum learning method, an intrinsic curiosity reward and a MAML-based meta-learning framework, comprises: inputting the feature representation into the MAML-based meta-learning framework; and, taking the intrinsic curiosity reward as the underlying reward function, performing cross-group collaborative training, intra-group collaborative training and social interaction training by the three-stage curriculum learning method, to learn the path planning strategy, the energy management strategy, the obstacle avoidance strategy and the bidirectional interaction strategy.
6. The bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to claim 5, wherein the performing of cross-group collaborative training, intra-group collaborative training and social interaction training by the three-stage curriculum learning method, with the intrinsic curiosity reward as the underlying reward function, comprises: based on the feature representation, performing cross-group collaborative training by using a decision network to optimize a resource replenishment strategy, thereby learning the path planning strategy and the energy management strategy; based on the feature representation, performing intra-group collaborative training by using a multi-agent reinforcement learning algorithm to learn regional division of labor and collision-avoidance cooperation, thereby obtaining the obstacle avoidance strategy; and, based on the feature representation, performing social interaction training by using the intrinsic curiosity reward to learn an active interaction strategy and a passive interaction strategy, thereby obtaining the bidirectional interaction strategy.
7. The bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to claim 1, wherein the performing of human-computer interaction in a variety of dynamic environments and responding to corresponding action instructions based on the path planning strategy, the energy management strategy, the obstacle avoidance strategy and the bidirectional interaction strategy comprises: based on the path planning strategy, the energy management strategy, the obstacle avoidance strategy and the bidirectional interaction strategy, driving a robot chassis, a charging mechanism and a social interaction module to perform human-computer interaction in a variety of dynamic environments and to respond to corresponding action instructions.
8. A bidirectional human-computer behavior analysis system based on a meta-learning social heterogeneous multi-robot system, comprising: a data acquisition module, used for acquiring environment state data, robot internal state data and human-computer interaction data; a feature extraction module, used for performing intent analysis by using a double-layer grid observation model, a state encoding and a large language model based on the acquired data, to generate a feature representation suitable for multi-agent reinforcement learning and meta-learning; a meta-learning module, used for learning, based on the feature representation and in combination with a three-stage curriculum learning method, an intrinsic curiosity reward and a MAML-based meta-learning framework, a path planning strategy, an energy management strategy, an obstacle avoidance strategy and a bidirectional interaction strategy for scenes with high-density dynamic obstacles and complex crowd interaction; and a bidirectional interaction module, used for performing human-computer interaction in a variety of dynamic environments and responding to corresponding action instructions based on the path planning strategy, the energy management strategy, the obstacle avoidance strategy and the bidirectional interaction strategy.
9. A terminal comprising a processor and a memory, the memory storing a bidirectional human-computer behavior analysis program based on a meta-learning social heterogeneous multi-robot system, wherein the program, when executed by the processor, implements the operations of the bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a bidirectional human-computer behavior analysis program based on a meta-learning social heterogeneous multi-robot system, wherein the program, when executed by a processor, implements the operations of the bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system according to any one of claims 1 to 7.
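
The near-field and far-field grids recited in claim 2 can be illustrated with a minimal sketch. The claims do not specify grid sizes, cell encodings or boundary handling, so everything below is an assumption made for illustration: the function names `near_field` and `far_field`, the `radius` and `block` parameters, the 0/1 cell encoding, and the choice to treat cells outside the map as occupied are all hypothetical.

```python
from typing import List

Grid = List[List[int]]


def near_field(occupancy: Grid, row: int, col: int, radius: int) -> Grid:
    """Local occupancy window centred on the robot.

    Assumption: cells outside the map are reported as occupied (1).
    """
    rows, cols = len(occupancy), len(occupancy[0])
    window = []
    for r in range(row - radius, row + radius + 1):
        line = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line.append(occupancy[r][c] if inside else 1)
        window.append(line)
    return window


def far_field(covered: Grid, block: int) -> List[List[float]]:
    """Coarse global view: fraction of covered cells in each block x block tile."""
    rows, cols = len(covered), len(covered[0])
    coarse = []
    for r0 in range(0, rows, block):
        line = []
        for c0 in range(0, cols, block):
            cells = [
                covered[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            line.append(sum(cells) / len(cells))
        coarse.append(line)
    return coarse
```

In this sketch the near-field window keeps full resolution around the robot, while the far-field map trades resolution for a compact summary of which regions remain uncovered. Positions and moving directions of dynamic obstacles, which claim 2 also assigns to the near-field grid, would need additional channels that are not shown here.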

Description

Bidirectional human-computer behavior analysis method and system based on meta-learning social heterogeneous multi-robot system

Technical Field

The invention relates to the technical field of multi-robot systems, and in particular to a bidirectional human-computer behavior analysis method and system based on a meta-learning social heterogeneous multi-robot system.

Background

Social cooperative adaptability is a central research goal for heterogeneous multi-robot systems (HMRS). HMRS are currently widely applied in fields such as building construction, autonomous driving and network resource allocation. However, recent research shows that existing systems have significant limitations when coping with complex real-world scenes, in three respects:

1) Poor environmental adaptability: existing systems cannot simultaneously handle the challenges of the "3A environment", namely arbitrary physical obstacles (irregular static objects), arbitrary-trajectory dynamic obstacles (randomly moving pedestrians or vehicles) and arbitrary game intentions (cooperative, competitive or neutral human behavior).
2) Inefficient resource scheduling: fixed replenishment stations force worker robots to interrupt their tasks frequently, and resource replenishment is time-consuming.
3) Lack of human-computer cooperation: existing systems cannot analyze human intention in real time and cannot dynamically adjust their strategies according to scene characteristics (for example, an indifferent crowd at an airport versus an interactive crowd in a shopping mall).

Accordingly, there is a need in the art for improvement.
Disclosure of Invention

The technical problem to be solved by the invention is, in view of the above defects of the prior art, to provide a bidirectional human-computer behavior analysis method and system based on a meta-learning social heterogeneous multi-robot system, so as to solve the problems of poor environmental adaptability, low resource scheduling efficiency and lack of human-computer cooperation in the prior art.

The technical scheme adopted to solve the technical problem is as follows. In a first aspect, the invention provides a bidirectional human-computer behavior analysis method based on a meta-learning social heterogeneous multi-robot system, comprising: acquiring environment state data, robot internal state data and human-computer interaction data; based on the acquired data, performing intent analysis by using a double-layer grid observation model, a state encoding and a large language model to generate a feature representation suitable for multi-agent reinforcement learning and meta-learning; based on the feature representation, learning a path planning strategy, an energy management strategy, an obstacle avoidance strategy and a bidirectional interaction strategy for scenes with high-density dynamic obstacles and complex crowd interaction by combining a three-stage curriculum learning method, an intrinsic curiosity reward and a MAML-based meta-learning framework; and, based on the path planning strategy, the energy management strategy, the obstacle avoidance strategy and the bidirectional interaction strategy, performing human-computer interaction in a variety of dynamic environments and responding to corresponding action instructions.
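
To make the MAML-based meta-learning step more concrete, the following is a minimal sketch of the MAML inner and outer loop on a toy problem. It is not the training procedure of the invention: the task family (fitting y = slope * x with a single weight `w`), the function names, the learning rates and the reuse of the same samples for adaptation and evaluation are all assumptions chosen so that the gradients can be written out by hand.

```python
from typing import List, Tuple


def loss_grad_hess(w: float, slope: float, xs: List[float]) -> Tuple[float, float, float]:
    """Mean squared error of w*x against slope*x, plus its 1st and 2nd derivative in w."""
    m = sum(x * x for x in xs) / len(xs)  # mean of x^2
    diff = w - slope
    return diff * diff * m, 2.0 * diff * m, 2.0 * m


def maml_step(w: float, task_slopes: List[float], xs: List[float],
              inner_lr: float, outer_lr: float) -> float:
    """One meta-update: adapt to each task, then differentiate through the adaptation."""
    meta_grad = 0.0
    for slope in task_slopes:
        _, g, h = loss_grad_hess(w, slope, xs)
        w_adapted = w - inner_lr * g  # inner loop: one gradient step on the task
        _, g_adapted, _ = loss_grad_hess(w_adapted, slope, xs)
        # Chain rule through the inner step: d(w_adapted)/dw = 1 - inner_lr * h.
        meta_grad += g_adapted * (1.0 - inner_lr * h)
    return w - outer_lr * meta_grad / len(task_slopes)


def meta_train(task_slopes: List[float], xs: List[float], steps: int = 200,
               inner_lr: float = 0.1, outer_lr: float = 0.1, w0: float = 0.0) -> float:
    """Repeat the meta-update from the initial weight w0 and return the meta-learned weight."""
    w = w0
    for _ in range(steps):
        w = maml_step(w, task_slopes, xs, inner_lr, outer_lr)
    return w
```

With two tasks of slope 1 and 3, the meta-learned initial weight settles midway between them, which is the starting point from which a single inner step adapts equally well to either task. In the setting described above, the scalar weight would be replaced by policy network parameters and the squared error by a reinforcement learning objective.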
In one implementation, the acquiring of the environment state data, the robot internal state data and the human-computer interaction data includes: acquiring, through a near-field grid, the local occupancy state around the robot and the positions and moving directions of dynamic obstacles, and acquiring, through a far-field grid, the global coverage state and the spatial distribution of uncovered areas, to obtain the environment state data; acquiring the position and posture of the robot, joint data, motion parameters, I/O signals and system states, to obtain the robot internal state data; and acquiring pedestrian trajectories, interaction response states, walking speeds and approach/avoidance behaviors, to obtain the human-computer interaction data.

In one implementation, the generating of a feature representation suitable for multi-agent reinforcement learning and meta-learning, based on the acquired data and by using a double-layer grid observation model, a state encoding and a large language model, includes: inputting the environment state data into the double-layer grid observation model for analysis to generate coverage difference features; inputting the robot internal state data and the human-computer interaction data into the state encoding and the large language model for intent analysis to generate dynamic obstacle features and social features; and taking the coverage difference features, the dynamic obstacle features and the social features as the feature representation for multi-agent reinforcement learning and meta-learning.

In one implementation, the inputting the robot internal state data and the human-computer interaction data into the state encoding and the large language mode