CN-121979066-A - Task contractual distributed robot autonomous system

Abstract

The invention provides a task-contract-based distributed robot autonomous system in which an upper-layer system issues to a robot a task contract comprising a goal, constraints, and resource limits. Based on the contract and local real-time perception, the robot autonomously performs task planning, execution, and exception handling, and communicates with the upper-layer system only when the contract cannot be fulfilled. Because the robot receives a high-level "task contract" rather than a specific path and makes fully autonomous decisions locally, it is decoupled from the network connection. Through the "task contract + PPO local decision" architecture, the pain points of existing distributed robot systems, such as network dependence, weak autonomous adaptability, and poor decision stability, are addressed; the robustness, controllability, and reliability of the system are improved; and the system is suitable for scenarios such as warehouse logistics, environmental monitoring, and emergency rescue.

Inventors

  • BAI HUIYUAN
  • CHEN XI
  • WANG XIAOTIAN

Assignees

  • 原力无限科技控股(浙江)有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-02-04

Claims (10)

  1. A task-contract-based distributed robot autonomous system, comprising: an upper-layer system configured to generate and issue a task contract; at least one robot configured to receive the task contract; and a communication network connecting the upper-layer system and the at least one robot; wherein the robot is configured to autonomously generate and execute action instructions through a locally run decision algorithm, based on the received task contract and local real-time perception information, and to communicate exceptions to the upper-layer system through the communication network only when the task contract cannot be fulfilled.
  2. The system of claim 1, wherein the task contract comprises a task goal, at least one task constraint, and at least one resource limit.
  3. The system of claim 2, wherein the robot comprises a decision module implemented with a proximal policy optimization (PPO) algorithm, a reward function r_t of the PPO algorithm consisting of a goal reward, a constraint penalty term, and a resource penalty term, wherein the constraint penalty term and the resource penalty term are generated according to the task constraints and resource limits in the task contract.
  4. The system of claim 3, wherein the decision module comprises: a policy network for generating actions from environmental state inputs; a value network for estimating state values; and an optimization unit configured to update the parameters of the policy network by maximizing a clipped objective function L(θ), based on history data stored in an experience replay buffer, the objective function L(θ) limiting the magnitude of each policy update.
  5. The system of claim 1, wherein the robot further comprises: a sensing module for acquiring environment and robot state information; a contract parsing module for parsing and storing the received task contract; and an execution control module for converting action instructions into drive signals to control the movement or operation of the robot.
  6. The system of claim 5, wherein the sensing module comprises at least two of a lidar, an RGB-D camera, a GPS module, a battery-level sensor, and a collision sensor, for performing environment mapping, obstacle recognition, positioning, and self-state monitoring.
  7. The system of claim 1, wherein the upper-layer system comprises: a contract generation module for receiving user input and generating a structured task contract; a contract management module for managing the life cycle of the task contract and adjusting the contract in response to exceptions; and a state monitoring module for visually monitoring robot states and exception events.
  8. The system of claim 1 or 7, wherein the communication network employs the MQTT protocol based on a publish/subscribe model, and the upper-layer system communicates with the robot via predefined MQTT topics, wherein task contracts are issued via a first topic, robot status is reported via a second topic, and exception information is reported via a third topic.
  9. The system of claim 1, wherein the robot further comprises an exception handling module configured to monitor the robot state in real time based on preset rules, trigger local exception handling actions when an exception condition defined in the task contract is detected to be met, and send exception information to the upper-layer system over the communication network.
  10. An application interaction method for a task-contract-based distributed robot autonomous system, applied to the system of any of claims 1-9, the method comprising: generating, by an upper-layer system, a task contract comprising a task goal, task constraints, and resource limits, and issuing the task contract through a communication network; receiving and parsing the task contract by a robot; acquiring environment and self-state information through a sensing module of the robot; generating, by a decision module of the robot using a PPO algorithm, autonomous action instructions conforming to the contract constraints, by combining the parsed task contract and the acquired state information; executing the action instructions through an execution control module of the robot; and, during execution, if the task contract is judged unachievable, triggering an exception handling flow and notifying the upper-layer system through the communication network.
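Claims 9-10 describe the robot checking, during execution, whether an exception condition defined in the task contract has been met and notifying the upper-layer system only in that case. A minimal sketch of such a local check, in which the rule names, state keys, and thresholds are all illustrative and not taken from the patent:

```python
def check_exceptions(state, contract_rules):
    """Return the names of the triggered exception conditions.

    state: dict of live robot readings (keys illustrative).
    contract_rules: (name, predicate) pairs, as a contract parser
    might produce them from the task contract's exception conditions.
    """
    return [name for name, predicate in contract_rules if predicate(state)]

# Example rules mirroring the constraints cited in the description
# (battery floor, forbidden zone); the exact form is assumed.
rules = [
    ("battery_below_floor", lambda s: s["battery_pct"] < 20),
    ("in_forbidden_zone",   lambda s: s["zone"] in {"H1"}),
]

triggered = check_exceptions({"battery_pct": 15, "zone": "A"}, rules)
# Only when the list is non-empty would the robot report over the
# network; otherwise it continues executing autonomously.
```

This keeps the anomaly test purely local, which is what lets the robot stay silent on the network during normal operation.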

Description

Task contractual distributed robot autonomous system

Technical Field

The invention relates to the technical field of intelligent robots, and in particular to a task-contract-based distributed robot autonomous system, an application interaction method thereof, an electronic device, and a computer-readable storage medium.

Background

Existing distributed robot systems suffer from the following pain points. Network dependence: a traditional centralized system requires the robot to request every action from a central server, so network delay or interruption causes decision latency or even task failure (for example, the reported stoppages of Amazon warehouse robots during network outages). Weak autonomous adaptability: some distributed systems support local decisions but lack explicit task constraints (e.g., "do not enter hazardous areas") and resource limits (e.g., "return when battery falls below 20%"), so robots are prone to deviating from the goal or failing the task through resource exhaustion. Poor decision stability: traditional reinforcement learning algorithms (such as A3C) tend to collapse in continuous action spaces (such as a robot's moving speed and steering angle) when the policy update magnitude is too large, and therefore cannot adapt to complex environments. Fuzzy task definition: the user must manually specify a concrete path or action for the robot and cannot intuitively express a high-level task goal such as "collect environment data in area A".
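The background above mentions explicit task constraints (forbidden zones) and resource limits (a battery floor that forces return) alongside a high-level goal. A minimal sketch of what such a structured task contract might look like; the field names and JSON shape are illustrative assumptions, not defined in the patent:

```python
import json
from dataclasses import dataclass, field

@dataclass
class TaskContract:
    """Illustrative task-contract structure: a goal plus explicit
    task constraints and resource limits (all field names assumed)."""
    goal: str                                              # high-level objective
    constraints: list = field(default_factory=list)        # e.g. forbidden zones
    resource_limits: dict = field(default_factory=dict)    # e.g. battery floor

    @classmethod
    def from_json(cls, text):
        # Parse the serialized contract as issued by the upper-layer system.
        return cls(**json.loads(text))

contract = TaskContract.from_json(json.dumps({
    "goal": "collect environment data in area A",
    "constraints": ["no-entry: hazardous zone H1"],
    "resource_limits": {"min_battery_pct": 20},
}))
```

Expressing the goal, constraints, and limits as data rather than as a path is what allows the robot to plan locally against them.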
Disclosure of Invention

To solve the above technical problems in the prior art, the invention provides the following technical solution. In one aspect, there is provided a task-contract-based distributed robot autonomous system, comprising: an upper-layer system configured to generate and issue a task contract; at least one robot configured to receive the task contract; and a communication network connecting the upper-layer system and the at least one robot; wherein the robot is configured to autonomously generate and execute action instructions through a locally run decision algorithm, based on the received task contract and local real-time perception information, and to communicate exceptions to the upper-layer system through the communication network only when the task contract cannot be fulfilled. Preferably, the task contract contains a task goal, at least one task constraint, and at least one resource limit. Preferably, the robot comprises a decision module implemented with a proximal policy optimization (PPO) algorithm, and a reward function r_t of the PPO algorithm is composed of a goal reward, a constraint penalty term, and a resource penalty term, wherein the constraint penalty term and the resource penalty term are generated according to the task constraints and resource limits in the task contract. Preferably, the decision module comprises: a policy network for generating actions from environmental state inputs; a value network for estimating state values; and an optimization unit configured to update the parameters of the policy network by maximizing a clipped objective function L(θ), based on history data stored in an experience replay buffer, the objective function L(θ) limiting the magnitude of each policy update.
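The passage above shapes the PPO reward r_t as a goal reward minus constraint and resource penalty terms, and updates the policy via a clipped objective L(θ). A minimal sketch of both pieces, assuming illustrative penalty weights and the conventional PPO clip ratio of 0.2 (neither value is specified in the patent):

```python
import numpy as np

def contract_reward(goal_reward, constraint_violation, resource_overuse,
                    w_c=1.0, w_r=1.0):
    """r_t = goal reward minus constraint and resource penalty terms.
    The weights w_c, w_r are assumed; the patent only names the terms."""
    return goal_reward - w_c * constraint_violation - w_r * resource_overuse

def clipped_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate L(theta): taking the minimum of the
    unclipped and clipped terms limits the magnitude of a policy update."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()
```

For example, with a probability ratio of 1.5 and a positive advantage, the clip caps the contribution at 1.2x the advantage, which is the stability property claimed over algorithms like A3C.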
Preferably, the robot further comprises: a sensing module for acquiring environment and robot state information; a contract parsing module for parsing and storing the received task contract; and an execution control module for converting action instructions into drive signals to control the movement or operation of the robot. Preferably, the sensing module comprises at least two of a lidar, an RGB-D camera, a GPS module, a battery-level sensor, and a collision sensor, and is used for environment mapping, obstacle recognition, positioning, and self-state monitoring. Preferably, the upper-layer system comprises: a contract generation module for receiving user input and generating a structured task contract; a contract management module for managing the life cycle of the task contract and adjusting the contract in response to exceptions; and a state monitoring module for visually monitoring robot states and exception events. Preferably, the communication network employs the MQTT protocol based on a publish/subscribe model, and the upper-layer system communicates with the robot through predefined MQTT topics, wherein the task contract is issued through a first topic, the robot status is reported through a second topic, and exception information is reported through a third topic. Preferably, the robot further comprises an exception handling module configured to monitor the robot state in real time based on preset rules, trigger local exception handling actions when an exception condition defined in the task contract is detected to be met, and send exception information to the upper-layer system through the communication network.
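The MQTT arrangement above names the three channels only abstractly (a first, second, and third topic). One possible topic layout under the publish/subscribe model, with every topic name hypothetical:

```python
# Illustrative MQTT topic scheme for the three channels described above;
# the patent does not specify actual topic strings.
TOPIC_CONTRACT = "fleet/{robot_id}/contract"   # upper-layer system -> robot
TOPIC_STATUS   = "fleet/{robot_id}/status"     # robot -> upper-layer system
TOPIC_ANOMALY  = "fleet/{robot_id}/anomaly"    # robot -> upper-layer system

def topic_for(kind, robot_id):
    """Resolve the concrete topic string for one robot and channel."""
    table = {
        "contract": TOPIC_CONTRACT,
        "status": TOPIC_STATUS,
        "anomaly": TOPIC_ANOMALY,
    }
    return table[kind].format(robot_id=robot_id)
```

With a per-robot prefix like this, the upper-layer system can subscribe to a wildcard such as `fleet/+/anomaly` to collect exception reports from the whole fleet, which is the usual way the publish/subscribe model is exploited in MQTT.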