CN-121998087-A - Knowledge graph-based understanding and reasoning method for embodied robots
Abstract
The invention discloses a knowledge graph-based understanding and reasoning method for embodied robots, which comprises the following steps: S1, constructing a multi-level, cross-scene knowledge graph based on robot operation knowledge; S2, obtaining a user instruction and generating a corresponding structured context; S3, inputting the user instruction and the structured context into a large language model and, in combination with the knowledge graph, obtaining an executable action sequence.
Inventors
- LI GANG
- GONG CHANGHAO
- JIANG SHUO
- HE BIN
Assignees
- Tongji University (同济大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-21
Claims (8)
- 1. A knowledge graph-based embodied robot understanding and reasoning method, comprising the following steps: S1, constructing a multi-level, cross-scene knowledge graph based on robot operation knowledge; S2, acquiring a user instruction and generating a corresponding structured context; S3, inputting the user instruction and the structured context into a large language model and, in combination with the knowledge graph, obtaining an executable action sequence.
- 2. The knowledge graph-based embodied robot understanding and reasoning method of claim 1, wherein the hierarchy of the knowledge graph comprises a scene layer, a job layer, a task layer, an action primitive layer and a tool layer.
- 3. The knowledge graph-based embodied robot understanding and reasoning method of claim 2, wherein the relationships in the knowledge graph include hierarchical relationships and inference relationships; the hierarchical relationships describe the relationships between different layers, and the inference relationships describe the logical associations between different elements.
- 4. The knowledge graph-based embodied robot understanding and reasoning method of claim 3, wherein the inference relationships comprise: causal-temporal relationships, defining the preconditions between different tasks; intention-behavior relationships, constructing an inference path from intention to task; and state-constraint relationships, defining the environmental or tool constraints required to execute a given action primitive.
- 5. The knowledge graph-based embodied robot understanding and reasoning method of claim 2, 3 or 4, wherein the step of constructing the knowledge graph comprises: acquiring a multimodal dataset comprising human operation demonstrations; identifying scenes, jobs, tasks, action primitives and tools as entities according to the multimodal dataset; and confirming entity attributes and relationships through data annotation.
- 6. The knowledge graph-based embodied robot understanding and reasoning method of claim 5, wherein the data annotation comprises action segmentation, gesture segmentation and/or semantic segmentation.
- 7. The knowledge graph-based embodied robot understanding and reasoning method of claim 5, wherein step S2 comprises: retrieving and querying the user instruction against the knowledge graph through preset keywords to obtain the corresponding entities and relationships; and constructing the structured context information from the query result.
- 8. The knowledge graph-based embodied robot understanding and reasoning method of claim 7, wherein step S3 comprises: preferentially searching the knowledge graph for a solution directly corresponding to the task instruction; and, when no direct information is found in the knowledge graph, taking the structure and related information of the knowledge graph as context and performing supplementary reasoning on complex, underspecified tasks using the common-sense knowledge and reasoning capability of the large language model, thereby generating a feasible action sequence.
Description
Knowledge graph-based understanding and reasoning method for embodied robots

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a knowledge graph-based understanding and reasoning method for embodied robots.

Background

In recent years, with the development of artificial intelligence technology, embodied intelligence has become a research hotspot. It aims to understand and accomplish complex tasks by placing agents in a physical environment, enabling them to perceive, interact and learn. For a robot to work effectively in scenes such as assembly, housekeeping and rescue, enabling it to understand task instructions and autonomously plan action sequences is a core challenge.

Currently, task execution methods for embodied robots mainly depend on teach-in programming or deep learning models. Although traditional teach-in programming is accurate for a specific task, its generalization capability is extremely poor: a large amount of manual reprogramming is needed whenever a new task or a new environment is faced, so it cannot adapt to the dynamic changes of an open environment. End-to-end deep learning methods, such as imitation learning and reinforcement learning, possess some generalization capability but often require massive amounts of training data, and their decision process lacks interpretability, like a "black box", which is a fatal defect in industrial applications requiring high safety and high reliability.

To enhance the reasoning and generalization capabilities of robots, researchers have begun to introduce large language models (LLMs) to assist in task planning. The powerful natural language understanding and common-sense reasoning capabilities of large language models enable them to break down high-level user instructions into a series of subtasks. However, relying solely on a large language model has significant disadvantages. First, the knowledge of a large language model is derived from general corpora and lacks the structured, specialized knowledge of a specific operation field, so the generated action sequence may not conform to physical constraints or operation specifications. Second, the reasoning process is uncertain and sometimes produces "hallucinations", so the reliability of the planning result is insufficient.

To address the above drawbacks, researchers have proposed combining retrieval-augmented generation (RAG) techniques with large language models to provide context knowledge for the model by retrieving relevant documents. However, the knowledge provided in this manner consists of fragmentary pieces of text and fails to exploit the hierarchical and associative structure inherent in operational knowledge. Therefore, how to fully utilize operational knowledge and improve the reasoning ability of large models is a problem that needs to be solved by those skilled in the art.

Disclosure of Invention

The present invention has been made in view of the above problems, and its object is to provide a knowledge graph-based embodied robot understanding and reasoning method that overcomes, or at least partially solves, the above problems.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a knowledge graph-based embodied robot understanding and reasoning method comprises the following steps: S1, constructing a multi-level, cross-scene knowledge graph based on robot operation knowledge; S2, acquiring a user instruction and generating a corresponding structured context; S3, inputting the user instruction and the structured context into a large language model and, in combination with the knowledge graph, obtaining an executable action sequence.

Preferably, the hierarchical structure of the knowledge graph comprises a scene layer, a job layer, a task layer, an action primitive layer and a tool layer.

Preferably, the relationships in the knowledge graph comprise hierarchical relationships and inference relationships; the hierarchical relationships describe the relationships between different layers, and the inference relationships describe the logical associations between different elements.

Preferably, the inference relationships include: causal-temporal relationships, defining the preconditions between different tasks; intention-behavior relationships, constructing an inference path from intention to task; and state-constraint relationships, defining the environmental or tool constraints required to execute a given action primitive.

Preferably, the step of constructing the knowledge graph includes: acquiring a multimodal dataset comprising human operation demonstrations; identifying scenes, jobs, tasks, action primitives and tools as entities according to the multimodal dataset; and confirming entity attributes and relationships through data annotation.
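For concreteness, the five-layer hierarchy and the two relation families described above can be sketched as a small typed property graph. The following is a minimal illustration in Python; the class and member names (Layer, RelationKind, KnowledgeGraph, and so on) are assumptions of the sketch, not structures prescribed by the patent.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Layer(Enum):
    """The five hierarchy layers: scene, job, task, action primitive, tool."""
    SCENE = auto()
    JOB = auto()
    TASK = auto()
    ACTION_PRIMITIVE = auto()
    TOOL = auto()


class RelationKind(Enum):
    """One hierarchical family plus the three inference families."""
    HIERARCHICAL = auto()        # links layers, e.g. job -> task
    CAUSAL_TEMPORAL = auto()     # precondition between tasks
    INTENTION_BEHAVIOR = auto()  # inference path from intention to task
    STATE_CONSTRAINT = auto()    # environment/tool constraint on a primitive


@dataclass
class Entity:
    name: str
    layer: Layer
    attributes: dict = field(default_factory=dict)


@dataclass
class Relation:
    head: str
    kind: RelationKind
    tail: str


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)   # name -> Entity
    relations: list = field(default_factory=list)  # list of Relation

    def add(self, entity):
        self.entities[entity.name] = entity

    def link(self, head, kind, tail):
        self.relations.append(Relation(head, kind, tail))

    def neighbors(self, name, kind=None):
        """Outgoing relations of an entity, optionally filtered by kind."""
        return [r for r in self.relations
                if r.head == name and (kind is None or r.kind == kind)]
```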
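The construction step S1 turns annotated demonstration data into entities and links. Below is a sketch under the same assumed classes; the record format, with one optional label per layer as might be produced by action, gesture or semantic segmentation, is hypothetical.

```python
def graph_from_annotations(records):
    """S1 (sketch): build graph entities and links from annotated
    demonstration records. Each record is assumed to be a dict with
    optional 'scene', 'job', 'task', 'primitive' and 'tool' labels."""
    layer_keys = [("scene", Layer.SCENE), ("job", Layer.JOB),
                  ("task", Layer.TASK), ("primitive", Layer.ACTION_PRIMITIVE),
                  ("tool", Layer.TOOL)]
    kg = KnowledgeGraph()
    for rec in records:
        # Register every labelled element as an entity in its layer.
        for key, layer in layer_keys:
            name = rec.get(key)
            if name and name not in kg.entities:
                kg.add(Entity(name, layer))
        # Chain the hierarchical relation scene -> job -> task -> primitive.
        chain = [rec.get(k) for k in ("scene", "job", "task", "primitive")]
        for head, tail in zip(chain, chain[1:]):
            if head and tail:
                kg.link(head, RelationKind.HIERARCHICAL, tail)
        # A tool required by a primitive becomes a state constraint.
        if rec.get("primitive") and rec.get("tool"):
            kg.link(rec["primitive"], RelationKind.STATE_CONSTRAINT, rec["tool"])
    return kg
```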
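Step S2 matches preset keywords in the user instruction against the graph and serializes the hits as a structured context string. Again a sketch: the keyword table and the plain-text serialization are illustrative choices, not specified by the patent.

```python
def build_structured_context(kg, instruction, keywords):
    """S2 (sketch): retrieve the entities and relations hit by preset
    keywords in the user instruction and serialize them as context.
    `keywords` maps an entity name to its trigger words; an entity
    with no entry is triggered by its own name."""
    hits = [name for name in kg.entities
            if any(kw in instruction for kw in keywords.get(name, [name]))]
    lines = []
    for name in hits:
        entity = kg.entities[name]
        lines.append(f"entity: {name} (layer={entity.layer.name})")
        for rel in kg.neighbors(name):
            lines.append(f"relation: {rel.head} -[{rel.kind.name}]-> {rel.tail}")
    return "\n".join(lines)
```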
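Step S3 prefers a solution stored directly in the graph and falls back to large language model reasoning, conditioned on the structured context, only when no direct hit exists. In this sketch, llm_complete stands in for whatever large language model interface is actually used, and the prompt wording is likewise an assumption.

```python
def plan_action_sequence(kg, instruction, context, llm_complete):
    """S3 (sketch): prefer a plan stored in the graph; otherwise ask
    the large language model to reason over the structured context."""
    # 1) Direct lookup: a task named in the instruction that decomposes
    #    hierarchically into action primitives.
    for name, entity in kg.entities.items():
        if entity.layer is Layer.TASK and name in instruction:
            primitives = [r.tail
                          for r in kg.neighbors(name, RelationKind.HIERARCHICAL)
                          if kg.entities[r.tail].layer is Layer.ACTION_PRIMITIVE]
            if primitives:
                return primitives
    # 2) Fallback: supplementary reasoning by the LLM, grounded in the
    #    graph structure supplied as context.
    prompt = (f"Knowledge graph context:\n{context}\n\n"
              f"User instruction: {instruction}\n"
              "Return a feasible sequence of action primitives, one per line.")
    return [line for line in llm_complete(prompt).splitlines() if line.strip()]
```

The two-stage order mirrors claim 8: graph lookup runs first, so that plans grounded in curated operation knowledge take precedence over open-ended generation, and the fallback prompt still carries the graph structure as context.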