
CN-121973184-A - Action instruction generation method, computing device and machine-readable storage medium

CN121973184A

Abstract

The application discloses an action instruction generation method, a computing device and a machine-readable storage medium, relating to the technical field of embodied intelligence. The method comprises: querying a preset scene knowledge base based on an input task instruction to obtain background knowledge corresponding to the task instruction; generating an enhancement instruction based on the background knowledge and the task instruction; and inputting the enhancement instruction and the body state of the controlled device into a pre-trained action planning model to obtain an action instruction for driving the controlled device to execute the task. By introducing a scene knowledge base and an instruction enhancement mechanism, the controlled device can accurately understand and reliably plan for ambiguous or complex instructions, which improves its adaptability and task success rate in unfamiliar environments and complex tasks. The method generates safe, efficient and context-conforming action sequences, enabling the controlled device to continuously learn and perform tasks in the open world at lower cost.
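A minimal sketch of the generation flow summarized above: retrieve background knowledge, build an enhancement instruction, and pass it with the body state to a planning model. The names `query_knowledge_base`, `build_enhancement_instruction` and `ActionPlanningModel` are hypothetical placeholders, not components named in the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class BackgroundKnowledge:
    environment_semantics: List[str]  # e.g. object attributes and positions
    operation_skills: List[str]       # e.g. learned manipulation routines
    task_memories: List[str]          # e.g. summaries of past task executions

def query_knowledge_base(task_instruction: str) -> BackgroundKnowledge:
    """Placeholder for querying the preset scene knowledge base."""
    return BackgroundKnowledge([], [], [])

def build_enhancement_instruction(task: str, knowledge: BackgroundKnowledge) -> str:
    """Combine the task instruction with retrieved background knowledge."""
    context = " ".join(knowledge.environment_semantics
                       + knowledge.operation_skills
                       + knowledge.task_memories)
    return f"{task}\nContext: {context}" if context else task

class ActionPlanningModel:
    """Stand-in for the pre-trained action planning model."""
    def plan(self, enhancement_instruction: str, body_state: Dict) -> List[str]:
        return ["<action sequence>"]  # actions driving the controlled device

def generate_action_instructions(task: str, body_state: Dict) -> List[str]:
    knowledge = query_knowledge_base(task)
    enhanced = build_enhancement_instruction(task, knowledge)
    return ActionPlanningModel().plan(enhanced, body_state)
```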

Inventors

  • YIN QIANQIAN
  • FU LING
  • ZENG GUANG
  • SHE LINGJUAN
  • TONG XING
  • ZHOU ZHIZHONG

Assignees

  • 中科云谷科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-24

Claims (10)

  1. An action instruction generation method, comprising: querying a preset scene knowledge base based on an input task instruction to obtain background knowledge corresponding to the task instruction, wherein the background knowledge comprises at least one of environment semantic information, operation skill information and historical task memory; generating an enhancement instruction based on the background knowledge and the task instruction; and inputting the enhancement instruction and the body state of the controlled device into a pre-trained action planning model to obtain an action instruction for driving the controlled device to execute the task.
  2. The action instruction generation method according to claim 1, further comprising, before the step of querying a preset scene knowledge base based on the input task instruction: constructing a semantic map of a target physical scene based on observation data of the target physical scene, wherein the semantic map comprises attributes and positions of all environment objects in the target physical scene; respectively acquiring description semantic information of each environment object, wherein the description semantic information is used for defining a function of the environment object or describing functional association information between the environment object and other environment objects in the same target physical scene; respectively associating the attributes, the position and the description semantic information of each environment object to generate environment semantic information; and storing the environment semantic information corresponding to all the environment objects into the scene knowledge base.
  3. The action instruction generation method according to claim 2, further comprising, before the step of querying a preset scene knowledge base based on the input task instruction: acquiring spatial relationships among the environment objects in the semantic map; and determining membership relations among the environment objects based on the spatial relationships, and storing the membership relations into the scene knowledge base.
  4. The action instruction generation method according to claim 3, further comprising: in the case where the controlled device is applied to a plurality of target physical scenes, for each of the target physical scenes, associating a scene identification of the target physical scene with the semantic map and the membership relations corresponding to the target physical scene, and storing the semantic map and the membership relations into the scene knowledge base.
  5. The action instruction generation method according to claim 1, further comprising, before the step of querying a preset scene knowledge base based on the input task instruction: in the case where historical task execution records of the controlled device exist, classifying the historical task execution records based on operation modes of the controlled device; for each category, extracting task features and operation features in the historical task execution records included in the category; generating operation skill information corresponding to the category based on the task features and the operation features; and associating the operation skill information with the corresponding task features and storing the operation skill information into the scene knowledge base.
  6. The action instruction generation method according to claim 1, further comprising, before the step of querying a preset scene knowledge base based on the input task instruction: in the case where historical task execution records of the controlled device exist, screening target task execution records from the historical task execution records based on a preset storage window; for each target task execution record, extracting task semantic features in the target task execution record, wherein the task semantic features comprise at least one of a task instruction, an environment object and environment semantic information; and associating the task semantic features with the corresponding target task execution record, and storing the task semantic features as historical task memories into the scene knowledge base.
  7. The action instruction generation method according to claim 1, wherein the querying a preset scene knowledge base based on the input task instruction to obtain background knowledge corresponding to the task instruction comprises: querying the preset scene knowledge base based on the input task instruction, and determining a first preset number of query results; ranking the query results from high to low according to the semantic relevance between each query result and the task instruction; and selecting a second preset number of top-ranked query results from the ranking result as the background knowledge corresponding to the task instruction.
  8. The action instruction generation method according to claim 1, wherein the generating an enhancement instruction based on the background knowledge and the task instruction comprises: generating result verification information based on the background knowledge, the task instruction and observation data of the target physical scene; inputting the result verification information into a pre-trained result verification model, and screening target background knowledge related to the task instruction from the background knowledge through the result verification model; and generating the enhancement instruction based on the target background knowledge and the task instruction.
  9. A computing device, comprising: a memory configured to store instructions; and a processor configured to invoke the instructions from the memory and, when executing the instructions, to implement the action instruction generation method according to any one of claims 1 to 8.
  10. A machine-readable storage medium having stored thereon instructions for causing a machine to perform the action instruction generation method according to any one of claims 1 to 8.
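A minimal sketch of the retrieval step described in claim 7: query the scene knowledge base for a first preset number of candidates, rank them by semantic relevance to the task instruction, and keep a second preset number as background knowledge. The query and relevance functions are hypothetical placeholders passed in by the caller.

```python
from typing import Callable, List

def select_background_knowledge(
    task_instruction: str,
    query_kb: Callable[[str, int], List[str]],     # returns candidate knowledge entries
    relevance: Callable[[str, str], float],        # semantic relevance score
    first_preset_number: int = 20,
    second_preset_number: int = 5,
) -> List[str]:
    # Step 1: query the scene knowledge base for an initial candidate set.
    candidates = query_kb(task_instruction, first_preset_number)
    # Step 2: rank candidates by semantic relevance to the instruction, highest first.
    ranked = sorted(candidates,
                    key=lambda item: relevance(task_instruction, item),
                    reverse=True)
    # Step 3: keep the top-ranked results as background knowledge.
    return ranked[:second_preset_number]
```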

Description

Action instruction generation method, computing device and machine-readable storage medium

Technical Field

The present application relates to the technical field of embodied intelligence, and in particular, to an action instruction generation method, a computing device, and a machine-readable storage medium.

Background

In the fields of embodied intelligence and robotics, end-to-end vision-language-action models provide a promising path toward natural and efficient human-machine interaction. Such a model aims to map visual observations and language instructions directly into robot actions so as to achieve high autonomy. However, as application scenarios expand to open and complex physical environments such as everyday homes and offices, the inherent limitations of the model become increasingly apparent. On the one hand, the performance of the model depends heavily on the size and quality of its training data set, and the knowledge memorized inside the model is static and limited; as a result, planning and execution capability degrades significantly, or even fails, when the model faces new objects or new environments not covered by the training data, or complex long-horizon tasks that require deep common-sense reasoning over the instructions. On the other hand, some techniques attempt to introduce reflection and planning capability through model fine-tuning, which faces serious challenges of high data-labeling cost, complex training processes and insufficient generalization, making it difficult for robots to adapt economically and efficiently to diverse real-world tasks. Therefore, how to break through the limitation of the model's parameterized knowledge and endow a robot with low-cost, dynamically expandable capability, so that it can perform reliable task planning and adaptive execution in unfamiliar and complex scenes, has become a technical problem to be solved.

Disclosure of Invention

In view of the foregoing deficiencies of the prior art, it is an object of embodiments of the present application to provide an action instruction generation method, a computing device and a machine-readable storage medium. To achieve the above object, a first aspect of the present application provides an action instruction generation method, comprising: querying a preset scene knowledge base based on an input task instruction to obtain background knowledge corresponding to the task instruction, wherein the background knowledge comprises at least one of environment semantic information, operation skill information and historical task memory; generating an enhancement instruction based on the background knowledge and the task instruction; and inputting the enhancement instruction and the body state of the controlled device into a pre-trained action planning model to obtain an action instruction for driving the controlled device to execute the task.
In an embodiment of the present application, before the step of querying the preset scene knowledge base based on the input task instruction, the method further comprises: constructing a semantic map of the target physical scene based on observation data of the target physical scene, wherein the semantic map comprises the attributes and position of each environment object in the target physical scene; respectively acquiring description semantic information of each environment object, wherein the description semantic information is used for defining the function of the environment object or describing functional association information between the environment object and other environment objects in the same target physical scene; respectively associating the attributes, the position and the description semantic information of each environment object to generate environment semantic information; and storing the environment semantic information corresponding to all the environment objects into the scene knowledge base.

In an embodiment of the present application, before the step of querying the preset scene knowledge base based on the input task instruction, the method further comprises: acquiring spatial relationships among all environment objects in the semantic map; and determining membership relations among all environment objects based on the spatial relationships, and storing the membership relations in the scene knowledge base.

In an embodiment of the present application, the action instruction generation method further comprises: in the case where the controlled device is applied to a plurality of target physical scenes, for each target physical scene, associating the scene identification of the target physical scene with the semantic map and the membership relations corresponding to that target physical scene, and storing the semantic map and the membership relations in the scene knowledge base.

In an embodiment of the present application, before the step of querying the preset scene knowledge base based on the
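A minimal sketch of the scene-knowledge-base entries described in the embodiments above: each environment object's attributes, position and description semantic information are associated into one record, and membership relations derived from spatial relationships are stored per scene under a scene identification. Class and field names are illustrative assumptions, not terms defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class EnvironmentObject:
    name: str
    attributes: Dict[str, str]            # e.g. {"color": "red", "state": "closed"}
    position: Tuple[float, float, float]  # location in the semantic map
    description: str                      # function of the object or its relation to other objects

@dataclass
class SceneKnowledge:
    scene_id: str  # identifies the target physical scene
    objects: Dict[str, EnvironmentObject] = field(default_factory=dict)
    memberships: List[Tuple[str, str]] = field(default_factory=list)  # (contained, container)

    def add_object(self, obj: EnvironmentObject) -> None:
        """Associate attributes, position and description semantics as one entry."""
        self.objects[obj.name] = obj

    def add_membership(self, contained: str, container: str) -> None:
        """Record a membership relation derived from spatial relationships."""
        self.memberships.append((contained, container))

# Example: a cup placed on a table in a scene identified as "kitchen-01".
kb = SceneKnowledge(scene_id="kitchen-01")
kb.add_object(EnvironmentObject("cup", {"material": "ceramic"}, (1.2, 0.4, 0.9),
                                "holds liquids; usually found on the table"))
kb.add_object(EnvironmentObject("table", {"type": "dining"}, (1.0, 0.0, 0.0),
                                "supports objects placed on it"))
kb.add_membership("cup", "table")
```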