CN-122015870-A - Robot navigation method, electronic equipment and readable storage medium

Abstract

The application discloses a robot navigation method, electronic equipment and a readable storage medium. The method comprises: receiving a natural language instruction; parsing the natural language instruction through a natural language processing model to obtain structured navigation elements, wherein the navigation elements comprise a plurality of landmark points, a plurality of landmarks, relative motion relationships between the landmark points, and relative spatial relationships between the landmark points and the landmarks; constructing a navigation factor graph according to the navigation elements; acquiring, in real time, an environment image of the environment where the robot is located, identifying objects in the environment image, and semantically matching the identified objects with the landmarks in the navigation factor graph; and optimizing the navigation factor graph according to the matching result and controlling the robot to navigate based on the optimized navigation factor graph. By combining the language instruction with image observation data, the correspondence between each object in the environment where the robot is located and each landmark in the language instruction can be accurately determined, so that tasks can be executed more accurately and efficiently in complex scenes.

Inventors

WEN WEI
YANG JUN
Xie Changzuo
ZENG GUANG
SHE LINGJUAN
TONG XING

Assignees

中科云谷科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260331

Claims (10)

1. A robot navigation method, comprising: receiving a natural language instruction; parsing the natural language instruction through a natural language processing model to obtain structured navigation elements, wherein the navigation elements comprise a plurality of landmark points, a plurality of landmarks, relative motion relationships between the landmark points, and relative spatial relationships between the landmark points and the landmarks; constructing a navigation factor graph according to the navigation elements, wherein the navigation factor graph comprises nodes representing the landmark points, nodes representing the landmarks, and constraint edges constructed based on the relative motion relationships and the relative spatial relationships; acquiring, in real time, an environment image of the environment where the robot is located, identifying an object in the environment image, and semantically matching the identified object with a landmark in the navigation factor graph; and optimizing the navigation factor graph according to the matching result, and controlling the robot to navigate based on the optimized navigation factor graph.
2. The robot navigation method of claim 1, wherein parsing the natural language instruction through a natural language processing model to obtain the structured navigation elements comprises: locally performing sensitive entity recognition and desensitization on the natural language instruction, and replacing the identified sensitive vocabulary with abstract placeholders to obtain a desensitized instruction; sending the desensitized instruction to a cloud-based natural language processing model for parsing; and, after the parsing result is obtained, locally restoring the abstract placeholders in the parsing result to their original meaning according to the mapping established during desensitization, so as to obtain the structured navigation elements.
3. The robot navigation method of claim 2, wherein the desensitization comprises: identifying sensitive vocabulary in the natural language instruction through named entity recognition or rule matching; and replacing the identified sensitive vocabulary with predefined abstract placeholders, and generating a local mapping table that records the replacements.
4. The robot navigation method of claim 1, wherein semantically matching the identified object with a landmark in the navigation factor graph comprises: extracting a visual feature vector of the object in the environment image and a text feature vector corresponding to the landmark; calculating the similarity between the visual feature vector and the text feature vector; and, if the similarity exceeds a preset threshold, determining that the matching is successful.
5. The robot navigation method of claim 4, wherein, when the environment image includes at least two similar objects, the method further comprises: calculating the Mahalanobis distance between the at least two similar objects based on the visual feature vectors of the at least two similar objects; and dynamically adjusting the number and association relationships of the corresponding landmark nodes in the navigation factor graph according to the Mahalanobis distance.
6. The robot navigation method of claim 5, wherein dynamically adjusting the number and association relationships of the corresponding landmark nodes in the navigation factor graph according to the Mahalanobis distance comprises: if the Mahalanobis distance is greater than or equal to a preset splitting threshold, generating a plurality of candidate landmark nodes for the similar objects, and replacing the original single landmark node with the plurality of candidate landmark nodes; calculating the posterior probability of each candidate landmark node; and, if the joint posterior probability of the combined candidate landmark nodes is higher than a preset probability threshold, merging the candidate landmark nodes, and otherwise retaining the candidate landmark node with the highest confidence.
7. The robot navigation method of any one of claims 1 to 6, wherein optimizing the navigation factor graph according to the matching result comprises: solving for the optimal poses of all nodes in the navigation factor graph through maximum a posteriori estimation.
8. The robot navigation method of claim 1, wherein the method further comprises: receiving a new natural language instruction and desensitizing it; parsing the new natural language instruction through the natural language processing model to obtain incremental information for updating the navigation factor graph, wherein the incremental information indicates old nodes or old edges to be deleted and new nodes or new edges to be added; and performing incremental updating and local optimization on the navigation factor graph according to the incremental information, so as to control the robot to re-navigate according to the updated navigation factor graph.
9. An electronic device comprising a processor and a memory storing a computer program, which, when run by the processor, implements the steps of the robot navigation method of any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that a computer program is stored, which, when being executed by a processor, implements the steps of the robot navigation method according to any one of claims 1 to 8.

Description

Robot navigation method, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of robotics, and in particular to a robot navigation method, an electronic device, and a readable storage medium.

Background

With the development of artificial intelligence and robotics, a robot that combines computer vision with natural language processing can interact with visual information in its environment according to language instructions, thereby achieving target localization and path planning. Existing navigation approaches parse the object and landmark information in a natural language instruction, identify targets in images using a vision-language model to generate a language-reasoning prior, and then jointly optimize object positions and the camera trajectory through graph optimization, using the spatial position and depth information of the environment provided by a SLAM system, to achieve navigation. However, typical scenes usually contain multiple instances of objects of the same class, and existing robots cannot accurately distinguish between these instances, which reduces the task success rate.

Disclosure of Invention

The application aims to provide a robot navigation method, electronic equipment and a readable storage medium that accurately identify and distinguish objects during navigation, so that a robot can execute tasks accurately and efficiently in complex scenes.
To achieve the above object, in a first aspect, an embodiment of the present application provides a robot navigation method, including: receiving a natural language instruction; parsing the natural language instruction through a natural language processing model to obtain structured navigation elements, wherein the navigation elements comprise a plurality of landmark points, a plurality of landmarks, relative motion relationships between the landmark points, and relative spatial relationships between the landmark points and the landmarks; constructing a navigation factor graph according to the navigation elements, wherein the navigation factor graph comprises nodes representing the landmark points, nodes representing the landmarks, and constraint edges constructed based on the relative motion relationships and the relative spatial relationships; acquiring, in real time, an environment image of the environment where the robot is located, identifying an object in the environment image, and semantically matching the identified object with a landmark in the navigation factor graph; and optimizing the navigation factor graph according to the matching result, and controlling the robot to navigate based on the optimized navigation factor graph.
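The graph structure described in the first aspect can be illustrated with a minimal Python sketch. It is not the implementation of the application: the dictionary layout of the parsed elements, the class and function names, and the free-text relation strings are all assumptions made for illustration, and no pose variables or optimization are included.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edge:
    src: str       # always a landmark-point node
    dst: str       # a landmark point (motion edge) or a landmark (spatial edge)
    kind: str      # "motion" or "spatial"
    relation: str  # relation text taken from the parsed instruction


@dataclass
class NavigationFactorGraph:
    # Maps node name -> node kind ("landmark_point" or "landmark").
    nodes: Dict[str, str] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name: str, kind: str) -> None:
        if kind not in ("landmark_point", "landmark"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def add_edge(self, src: str, dst: str, kind: str, relation: str) -> None:
        # Motion edges link two landmark points; spatial edges link a
        # landmark point to a landmark.
        expected_dst = {"motion": "landmark_point", "spatial": "landmark"}
        if kind not in expected_dst:
            raise ValueError(f"unknown edge kind: {kind}")
        if self.nodes.get(src) != "landmark_point":
            raise ValueError(f"{src!r} is not a known landmark point")
        if self.nodes.get(dst) != expected_dst[kind]:
            raise ValueError(f"{dst!r} is not a known {expected_dst[kind]}")
        self.edges.append(Edge(src, dst, kind, relation))


def build_graph(elements: dict) -> NavigationFactorGraph:
    """Build the graph from parsed navigation elements (hypothetical layout)."""
    graph = NavigationFactorGraph()
    for name in elements["landmark_points"]:
        graph.add_node(name, "landmark_point")
    for name in elements["landmarks"]:
        graph.add_node(name, "landmark")
    for src, dst, relation in elements["motion_relations"]:
        graph.add_edge(src, dst, "motion", relation)
    for src, dst, relation in elements["spatial_relations"]:
        graph.add_edge(src, dst, "spatial", relation)
    return graph


# Hypothetical parse of an instruction such as "go forward and stop in front of the door".
elements = {
    "landmark_points": ["p0", "p1"],
    "landmarks": ["door"],
    "motion_relations": [("p0", "p1", "move forward")],
    "spatial_relations": [("p1", "door", "in front of")],
}
graph = build_graph(elements)
```

Keeping the two edge kinds separate mirrors the distinction the text draws between relative motion relationships and relative spatial relationships; a real system would attach numeric constraints to each edge before optimization.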
In an embodiment, parsing the natural language instruction through a natural language processing model to obtain the structured navigation elements includes: locally performing sensitive entity recognition and desensitization on the natural language instruction, and replacing the identified sensitive vocabulary with abstract placeholders to obtain a desensitized instruction; sending the desensitized instruction to a cloud-based natural language processing model for parsing; and, after the parsing result is obtained, locally restoring the abstract placeholders in the parsing result to their original meaning according to the mapping established during desensitization, so as to obtain the structured navigation elements.
In an embodiment, the desensitization includes: identifying sensitive vocabulary in the natural language instruction through named entity recognition or rule matching; and replacing the identified sensitive vocabulary with predefined abstract placeholders, and generating a local mapping table that records the replacements.
In an embodiment, semantically matching the identified object with the landmark in the navigation factor graph includes: extracting a visual feature vector of the object in the environment image and a text feature vector corresponding to the landmark; calculating the similarity between the visual feature vector and the text feature vector; and, if the similarity exceeds a preset threshold, determining that the matching is successful.
In an embodiment, when the environment image includes at least two similar objects, the method further includes: calculating the Mahalanobis distance between the at least two similar objects based on the visual feature vectors of the at least two similar objects; and dynamically adjusting the number and association relationships of the corresponding landmark nodes in the navigation factor graph according to the Mahalanobis distance.
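The threshold-based matching and the Mahalanobis distance between similar objects can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the similarity measure, so cosine similarity is assumed here; the visual and text feature vectors are assumed to live in a shared embedding space; the covariance is assumed to be diagonal, given as per-dimension variances; and the function names and the threshold value 0.8 are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(visual: Sequence[float], text: Sequence[float],
             threshold: float = 0.8) -> bool:
    """Declare a match when the similarity exceeds the preset threshold."""
    return cosine_similarity(visual, text) > threshold


def mahalanobis_diag(u: Sequence[float], v: Sequence[float],
                     variances: Sequence[float]) -> float:
    """Mahalanobis distance under an assumed diagonal covariance.

    With a diagonal covariance, the distance reduces to the square root of
    the sum of squared differences, each divided by its dimension's variance.
    """
    return math.sqrt(sum((x - y) ** 2 / s for x, y, s in zip(u, v, variances)))


# Hypothetical feature vectors: two detected chairs and the text embedding of "chair".
chair_a = [0.9, 0.1, 0.2]
chair_b = [0.8, 0.3, 0.1]
text_chair = [1.0, 0.2, 0.1]

both_match = is_match(chair_a, text_chair) and is_match(chair_b, text_chair)
separation = mahalanobis_diag(chair_a, chair_b, variances=[0.01, 0.01, 0.01])
```

When both detections match the same landmark text, a value such as `separation` is what would be compared against a splitting threshold to decide whether the objects should be treated as distinct landmark nodes.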
In an embodiment, dynamically adjusting the number and association relationships of the corresponding landmark nodes in the navigation factor graph according to the Mahalanobis distance includes: if the Mahalanobis distance is greater than or equal to a