
CN-121982237-A - Method and device for constructing a visual semantic map of an embodied intelligent robot

CN121982237A

Abstract

The invention discloses a method and device for constructing a visual semantic map of an embodied intelligent robot. The method comprises: obtaining a target 3D point P_f in a visual semantic map point cloud; determining, based on P_f, all corresponding visible images among the environment images or video frames collected by the robot; calculating the included angle α_i between the vector from P_f to the center point of the i-th visible image and the image normal n_i of the i-th visible image; screening the visible images based on the magnitude relation between the included angle α_i and a preset angle threshold to obtain a screened first image set; determining an optimal projection area based on the width h_i and the height w_i of the i-th visible image in the first image set; projecting the target 3D point P_f onto the i-th visible image and judging whether the projection point P_i falls into the optimal projection area; screening the visible images in the first image set based on the judgment result to obtain a screened second image set; and determining the label category corresponding to P_f based on the second image set. Construction of the visual semantic map of the embodied intelligent robot is thereby realized.

Inventors

  • YOU QINGZHEN

Assignees

  • 红象科技(北京)有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-26

Claims (10)

  1. A method for constructing a visual semantic map of an embodied intelligent robot, characterized by comprising the following steps: acquiring a target 3D point P_f in a visual semantic map point cloud, and determining, based on P_f, all corresponding visible images among the environment images or video frames acquired by the robot, wherein each visible image has a preset width h_i and height w_i, and the vector passing through the center point of each visible image and perpendicular to the image plane is defined as the image normal n_i; calculating the included angle α_i between the vector from P_f to the center point of the i-th visible image and the image normal n_i of the i-th visible image, and screening the visible images based on the magnitude relation between the included angle α_i and a preset angle threshold to obtain a screened first image set; determining an optimal projection area based on the width h_i and the height w_i of the i-th visible image in the first image set, projecting the target 3D point P_f onto the i-th visible image and judging whether the projection point P_i falls into the optimal projection area of the i-th visible image, and screening the visible images in the first image set based on the judgment result to obtain a screened second image set; and determining the label category corresponding to P_f based on the second image set.
  2. The method for constructing a visual semantic map of an embodied intelligent robot according to claim 1, wherein screening the visible images based on the magnitude relation between the included angle α_i and a preset angle threshold comprises: if the included angle α_i is greater than or equal to the preset angle threshold, filtering out the visible image; the visible images whose included angle α_i is smaller than the preset angle threshold are retained and recorded as the screened first image set.
  3. The method of claim 1, wherein determining the optimal projection area based on the width h_i and the height w_i of the i-th visible image in the first image set comprises: setting a shrink ratio β and, on the basis of the width h_i and the height w_i of the i-th visible image, contracting the left and right sides of the image toward its center by w_i × β each, and the top and bottom by h_i × β each, thereby forming the optimal projection area.
  4. The method for constructing a visual semantic map of an embodied intelligent robot according to claim 1, wherein screening the visible images in the first image set based on the judgment result to obtain the screened second image set comprises: if the projection point P_i does not fall into the optimal projection area of an image, filtering out that visible image; the images whose projection point P_i falls into the corresponding optimal projection area are screened out from the first screened image set and recorded as the second screened image set.
  5. The method for constructing a visual semantic map of an embodied intelligent robot according to claim 1, wherein, based on the preset semantic labels corresponding to the projection points P_i of the images in the second screened image set, summarizing the semantic score of each label category by label category and selecting the semantic label with the largest score as the label of P_f comprises: performing semantic segmentation on each image in the second screened image set and assigning each pixel a label from a preset semantic label category set {c_1, c_2, …, c_x}, so as to obtain the semantic label corresponding to the projection point P_i of P_f in each visible image; and, for each semantic label c_y in the preset semantic label category set, traversing the l images in the second screened image set and summing the contribution scores of the images whose projection point P_i is labeled c_y, so as to obtain the semantic score Score_cy_f of c_y.
  6. The method for constructing a visual semantic map of an embodied intelligent robot according to claim 5, wherein the contribution score of an image whose projection point P_i is labeled c_y is given by a predefined formula, and the semantic score of c_y is obtained by summing these contribution scores over the second screened image set.
  7. The method for constructing a visual semantic map of an embodied intelligent robot according to claim 1, wherein the preset angle threshold lies in the range [70°, 90°].
  8. The method for constructing a visual semantic map of an embodied intelligent robot according to claim 3, wherein the shrink ratio β is adaptively adjusted according to the resolution of the visible image, with the adjustment range limited to [1%, 20%].
  9. A device for constructing a visual semantic map of an embodied intelligent robot, characterized by comprising: a preprocessing unit, configured to acquire a target 3D point P_f in a visual semantic map point cloud and to determine, based on P_f, all corresponding visible images among the environment images or video frames acquired by the robot, wherein each visible image has a preset width h_i and height w_i, and the vector passing through the center point of each visible image and perpendicular to the image plane is defined as the image normal n_i; a first screening unit, configured to calculate the included angle α_i between the vector from P_f to the center point of the i-th visible image and the image normal n_i of that image, and to screen the visible images based on the magnitude relation between the included angle α_i and a preset angle threshold to obtain a screened first image set; a second screening unit, configured to determine an optimal projection area based on the width h_i and the height w_i of the i-th visible image in the first image set, to judge whether the projection point P_i falls into the optimal projection area of the i-th visible image when the target 3D point P_f is projected onto it, and to screen the visible images in the first image set based on the judgment result to obtain a screened second image set; and a label determining unit, configured to determine the label category corresponding to P_f based on the second image set.
  10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
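The label-voting scheme of claims 5 and 6 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the per-image contribution formula is not reproduced in this text, so each image's score is taken as a caller-supplied number, and passing a uniform score of 1.0 reduces the scheme to simple majority voting.

```python
from collections import defaultdict

def label_for_point(projections):
    """Aggregate per-image semantic labels for one 3D point P_f.

    `projections` is a list of (label, score) pairs: the semantic label
    of the projection point P_i in each image of the second screened
    set, plus that image's contribution score.  The scores of all images
    sharing a label are summed per label category (claim 5), and the
    label with the largest total is returned as the label of P_f.
    """
    totals = defaultdict(float)
    for label, score in projections:
        totals[label] += score          # sum contribution scores per label
    return max(totals, key=totals.get)  # label with the largest score
```

For example, `label_for_point([("cafe", 1.0), ("cafe", 1.0), ("parcel_station", 1.0)])` returns `"cafe"`.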

Description

Method and device for constructing a visual semantic map of an embodied intelligent robot

Technical Field

The invention relates to the technical field of information processing, and in particular to a method and a device for constructing a visual semantic map of an embodied intelligent robot.

Background

In the field of embodied intelligence, a robot needs to construct a semantic map for autonomous positioning and navigation. First, the environment in which the robot is located is modeled, usually as a point cloud; then the semantic categories of different areas of the model, such as coffee shops and parcel stations, are annotated manually; finally, the robot performs positioning and navigation between targets based on the semantic information of the map. Manually annotating the semantic information of the map is inefficient, and the large amount of manual work involved makes it costly.

Disclosure of the Invention

The main object of the invention is to provide a method and a device for constructing a visual semantic map of an embodied intelligent robot, so as to overcome the above defects in the related art.
To achieve the above object, according to a first aspect of the invention, a method for constructing a visual semantic map of an embodied intelligent robot is provided, comprising: obtaining a target 3D point P_f in a visual semantic map point cloud, and determining, based on P_f, all corresponding visible images among the environment images or video frames collected by the robot, wherein each visible image has a preset width h_i and height w_i, and the vector passing through the center point of each visible image and perpendicular to the image plane is defined as the image normal n_i; calculating the included angle α_i between the vector from P_f to the center point of the i-th visible image and the image normal n_i of the i-th visible image, and screening the visible images based on the magnitude relation between the included angle α_i and a preset angle threshold to obtain a screened first image set; determining an optimal projection area based on the width h_i and the height w_i of the i-th visible image in the first image set, projecting the target 3D point P_f onto the i-th visible image and judging whether the projection point P_i falls into the optimal projection area, and screening the visible images in the first image set based on the judgment result to obtain a screened second image set; and determining the label category corresponding to P_f based on the second image set.

Optionally, screening the visible images based on the magnitude relation between the included angle α_i and a preset angle threshold comprises: if the included angle α_i is greater than or equal to the preset angle threshold, filtering out the visible image; the visible images whose included angle α_i is smaller than the preset angle threshold are retained and recorded as the screened first image set.
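The angle-based screening step above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-image record layout (`center`, `normal` fields), the 80° threshold, and the function names are choices made here, not taken from the patent.

```python
import math

def angle_to_normal(p_f, center_i, normal_i):
    """Angle in degrees between the vector from 3D point P_f to the
    center point of the i-th visible image and its image normal n_i."""
    v = [c - p for p, c in zip(p_f, center_i)]
    dot = sum(a * b for a, b in zip(v, normal_i))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_n = math.sqrt(sum(a * a for a in normal_i))
    cos_a = max(-1.0, min(1.0, dot / (norm_v * norm_n)))  # clamp for safety
    return math.degrees(math.acos(cos_a))

def first_image_set(p_f, images, threshold_deg=80.0):
    """Keep only views with alpha_i below the threshold; views with
    alpha_i >= threshold are filtered out, as in the first screening."""
    return [img for img in images
            if angle_to_normal(p_f, img["center"], img["normal"]) < threshold_deg]
```

A view whose normal is aligned with the viewing direction (α_i = 0°) is kept, while a grazing view (α_i = 90°) is filtered out.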
Optionally, determining the optimal projection area based on the width h_i and the height w_i of the i-th visible image in the first image set comprises: setting a shrink ratio β and, on the basis of the width h_i and the height w_i of the image, contracting the left and right sides of the i-th visible image toward its center by w_i × β each, and the top and bottom by h_i × β each, thereby forming the optimal projection area.

Optionally, screening the visible images in the first image set based on the judgment result to obtain the screened second image set comprises: if the projection point P_i does not fall into the optimal projection area of an image, filtering out that visible image; the images whose projection point P_i falls into the corresponding optimal projection area are screened out from the first screened image set and recorded as the second screened image set.

Optionally, based on the preset semantic labels corresponding to the projection points P_i of the images in the second screened image set, summarizing the semantic score of each label category by label category and selecting the semantic label with the largest score as the label of P_f comprises: performing semantic segmentation on each image in the second screened image set and assigning each pixel a label from a preset semantic label category set {c_1, c_2, …, c_x}, so as to obtain the semantic label corresponding to the projection point P_i of P_f in each visible image; and, for each semantic label c_y in the preset semantic label category set, traversing the l images in the second screened image set and summing the contribution scores of the images whose projection point P_i is labeled c_y, so as to obtain the semantic score Score_cy_f of c_y.
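The shrink-ratio region test can be sketched as follows. This is an illustration under assumptions made here: pixel coordinates with the origin at the top-left corner, generic `width`/`height` names (the patent names the width h_i and the height w_i), and a horizontal margin of width × β with a vertical margin of height × β for clarity.

```python
def optimal_region(width, height, beta):
    """Inner rectangle left after contracting each side toward the
    image center: left/right margins of width*beta, top/bottom margins
    of height*beta.  Returns (u_min, u_max, v_min, v_max)."""
    return (width * beta, width * (1.0 - beta),     # u (horizontal) range
            height * beta, height * (1.0 - beta))   # v (vertical) range

def in_optimal_region(u, v, width, height, beta):
    """True if the projection point (u, v) of the 3D point falls inside
    the optimal projection area, i.e. the image survives the second
    screening; points landing in the border margin are filtered out."""
    u0, u1, v0, v1 = optimal_region(width, height, beta)
    return u0 <= u <= u1 and v0 <= v <= v1
```

For a 100 × 50 image with β = 0.25, the optimal area is the rectangle from (25, 12.5) to (75, 37.5); a point projected at the image center passes, while one near an edge is filtered out.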
Optionally, the contribution score of an image whose projection point P_i is labeled c_y is given by a predefined formula, and the semantic score of c_y is obtained by summing these contribution scores. Optionally, the preset angle threshold lies in the range [70°, 90°]. Optionally, the retraction ra