CN-121982111-A - Data labeling method, data labeling device, electronic equipment and program product


Abstract

The application relates to the technical field of computer vision and provides a data labeling method, a data labeling device, electronic equipment and a program product. The data labeling method comprises: determining a visible face among the six geometric faces of a cuboid object in a color image; determining four first vertex coordinates of the visible face in a camera coordinate system; and determining first pose information of the visible face based on the four first vertex coordinates, wherein the first pose information is the pose information of the visible face in the camera coordinate system. The four first vertex coordinates and the first pose information are labeling data of the visible face, the labeling data of the visible face are used for labeling the visible face in a target image, and the target image comprises at least the color image. The application can address the low precision, poor consistency and low efficiency of manual labeling.

Inventors

ZHANG ZHUMING

Assignees

深圳市优必选科技股份有限公司

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-18

Claims (15)

1. A data labeling method, comprising: determining a visible face among the six geometric faces of a cuboid object in a color image; determining four first vertex coordinates of the visible face in a camera coordinate system; and determining first pose information of the visible face based on the four first vertex coordinates, wherein the first pose information is the pose information of the visible face in the camera coordinate system; wherein the four first vertex coordinates and the first pose information are labeling data of the visible face, the labeling data of the visible face are used for labeling the visible face in a target image, and the target image comprises at least the color image.
2. The data labeling method of claim 1, wherein the labeling data of the visible face further comprise a pixel-level mask of the visible face, and the method further comprises: determining pixels that are present both in the visible face and in a whole mask of the cuboid object as actual visible pixels of the visible face; and generating the pixel-level mask of the visible face based on the actual visible pixels of the visible face; wherein the whole mask is labeling data of the cuboid object, and the labeling data of the cuboid object are used for labeling the cuboid object in the target image.
3. The data labeling method of claim 2, further comprising, before determining the pixels that are present both in the visible face and in the whole mask of the cuboid object as the actual visible pixels of the visible face: calculating, for each pixel in the visible face, a first absolute difference between the actual depth of the pixel and the corresponding theoretical depth; and if the first absolute difference is less than or equal to a first difference threshold, determining the pixel as an unoccluded pixel of the visible face; wherein determining the pixels that are present both in the visible face and in the whole mask of the cuboid object as the actual visible pixels of the visible face comprises: determining pixels that are both among the unoccluded pixels and in the whole mask as the actual visible pixels of the visible face.
4. The data labeling method of claim 2, wherein the target image further comprises a depth image corresponding to the color image, and, in a case where the target image is obtained by simulation, the data labeling method further comprises: acquiring the color image, the depth image, the whole mask and second pose information of the cuboid object through a single multi-channel rendering pass in a simulation environment, wherein the second pose information is the pose information of the cuboid object in the camera coordinate system; and, before determining the visible face among the six geometric faces of the cuboid object in the color image: converting the second pose information and size information of the cuboid object into vertex coordinates in a cuboid object coordinate system to obtain eight second vertex coordinates of the cuboid object; and converting the eight second vertex coordinates to the camera coordinate system and projecting them onto the image plane of the color image to obtain the six geometric faces; wherein the size information and the second pose information of the cuboid object are labeling data of the cuboid object, and the labeling data of the cuboid object are used for labeling the cuboid object in the target image.
5. The data labeling method of claim 4, wherein, for at least one solid face among the six geometric faces, determining the visible face among the at least one solid face comprises: for each solid face, calculating an included angle between a first normal vector of the solid face in the camera coordinate system and the camera line-of-sight direction; and if the included angle between the first normal vector and the camera line-of-sight direction is less than or equal to a first angle threshold, determining the solid face as the visible face.
6. The data labeling method of claim 2, wherein, in a case where the visible face is an open face, determining the pixel-level mask of the open face further comprises: calculating, for each pixel in the open face, a second absolute difference between the actual depth of the pixel and the corresponding theoretical depth; if the second absolute difference is less than a second preset difference, determining the pixel as an actual visible pixel of the open face; and generating the pixel-level mask of the open face based on the actual visible pixels of the open face.
7. The data labeling method of claim 2, wherein the target image further comprises a depth image corresponding to the color image, and, in a case where the target image is captured by a camera, the method further comprises: performing instance segmentation on the color image to obtain an initial mask of the cuboid object; detecting, based on the depth image, whether outliers exist in the initial mask; if outliers exist in the initial mask, removing the outliers from the initial mask to obtain the whole mask; and if no outliers exist in the initial mask, determining the initial mask as the whole mask.
8. The data labeling method of claim 2, wherein the target image further comprises a depth image corresponding to the color image, and, in a case where the target image is captured by a camera, determining the visible face among the six geometric faces of the cuboid object in the color image comprises: converting the image coordinates of the pixels in the whole mask and their corresponding depths in the depth image into an initial point cloud; determining the initial point cloud as a candidate point cloud; fitting a largest plane to the candidate point cloud, and removing the points belonging to the largest plane from the candidate point cloud to obtain a residual point cloud; judging whether the number of points in the residual point cloud is less than a first number or whether the number of fitted planes has reached a second number; if the number of points in the residual point cloud is greater than or equal to the first number and the number of fitted planes has not reached the second number, determining the residual point cloud as the candidate point cloud and returning to the step of fitting a largest plane to the candidate point cloud and the subsequent steps, until the number of points in the residual point cloud is less than the first number or the number of fitted planes reaches the second number; and if the number of points in the residual point cloud is less than the first number, or the number of fitted planes reaches the second number, determining all the fitted planes as the visible faces.
9. The data labeling method of any one of claims 1 to 8, wherein there are multiple frames of the color image, the multiple frames being obtained by a camera photographing the cuboid object from different viewing angles, and the method further comprises, after determining the visible face among the six geometric faces of the cuboid object in the color images: clustering the visible faces of the multiple frames of color images to obtain the visible faces belonging to the same geometric face; establishing a reprojection error minimization problem with size information of the cuboid object and third pose information of the cuboid object as the variables to be optimized and the observation data of all the visible faces as constraints, wherein the reprojection error minimization problem comprises an in-plane point geometric constraint and a vertex consistency constraint, the in-plane point geometric constraint is used for minimizing the distance error between the points in each visible face belonging to a same geometric face, after being projected into the cuboid object coordinate system, and that geometric face, the vertex consistency constraint is used for making the three-dimensional coordinates, in a world coordinate system, of a same vertex observed from different viewing angles consistent, the observation data of a visible face comprise a first plane parameter of the visible face and the three-dimensional coordinates, in the world coordinate system, of the points in the visible face, the first plane parameter comprises a second normal vector in the world coordinate system and a distance from a first origin, the first origin is the origin of the world coordinate system, and the third pose information is the pose information of the cuboid object in the world coordinate system; solving the reprojection error minimization problem to obtain the size information and the third pose information of the cuboid object; and converting the third pose information into the camera coordinate system based on fourth pose information of the camera at the different viewing angles to obtain second pose information of the cuboid object in the multiple frames of color images, wherein the fourth pose information is the pose information of the camera in the world coordinate system, and the second pose information is the pose information of the cuboid object in the camera coordinate system; wherein the size information and the second pose information of the cuboid object are labeling data of the cuboid object, and the labeling data of the cuboid object are used for labeling the cuboid object in the target image at the corresponding viewing angle.
10. The data labeling method of claim 9, further comprising, before clustering the visible faces of the multiple frames of color images to obtain the visible faces belonging to the same geometric face: converting second plane parameters of all the visible faces into the world coordinate system to obtain the first plane parameters of all the visible faces, wherein the second plane parameters comprise a third normal vector in the camera coordinate system and a distance from a second origin, and the second origin is the origin of the camera coordinate system; and calculating, for the visible faces at any two different viewing angles, the included angle between their second normal vectors and a third absolute difference between their distances from the first origin; wherein clustering the visible faces of the multiple frames of color images to obtain the visible faces belonging to the same geometric face comprises: clustering the visible faces of the multiple frames of color images according to the rule that the included angle between the second normal vectors is less than a second angle threshold and the third absolute difference between the distances from the first origin is less than a third difference threshold, to obtain the visible faces belonging to the same geometric face.
11. The data labeling method of claim 9, further comprising, after obtaining the second pose information of the cuboid object in the multiple frames of color images: for the second pose information of the cuboid object in any one of the color images, projecting the cuboid object, based on the second pose information, onto the image plane of that color image to obtain the six geometric faces of the cuboid object in that color image.
12. The data labeling method of any one of claims 1 to 8, wherein determining the first pose information of the visible face based on the four first vertex coordinates comprises: calculating the center point coordinates of the visible face based on the four first vertex coordinates; calculating two adjacent edge vectors of the visible face based on three consecutive vertex coordinates among the four first vertex coordinates; taking the cross product of the two adjacent edge vectors to obtain a third normal vector of the visible face; and normalizing the third normal vector to unit length and then constructing a rotation matrix of the visible face; wherein the center point coordinates of the visible face and the rotation matrix constitute the first pose information.
13. A data labeling device, comprising: a first determining module, configured to determine a visible face among the six geometric faces of a cuboid object in a color image; a second determining module, configured to determine four first vertex coordinates of the visible face in a camera coordinate system; and a third determining module, configured to determine first pose information of the visible face based on the four first vertex coordinates, wherein the first pose information is the pose information of the visible face in the camera coordinate system; wherein the four first vertex coordinates and the first pose information are labeling data of the visible face, the labeling data of the visible face are used for labeling the visible face in a target image, and the target image comprises at least the color image.
14. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, causes the electronic device to implement the data labeling method of any one of claims 1 to 12.
15. A computer program product, comprising a computer program which, when run, causes the data labeling method of any one of claims 1 to 12 to be performed.
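By way of illustration only, and not as part of the claims, the occlusion test and mask intersection recited in claims 2 and 3 could be sketched as follows. The function names, the dictionary-based depth lookup and the example threshold are assumptions made for this sketch; how the theoretical depth of a pixel is obtained is not specified in this section and is simply taken as an input here.

```python
def unoccluded_pixels(face_pixels, actual_depth, theoretical_depth, threshold):
    """Keep pixels whose measured depth agrees with the depth expected for the face.

    face_pixels: iterable of (row, col) pixels inside the projected face.
    actual_depth / theoretical_depth: dicts mapping (row, col) to depth.
    threshold: the "first difference threshold" (value chosen by the caller).
    """
    return {
        p for p in face_pixels
        if abs(actual_depth[p] - theoretical_depth[p]) <= threshold
    }


def pixel_level_mask(face_pixels, whole_mask, actual_depth,
                     theoretical_depth, threshold):
    """Actual visible pixels: unoccluded face pixels that also lie in the whole mask."""
    visible = unoccluded_pixels(face_pixels, actual_depth,
                                theoretical_depth, threshold)
    return visible & set(whole_mask)
```

A pixel covered by a nearer occluder shows a large depth difference and is dropped by the first function; a pixel outside the object's whole mask is dropped by the intersection.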
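By way of illustration only, the vertex generation and projection of claim 4 and the angle-based visibility test of claim 5 could be sketched as below. Several choices are assumptions of this sketch rather than statements of the application: the object frame is centered on the cuboid, the camera follows a pinhole model with intrinsics `(fx, fy, cx, cy)`, and the camera line-of-sight direction is taken as the vector from the face center toward the camera origin, so that a small included angle means the face is turned toward the camera.

```python
import math

# Outward unit normals of the six faces in the object frame (assumed naming).
FACE_NORMALS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def box_corners(size):
    """Eight corners in the object frame, origin at the cuboid center (assumed)."""
    length, width, height = size
    return [[sx * length / 2.0, sy * width / 2.0, sz * height / 2.0]
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def rotate(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def to_camera(R, t, p):
    """Apply the object-to-camera pose (rotation R, translation t) to a point."""
    r = rotate(R, p)
    return [r[i] + t[i] for i in range(3)]


def project(intrinsics, p):
    """Pinhole projection of a camera-frame point onto the image plane."""
    fx, fy, cx, cy = intrinsics
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)


def visible_faces(size, R, t, max_angle_deg):
    """Names of faces whose normal is within max_angle_deg of the sight direction."""
    half = [s / 2.0 for s in size]
    visible = []
    for name, n_obj in FACE_NORMALS.items():
        center_cam = to_camera(R, t, [n_obj[i] * half[i] for i in range(3)])
        normal_cam = rotate(R, n_obj)
        to_cam = [-c for c in center_cam]  # face center toward camera origin
        dist = math.sqrt(sum(c * c for c in to_cam))
        if dist == 0.0:
            continue  # face center coincides with the camera origin
        cos_angle = sum(normal_cam[i] * to_cam[i] for i in range(3)) / dist
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= max_angle_deg:
            visible.append(name)
    return visible
```

For a cuboid placed squarely in front of the camera, only the face turned toward the camera passes the test; rotating the cuboid brings up to three faces under the threshold.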
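By way of illustration only, the iterative plane extraction loop of claim 8 could be sketched as below. The claim does not say how the largest plane is fitted; this sketch substitutes an exhaustive search over point triples, which is only practical for small point clouds (a sampling-based fit such as RANSAC would be the usual substitute for real data). The function names and the inlier tolerance `tol` are assumptions; `first_number` and `second_number` mirror the two stopping thresholds named in the claim.

```python
import itertools
import math


def plane_through(p, q, r):
    """Unit normal n and offset d of the plane n.x = d, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def fit_largest_plane(points, tol):
    """Plane supported by the most points, with the indices of its inliers."""
    best_plane, best_inliers = None, []
    for p, q, r in itertools.combinations(points, 3):
        plane = plane_through(p, q, r)
        if plane is None:
            continue
        n, d = plane
        inliers = [k for k, x in enumerate(points)
                   if abs(sum(n[i] * x[i] for i in range(3)) - d) <= tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def extract_planes(points, first_number, second_number, tol=1e-3):
    """Peel off planes until too few points remain or enough planes are found."""
    candidates = list(points)
    planes = []
    while True:
        plane, inliers = fit_largest_plane(candidates, tol)
        if plane is None:
            break  # fewer than three non-collinear points left
        planes.append(plane)
        inlier_set = set(inliers)
        candidates = [x for k, x in enumerate(candidates)
                      if k not in inlier_set]
        if len(candidates) < first_number or len(planes) >= second_number:
            break
    return planes
```

Since at most three faces of a cuboid are visible from a single viewpoint, a `second_number` of three is a natural choice, although the claim leaves the value open.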
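By way of illustration only, the plane parameter conversion and threshold-based clustering of claim 10 could be sketched as below. The sketch assumes each plane is written as n.x = d with a unit normal n, so that d is a signed distance from the origin, and that the camera pose in the world frame is given as X_w = R_wc X_c + t_wc. The greedy grouping against the first member of each cluster is also an assumption; the claim only fixes the two threshold conditions.

```python
import math


def plane_to_world(n_cam, d_cam, R_wc, t_wc):
    """Convert plane n.x = d from the camera frame to the world frame."""
    n_world = [sum(R_wc[i][j] * n_cam[j] for j in range(3)) for i in range(3)]
    d_world = d_cam + sum(n_world[i] * t_wc[i] for i in range(3))
    return n_world, d_world


def angle_deg(a, b):
    """Included angle between two unit vectors, in degrees."""
    cos_angle = sum(a[i] * b[i] for i in range(3))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def cluster_faces(planes_world, angle_threshold_deg, distance_threshold):
    """Group plane indices whose normals and origin distances both agree."""
    clusters = []
    for index, (n, d) in enumerate(planes_world):
        for cluster in clusters:
            ref_n, ref_d = planes_world[cluster[0]]
            if (angle_deg(n, ref_n) < angle_threshold_deg
                    and abs(d - ref_d) < distance_threshold):
                cluster.append(index)
                break
        else:
            clusters.append([index])  # no match: start a new geometric face
    return clusters
```

Two observations of the same physical face taken from different camera positions map to nearly the same world-frame normal and distance, so they fall into one cluster.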

Description

Data labeling method, data labeling device, electronic equipment and program product

Technical Field

The application belongs to the technical field of computer vision, and particularly relates to a data labeling method, a data labeling device, electronic equipment and a program product.

Background

In industrial scenarios such as warehousing, sorting, loading and unloading, cuboid objects such as open plastic boxes, cartons, tabletops and trays are the objects that a robot encounters most frequently and in the greatest numbers while moving and carrying. To grasp, place, stack and avoid obstacles in an unstructured environment, the robot must localize these objects with centimeter-level or even millimeter-level precision. Such precise localization requires large-scale, high-precision and highly consistent labeling data. At present, labeling is mainly done manually, which suffers from low precision, poor consistency and low efficiency.

Disclosure of Invention

The embodiments of the application provide a data labeling method, a data labeling device, electronic equipment and a program product, which can address the low precision, poor consistency and low efficiency of manual labeling.
In a first aspect, an embodiment of the present application provides a data labeling method, including: determining a visible face among the six geometric faces of a cuboid object in a color image; determining four first vertex coordinates of the visible face in a camera coordinate system; and determining first pose information of the visible face based on the four first vertex coordinates, wherein the first pose information is the pose information of the visible face in the camera coordinate system. The four first vertex coordinates and the first pose information are labeling data of the visible face, the labeling data of the visible face are used for labeling the visible face in a target image, and the target image comprises at least the color image.
In the embodiment of the application, a visible face among the six geometric faces of the cuboid object in the color image is determined, so that the four first vertex coordinates of the visible face in the camera coordinate system can be determined, and the first pose information of the visible face can be determined based on the four first vertex coordinates. Labeling data comprising the four first vertex coordinates and the first pose information can thus be generated automatically. The whole process requires no manual interaction, which addresses the low precision, poor consistency and low efficiency of manual labeling and makes millimeter-level labeling precision achievable at the scale of millions of images.
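As a minimal sketch of how first pose information might be derived from four first vertex coordinates, the following Python follows the steps recited in claim 12: center point, two adjacent edge vectors, cross product, normalization, rotation matrix. The function name `face_pose`, the assumption that the vertices are listed in order around the face, and the choice of the first edge as the x-axis of the face frame are illustrative only; the application does not specify how the rotation matrix is assembled from the normal.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("degenerate face: zero-length vector")
    return [c / norm for c in v]


def face_pose(vertices):
    """Return (center, rotation) of a face from four camera-frame vertices.

    vertices: four [x, y, z] points listed in order around the face
    (assumed ordering).
    rotation: 3x3 nested list whose columns are the face axes x, y, z,
    with z along the face normal (assumed axis convention).
    """
    center = [sum(v[i] for v in vertices) / 4.0 for i in range(3)]
    v0, v1, v2 = vertices[0], vertices[1], vertices[2]
    edge_a = _sub(v1, v0)                    # first of two adjacent edges
    edge_b = _sub(v2, v1)                    # second adjacent edge
    z_axis = _unit(_cross(edge_a, edge_b))   # unit normal of the face
    x_axis = _unit(edge_a)
    y_axis = _cross(z_axis, x_axis)          # completes a right-handed frame
    rotation = [[x_axis[i], y_axis[i], z_axis[i]] for i in range(3)]
    return center, rotation
```

The direction of the normal depends on the winding order of the vertices, so a consistent vertex ordering is needed if the normal is required to point toward or away from the camera.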
In a second aspect, an embodiment of the present application provides a data labeling device, including: a first determining module, configured to determine a visible face among the six geometric faces of a cuboid object in a color image; a second determining module, configured to determine four first vertex coordinates of the visible face in a camera coordinate system; and a third determining module, configured to determine first pose information of the visible face based on the four first vertex coordinates, wherein the first pose information is the pose information of the visible face in the camera coordinate system. The four first vertex coordinates and the first pose information are labeling data of the visible face, the labeling data of the visible face are used for labeling the visible face in a target image, and the target image comprises at least the color image.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, causes the electronic device to implement the data labeling method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a computer, implements the data labeling method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising a computer program which, when run, causes the data labeling method described in the first aspect to be performed.
It will be appreciated that, for the advantages of the second to fifth aspects, reference may be made to the relevant description of the first aspect; they are not repeated here.
Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below, and it is obvious that