BR-102024017888-A2 - SYSTEM AND PROCESS FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES
Abstract
The invention consists of a computer vision system and process capable of identifying the three-dimensional position, in coordinates (x, y, z), of a specific point on an object, using images captured by two or more cameras with different viewing angles of the target object, connected to processing hardware. After the object's position is identified, this information is sent to a robotic device, or to any device in the field of automation that needs to obtain the object's position. The three-dimensional coordinates (x, y, z) of the target object are relative to the position of the automation/robotic device, simplifying the programming of this robot as much as possible. This system is related to the technical areas of Computer Vision, Machine Learning, Artificial Intelligence, Robotics, and Automation.
Inventors
- JOÃO MARCOS SOUZA PENA
- RAMON LELIS SARAIVA
Assignees
- RAMON LELIS SARAIVA
Dates
- Publication Date: 2026-03-17
- Application Date: 2024-08-30
Claims (9)
- 1. SYSTEM FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES characterized by comprising at least one observation point related to at least one observed object and comprising: (i) at least two image recognition cameras connected to processing hardware; (ii) at least one robotic device, connected to the processing hardware, for performing tasks using the three-dimensional coordinates (x, y, z axes) of the identified target object; (iii) a graphical user interface (GUI) for selecting existing models, training new models, selecting the point of interest and calibrating the system.
- 2. SYSTEM FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claim 1, characterized in that the image capture cameras (i) can be selected from a group comprising cameras of any model and specification, such as: video cameras or lower-cost digital cameras, cameras with lenses designed specifically for very small objects, cameras with a high FPS, and cameras with a wireless connection.
- 3. SYSTEM FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claim 1, characterized in that the automation/robotic device (ii) can be selected from a group comprising industrial models, preferably: articulated, Cartesian, cylindrical, polar, SCARA, delta and collaborative.
- 4. SYSTEM FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claim 1, characterized in that the graphical user interface (GUI) (iii) may comprise a screen for selecting different instance segmentation or key point detection models to segment different types of objects or obtain different points of interest; a screen for training new instance segmentation or key point detection models; a screen for selecting the point of interest; a screen for initiating system calibration.
- 5. PROCESS FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, performed by the system defined in any of the preceding claims, characterized by comprising the steps of: - Calibrating the system; - Capturing the images by means of each of the cameras (1); - Inserting each of the images into an instance segmentation model (2) capable of segmenting the object of interest, then identifying the point of interest of the segmented object (3); - Optionally, inserting each of the images into a keypoint detection model (4); - Selecting one of the target objects detected in the Camera 1 image (5), in cases where there are two or more identical objects in the scene; - Inserting the point of interest of the chosen object in the Camera 1 image, together with the points of interest of all objects detected in the Camera 2 image, into a model that defines which of the points detected in the Camera 2 image refers to the point chosen in the Camera 1 image (6), in cases where there are two or more identical objects in the scene; - Inserting the point of interest of the chosen object in the Camera 1 image, together with the point of interest of the object in the Camera 2 image obtained by the model in the previous step, into another model that correlates these points with the three-dimensional point of the object (7); - Sending the three-dimensional point obtained by the model to the robotic device or any other automation device (8). An illustrative sketch of these steps is provided after the claims.
- 6. PROCESS FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claim 5, characterized in that a new cycle of the process is initiated immediately after the end of the previous cycle, so that the system continuously sends the position of the target object to the automation/robotic device.
- 7. PROCESS FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claim 5, characterized in that the system calibration is performed whenever there is a change in the positioning of the cameras.
- 8. PROCESS FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claim 5, characterized in that the system calibration comprises the following steps: - Positioning the cameras at different viewing angles, so that the point of interest of the target object is visible in the images throughout the operating area of the automation/robotic device; - Connecting the automation/robotic device and the cameras to a computer or any hardware capable of performing the processing; - Attaching an easily distinguishable object to the automation/robotic device; - Moving the automation/robotic device throughout its area of operation, capturing, along the entire path, the three-dimensional position of the robot and the position of the object attached to the robot in the image from Camera 1 and in the image from Camera 2; - Training a machine learning model with the obtained data, correlating the position of the object attached to the robot in the images from Cameras 1 and 2 with the three-dimensional position of the robot, the positions in the images from Cameras 1 and 2 being the model's inputs and the three-dimensional position being the model's output; - Training a second machine learning model, in cases where there are two or more identical target objects in the scene, to match the point identified in the image from Camera 1 with the points identified in the image from Camera 2, using as inputs the point of interest in the image from Camera 1, the corresponding point of interest in the image from Camera 2, and randomly chosen points identified in images from Camera 2 that refer to other positions of the robot; the model's output is the point of interest in the Camera 2 image that corresponds to the point of interest in the Camera 1 image used as input. An illustrative calibration sketch is provided after the claims.
- 9. PROCESS FOR IDENTIFYING THE POSITION OF OBJECTS IN THREE-DIMENSIONAL COORDINATES, according to claims 5 and 8, characterized by the positioning of the cameras being carried out in such a way that the viewing angle of one camera is as different as possible from the viewing angle of the other camera.
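To make the sequence of steps (1) to (8) in claim 5 easier to follow, a minimal Python sketch of one acquisition cycle is given below. It is purely illustrative: the segmenter, keypoint detector, point-matching model and 2D-to-3D regression model are hypothetical callables standing in for whatever models the user selects or trains through the GUI, and names such as process_cycle and send_to_robot are not part of the claims.

```python
# Illustrative sketch of one cycle of the process in claim 5 (hypothetical
# helper objects; the claim does not prescribe specific models or libraries).

def process_cycle(frame1, frame2, segmenter, keypoint_model, matcher, regressor, send_to_robot):
    """Run steps (2) to (8) of claim 5 for one pair of captured frames."""
    # (2)-(4): segment each image and obtain the point of interest of every
    # detected instance as (u, v) pixel coordinates
    points_cam1 = [keypoint_model(mask) for mask in segmenter(frame1)]
    points_cam2 = [keypoint_model(mask) for mask in segmenter(frame2)]

    # (5): when two or more identical objects appear, select one target
    # detected in the Camera 1 image (here simply the first detection)
    chosen_cam1 = points_cam1[0]

    # (6): the matching model decides which point detected in the Camera 2
    # image refers to the point chosen in the Camera 1 image
    chosen_cam2 = matcher(chosen_cam1, points_cam2)

    # (7): the calibration-trained model correlates the two 2D points with
    # the object's 3D point, expressed relative to the robotic device
    x, y, z = regressor(chosen_cam1, chosen_cam2)

    # (8): send the three-dimensional point to the robotic/automation device
    send_to_robot((x, y, z))
    return x, y, z

# Claim 6: a new cycle starts as soon as the previous one ends, so the device
# continuously receives the target position, e.g.:
#     while True:
#         process_cycle(cam1.read(), cam2.read(), seg, kp, match, reg, send)
```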
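The calibration of claim 8 can likewise be pictured as a data-collection loop followed by the training of a regression model that correlates the pixel position of the object attached to the robot in both camera images with the robot's three-dimensional position. In the hedged sketch below, the robot and camera objects and the locate_marker() helper are hypothetical, the waypoint list stands in for moving the device throughout its area of operation, and scikit-learn's MLPRegressor is used only as one possible choice of machine-learning model; the claim does not prescribe any particular model. The second, point-matching model of claim 8 would be trained analogously, with the Camera 1 point of interest plus candidate Camera 2 points as inputs and the corresponding Camera 2 point as output.

```python
# Hedged sketch of the calibration procedure of claim 8. The robot, cam1,
# cam2 and locate_marker objects are hypothetical placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

def collect_calibration_data(robot, cam1, cam2, locate_marker, waypoints):
    """Move the robot through its operating area and record, at each waypoint,
    the robot's (x, y, z) and the attached marker's (u, v) in each camera."""
    inputs, outputs = [], []
    for xyz in waypoints:
        robot.move_to(xyz)                   # known 3D position of the robot
        u1, v1 = locate_marker(cam1.read())  # marker position in the Camera 1 image
        u2, v2 = locate_marker(cam2.read())  # marker position in the Camera 2 image
        inputs.append([u1, v1, u2, v2])
        outputs.append(list(xyz))
    return np.array(inputs), np.array(outputs)

def train_position_model(inputs, outputs):
    """Train the model that correlates the marker's position in the images
    from Cameras 1 and 2 with the robot's three-dimensional position."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000)
    model.fit(inputs, outputs)               # inputs: (u1, v1, u2, v2); output: (x, y, z)
    return model
```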
Description
FIELD OF THE INVENTION

[001] The invention consists of a computer vision system and process capable of identifying the position in three-dimensional coordinates (x, y, z) of a specific point on an object, using images captured by two cameras with different viewing angles of the target object, connected to processing hardware. After the object's position is identified, this information is sent to a robotic device, or to any device in the field of automation that needs to obtain the object's position. The three-dimensional coordinates (x, y, z) of the target object are relative to the position of the automation/robotic device, simplifying the programming of this robot as much as possible. This system is related to the technical areas of Computer Vision, Machine Learning, Artificial Intelligence, Robotics, and Automation.

FUNDAMENTALS OF THE INVENTION

[002] Identifying the position of an object in space is fundamental for a robotic device to act on that object to perform a task automatically. In many tasks, the target object has a fixed and predefined position; in these cases, it is not necessary to use any type of sensor to identify the object's position in space. Conversely, in many cases, the position of the target object is not fixed and predetermined, requiring some type of sensor to identify its position in space. Currently, the most common methods in robotics for identifying the position of an object in space are based on Time-of-Flight (ToF) sensors and stereo vision cameras.

[003] A ToF sensor is a type of sensor that can perform three-dimensional mapping using infrared light waves. This technology can calculate the distance to a given point in space by emitting infrared light waves and measuring the time it takes for this light signal to return to its receiver. ToF technology therefore relies on specific sensors to perform this three-dimensional mapping, unlike the technology in question, which uses common cameras. Furthermore, a ToF sensor cannot identify the position of a specific object in space; it only performs three-dimensional mapping of a region. For a ToF sensor to identify the position of a specific object, additional programming is required, and it is not effective at recognizing certain objects in some environments. LiDAR sensors, widely used in robotics, are a type of ToF sensor.

[004] Another type of sensor widely used in the field of robotics is the Stereo Vision Camera, which, through two cameras, reproduces the image of a scene and generates a depth map of that scene, that is, a map indicating the depth of each point in the image. However, like ToF sensors, Stereo Vision Cameras do not provide the three-dimensional coordinates (x, y, z) of a specific point in space, but rather the three-dimensional mapping of a region. In addition, this type of sensor usually uses an auxiliary infrared light projector to increase its accuracy.

[005] Thus, the two technologies described have limitations in effectively identifying the three-dimensional coordinates of a specific point on an object in certain environments. ToF sensors operate based on the emission and reception of infrared light. For this reason, they can suffer interference from sunlight, resulting in low sensor accuracy in open environments during the day. This makes the use of this sensor unfeasible in some applications. Stereo Vision Cameras, on the other hand, do not show a significant drop in performance when used in environments with sunlight, as the infrared light used in this type of camera is only useful for improving the system's accuracy and is not essential for its operation. However, the accuracy of Stereo Vision Cameras is much lower than the accuracy of ToF sensors in ideal operating environments.

[006] The system reported in this patent application, in turn, has an operating principle similar to that of Stereo Vision Cameras, but with high precision. Prior art Stereo Vision Cameras calculate the depth of a scene from two cameras that have very similar viewing angles, i.e., the cameras are aligned horizontally with only a certain distance between them. This makes it easier for such systems to identify the matching points between the images, but it does not provide good precision, precisely because the images have very similar viewing angles. The system and process of this patent application, however, can be used with any cameras and in any positions, as long as the point of interest is visible in the images from the cameras used. This flexibility allows the user to position the cameras at very different viewing angles, resulting in high precision. Furthermore, the system gives the user the flexibility to choose the camera type most suitable for the project, whether it is a camera with optical zoom to identify the position of very small objects, a camera with a high FPS (Frames per Second) to identify the position of fast-moving objects, or a camera with a wide angle of view for environments where it is not possible to position the camera further away from th