
JP-7855905-B2 - System and method for an improved camera system using filters and machine learning to estimate depth

JP7855905B2

Inventors

  • Sean P. Rodrigues

Assignees

  • Toyota Motor Corporation

Dates

Publication Date
2026-05-11
Application Date
2022-04-13
Priority Date
2021-04-13

Claims (18)

  1. A camera system comprising: a processor; and a memory communicably coupled to the processor and storing: an acquisition module including instructions that, when executed by the processor, cause the processor to acquire image data from a detector that uses a lens to change a plurality of angles of light for each area of the detector; and a determination module including instructions that, when executed by the processor, cause the processor to map a kernel to the image data according to a view and a kernel size associated with the area, and to process the image data using a machine learning (ML) model to create a depth according to the kernel size, wherein the determination module further includes instructions to classify, using the ML model, objects in a scene associated with the image data, and to generate a distribution of spatial points including the objects associated with the depth.
  2. The camera system according to claim 1, wherein the lens is inverted or graded and is configured to change the plurality of angles of the light associated with the area of the detector.
  3. The camera system according to claim 1, wherein the lens includes one or more filter elements that direct the light to change the plurality of angles for each pixel or quadrant of the detector.
  4. The camera system according to claim 1, wherein the image data includes a plurality of overlapping views from various angles that vary according to a parameter defined for depth in association with any one of refraction, filtering, and orientation of the light.
  5. The camera system according to claim 1, wherein the acquisition module includes instructions to acquire the image data and further includes instructions to process the light using a resonant waveguide grating (RWG) operably connected to the lens to change the plurality of angles of the light.
  6. The camera system according to claim 5, wherein the acquisition module further includes instructions for transmitting the light to the area of the detector at a predetermined angle and wavelength according to the bandwidth of the RWG.
  7. The camera system according to claim 1, wherein the area corresponds to one or more pixels of the detector associated with the plurality of angles.
  8. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to: acquire, according to a reference, image data from a detector that uses a lens to change a plurality of angles of light for each area of the detector; map a kernel to the image data according to a view and a kernel size associated with the area; process the image data using a machine learning (ML) model to create a depth according to the kernel size; classify, using the ML model, objects in a scene associated with the image data; and generate a distribution of spatial points including the objects associated with the depth.
  9. The non-transitory computer-readable medium according to claim 8, wherein the lens is inverted or graded and is configured to change the plurality of angles of the light associated with the area of the detector.
  10. The non-transitory computer-readable medium according to claim 8, wherein the lens includes one or more filter elements that direct the light to change the plurality of angles for each pixel or quadrant of the detector.
  11. The non-transitory computer-readable medium according to claim 8, wherein the image data includes a plurality of overlapping views from various angles that vary according to a parameter defined for depth in association with any one of refraction, filtering, and orientation of the light.
  12. A method comprising: acquiring, according to a reference, image data from a detector that uses a lens to change a plurality of angles of light for each area of the detector; mapping a kernel to the image data according to a view and a kernel size associated with the area; processing the image data using a machine learning (ML) model to create a depth according to the kernel size; classifying, using the ML model, objects in a scene associated with the image data; and generating a distribution of spatial points including the objects associated with the depth.
  13. The method according to claim 12, wherein the lens is inverted or graded and is configured to change the plurality of angles of the light associated with the area of the detector.
  14. The method according to claim 12, wherein the lens includes one or more filter elements that direct the light to change the plurality of angles for each pixel or quadrant of the detector.
  15. The method according to claim 12, wherein the image data includes a plurality of overlapping views from various angles that vary according to a parameter defined for depth in association with any one of refraction, filtering, and orientation of the light.
  16. The method according to claim 12, further comprising processing the light using a resonant waveguide grating (RWG) operably connected to the lens to change the plurality of angles of the light.
  17. The method according to claim 16, further comprising transmitting the light to the area of the detector at a predetermined angle and wavelength according to the bandwidth of the RWG.
  18. The method according to claim 12, wherein the area corresponds to one or more pixels of the detector associated with the plurality of angles.
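The pipeline recited in claims 1, 8, and 12 (acquire image data, map a kernel to it, estimate depth with a machine learning model, classify objects, and generate a distribution of spatial points) can be sketched in code. The sketch below is purely illustrative: the tiling scheme, the stand-in depth model (a patch mean), and the label lookup are hypothetical placeholders, not the disclosed implementation.

```python
import random

def map_kernel(image, kernel_size):
    """Tile the image into kernel-sized patches, one per detector area
    (hypothetical tiling; the claims do not fix a particular scheme)."""
    h, w = len(image), len(image[0])
    tiles = []
    for y in range(0, h - kernel_size + 1, kernel_size):
        for x in range(0, w - kernel_size + 1, kernel_size):
            patch = [row[x:x + kernel_size] for row in image[y:y + kernel_size]]
            tiles.append(((y, x), patch))
    return tiles

def estimate_depth(tiles, depth_model):
    """Apply an ML depth model to each kernel-sized tile (stand-in model)."""
    return {pos: depth_model(patch) for pos, patch in tiles}

def to_point_cloud(depths, labels):
    """Generate spatial points (x, y, depth, class) including classified objects."""
    return [(x, y, z, labels.get((y, x), "unknown"))
            for (y, x), z in depths.items()]

# Toy run: an 8x8 "image", a mean-intensity depth model, one classified object.
image = [[random.random() for _ in range(8)] for _ in range(8)]
tiles = map_kernel(image, kernel_size=4)
depths = estimate_depth(tiles, lambda p: sum(map(sum, p)) / 16)
points = to_point_cloud(depths, labels={(0, 0): "vehicle"})
```

Processing only kernel-sized tiles, rather than searching for image overlaps across multiple cameras, is the computational saving the description attributes to the kernel mapping.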

Description

The subject matter described herein generally relates to camera systems, and more specifically to improved camera systems that include directional optics and machine learning models for estimating depth.

Vehicles may be equipped with sensors that facilitate perception of other vehicles, obstacles, pedestrians, and additional aspects of the surrounding environment. For example, a vehicle may be equipped with a light detection and ranging (LiDAR) sensor that uses light to scan the surrounding environment, while logic circuitry associated with the LiDAR analyzes the acquired data to detect the locations of objects and other features of the scene. In further examples, additional or alternative sensors, such as camera systems, may be implemented to acquire information about the surrounding environment, from which the system derives an understanding of that environment. This sensor data can be useful in various circumstances for improving perception, enabling systems such as autonomous driving systems to understand the recorded situation, plan accurately, and navigate accordingly. In general, the better a vehicle's perception of its surroundings, the better it can supplement the driver with information to assist driving, and the better an autonomous system can control the vehicle to avoid hazards.

Systems that use LiDAR to detect objects are best suited to long ranges. Vehicles may therefore use pseudo-LiDAR systems that detect objects using images processed from multiple cameras and sensors at both short and long ranges. However, pseudo-LiDAR systems that rely on multiple cameras and sensors can introduce computational complexity. Like LiDAR systems, pseudo-LiDAR systems can use images that vary in time and space to create spatial point distributions, or point clouds, associated with estimated depth.
Systems that process image data into accurate spatial point distributions can be complex. Pseudo-LiDAR systems may acquire images from multiple cameras to estimate depth, and may modify those images by machine processing to find image overlaps. For example, image overlaps may be stereoscopic images in which two or more images share corresponding image points. However, pseudo-LiDAR systems that search for image overlaps are time-consuming and computationally intensive.

In one embodiment, the example systems and methods relate to improved pseudo light detection and ranging using an improved camera system that includes directional optical components and a machine learning (ML) model for depth estimation. In various implementations, pseudo-LiDAR systems are computationally intensive when accurately detecting objects in a scene because they combine data from multiple sensors or cameras to create a spatial point distribution. Furthermore, pseudo-LiDAR hardware that uses multiple sensors can increase component size, processing load, and latency for depth estimation. Pseudo-LiDAR systems can thus have difficulty estimating depth efficiently and accurately. Therefore, in one embodiment, the camera system reduces the computation required to estimate scene depth by using an ML model, hardware, and limited sensor input to vary the angles of light waves associated with an image. The output of the camera system may be wide-field-of-view image data formed by combining redundant information in the scene. The system may vary the angles of the light waves according to lens parameters optimized for depth estimation. In addition, the system may reduce computation by using the ML model to process a portion of the image data generated by the lens and detector.
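The angle variation described above can be illustrated with Snell's law: a lens whose refractive index is graded across detector areas refracts incoming light by a different amount per area, producing the per-area view angles used for depth estimation. The linear index profile and incidence angle below are hypothetical values chosen for illustration, not parameters disclosed herein.

```python
import math

def refraction_angle(theta_in_deg, n1, n2):
    """Snell's law: n1 * sin(theta1) = n2 * sin(theta2)."""
    s = n1 * math.sin(math.radians(theta_in_deg)) / n2
    return math.degrees(math.asin(s))

def graded_lens_angles(theta_in_deg, n_outside, index_profile):
    """Per-area exit angles for a graded-index lens whose refractive index
    varies across detector areas (hypothetical linear profile)."""
    return [refraction_angle(theta_in_deg, n_outside, n) for n in index_profile]

# Index increases across four detector quadrants, so each quadrant
# receives the same wavefront bent to a different angle.
profile = [1.45, 1.50, 1.55, 1.60]
angles = graded_lens_angles(30.0, n_outside=1.0, index_profile=profile)
```

Each detector area thus sees the scene from a slightly different effective angle, which supplies the overlapping-view redundancy that the ML model exploits for depth.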
The system may use the image data processed by the ML model to classify objects in the scene near the camera system and estimate their depth. Furthermore, the camera system may vary specific angles of the light waves for subsequent depth estimation by using inverted or graded lenses and independent filtering for each pixel of the detector. To improve object detection, the camera system may redirect the light waves associated with a reduced-resolution image for each pixel of the detector array. The output of the camera system may be improved image data containing the object, simplifying subsequent ML or depth-estimation tasks. In addition, the camera system may use inverted or shallow-gradient lenses to filter light associated with an object by dividing the detector into regions, and then vary specific angles of the light waves to estimate depth. The regions may correspond to one or more pixels. For example, the camera system may use quadrant filtering, with the detector divided into quadrants representing different focal areas, and then vary the angles of the light waves associated with the image. Vehicles may be equipped with camera systems that use pixel-by-pixel or