US-20260129149-A1 - PROCESSING INTERMEDIATE REPRESENTATION DATA FOR IMAGE VIEWS GENERATED USING STEREO DISPARITY DATA
Abstract
Approaches presented herein provide for generation of alternate views from disparity data captured for one or more objects in a scene. The generation can be performed using an embedded processor with DMA memory access, or other limited capacity hardware. An intermediate representation can be generated that is a 2D histogram view of the disparity data. This intermediate representation can be transformed, using the embedded processor, to an alternate view image, such as a bird's eye view image. Morphological or similar filtering can be performed on the one or more objects in the intermediate representation using the same size filter, regardless of distance from a camera plane used to capture the disparity data.
Inventors
- Branislav Kisacanin
- Ching-Yu Hung
Assignees
- NVIDIA CORPORATION
Dates
- Publication Date
- 20260507
- Application Date
- 20241105
Claims (20)
- 1 . A system, comprising: at least one embedded processor with direct memory access (DMA) functionality to: generate a two-dimensional (2D) histogram view of one or more objects in an environment based in part on disparity data for the one or more objects, the two-dimensional histogram view being a function of angle and distance of at least one camera used to generate the disparity data; select a filter of a single shape and size to be used regardless of respective distances of the one or more objects to a camera plane of the at least one camera; perform morphological filtering of the one or more objects in the 2D histogram view using the selected filter; and transform the 2D histogram view, after the morphological filtering, to an alternate view image of the one or more objects.
- 2 . The system of claim 1 , wherein the 2D histogram view is an intermediate representation, and wherein the alternate view image is a bird's eye view image of the one or more objects generated by transforming the intermediate representation.
- 3 . The system of claim 1 , wherein the at least one embedded processor lacks access to a full set of the disparity data stored in external memory to use in generating the alternate view image.
- 4 . The system of claim 1 , wherein the system is further to determine the disparity data using image data captured using the at least one camera.
- 5 . The system of claim 1 , wherein the at least one camera includes at least one of a stereoscopic camera assembly, a pair of matched camera sensors, or a depth sensor.
- 6 . The system of claim 1 , wherein the alternate view image is generated in part by generating a list of object centroids and statistics using the 2D histogram view and transforming the list into a corresponding list in a coordinate system of the alternate view image.
- 7 . The system of claim 6 , wherein the object centroids are calculated using locations in the 2D histogram view identified to be associated with the one or more objects using a connected components algorithm with the at least one embedded processor.
- 8 . The system of claim 1 , wherein the at least one embedded processor is further to use the 2D histogram view to estimate motion of the one or more objects without having to determine a distance of the one or more objects from a camera plane of the at least one camera.
- 9 . The system of claim 8 , wherein the motion is estimated using an optical flow map with the 2D histogram view using information from a camera view used to generate the disparity data.
- 10 . The system of claim 1 , wherein the system comprises at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system for performing generative AI operations using a large language model (LLM), a system for performing generative AI operations using a vision language model (VLM), a system for performing generative AI operations using a multi-modal language model (MMLM); a system for deploying one or more language models using an operating system (OS)-level virtualization container that communicates with the one or more language models using one or more application programming interfaces (APIs); a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources.
- 11 . At least one embedded processor, with direct memory access (DMA) functionality, to generate an alternate view image by generating, from disparity data for a scene, an intermediate histogram as a function of angle, filtering one or more objects in the intermediate histogram using a single filter size independent of distance from a camera plane, and transforming the intermediate histogram to the alternate view image.
- 12 . The at least one embedded processor of claim 11 , wherein the at least one embedded processor is further to perform a connected components analysis on the intermediate histogram to identify pixel locations associated with the one or more objects.
- 13 . The at least one embedded processor of claim 12 , wherein the at least one embedded processor is further to generate a list of object centroids and statistics for the one or more objects using the intermediate histogram, and transform the list into a corresponding list in a coordinate system of the alternate view image.
- 14 . The at least one embedded processor of claim 11 , wherein the at least one embedded processor lacks access to a full set of image data stored in external memory to use in generating the intermediate histogram or the alternate view image.
- 15 . The at least one embedded processor of claim 11 , wherein the filtering includes erosion filtering and dilation filtering of representations in the intermediate histogram of the one or more objects.
- 16 . The at least one embedded processor of claim 11 , wherein the at least one embedded processor is comprised in at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for synthetic data generation; a system for performing generative AI operations using a large language model (LLM), a system for performing generative AI operations using a vision language model (VLM), a system for performing generative AI operations using a multi-modal language model (MMLM); a system for deploying one or more language models using an operating system (OS)-level virtualization container that communicates with the one or more language models using one or more application programming interfaces (APIs); a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources.
- 17 . A computer-implemented method, comprising: generating, using an embedded processor with DMA memory access, a two-dimensional (2D) histogram view of one or more objects in an environment based in part on disparity data for the one or more objects, the two-dimensional histogram view being a function of angle of at least one camera used to generate the disparity data; selecting a filter of a single shape and size to be used regardless of respective distances of the one or more objects to a camera plane of the at least one camera; performing morphological filtering of the one or more objects in the 2D histogram view using the selected filter; and transforming, using the embedded processor, the 2D histogram view, after the morphological filtering, to an alternate view image of the one or more objects.
- 18 . The computer-implemented method of claim 17 , wherein the embedded processor lacks access to external memory to use in generating the 2D histogram view or the alternate view image.
- 19 . The computer-implemented method of claim 17 , further comprising: selecting the filter size based in part on a data transfer limit of the DMA memory access and a resolution of the disparity data.
- 20 . The computer-implemented method of claim 17 , further comprising: performing, using the embedded processor, a connected components analysis on the 2D histogram view to identify locations associated with the one or more objects; generating, using the embedded processor, a list of object centroids and statistics for the one or more objects from the 2D histogram view; and transforming, using the embedded processor, the list into a corresponding list in a coordinate system of the alternate view image.
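As an informal sketch of the pipeline recited in the claims (not the patented implementation; the bin counts, threshold, and kernel size below are assumptions chosen only for illustration), an angle-versus-disparity histogram can be accumulated from a disparity map and then opened with a single fixed-size morphological kernel, independent of object distance:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def angle_disparity_histogram(disparity, num_angle_bins, num_disp_bins):
    """Accumulate a 2D histogram over image column (a proxy for viewing
    angle of a calibrated camera) and disparity value; disparity is
    inversely proportional to distance from the camera plane."""
    h, w = disparity.shape
    cols = np.repeat(np.arange(w), h)        # angle axis (per column)
    disp = disparity.T.ravel()               # matching disparity values
    valid = disp > 0                         # drop invalid (zero) disparities
    hist, _, _ = np.histogram2d(
        cols[valid], disp[valid],
        bins=[num_angle_bins, num_disp_bins],
        range=[[0, w], [0, disp[valid].max() + 1]])
    return hist

def open_objects(hist, min_count, k=3):
    """Binarize the histogram, then apply erosion followed by dilation
    (a morphological opening) with one k-by-k kernel; the same kernel
    is used regardless of how far each object is from the camera."""
    def window_reduce(mask, require_all):
        pad = k // 2
        padded = np.pad(mask, pad, constant_values=False)
        windows = sliding_window_view(padded, (k, k))
        if require_all:
            return windows.all(axis=(-2, -1))   # erosion
        return windows.any(axis=(-2, -1))       # dilation
    mask = hist >= min_count
    return window_reduce(window_reduce(mask, True), False)
```

Because the opening operates on the angle-disparity grid rather than a metric bird's eye view grid, a near object and a far object occupy comparably sampled footprints and can share the one kernel, which is the property the claims rely on.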
Description
TECHNICAL FIELD

This disclosure relates to the transformation of image data between different views or representations, and in particular, in one or more non-limiting embodiments, to the generation of an intermediate image representation from a set of disparity data that allows for processing and transformation using limited-capacity resources.

BACKGROUND

In various computing operations, there is a need to determine the locations of various objects in a scene or geographic region. This can include—for example and without limitation—the analysis of captured image information to support tasks such as navigation, localization, controlled interaction, and collision avoidance for robots and autonomous or semi-autonomous vehicles or machines. Performing operations such as those involving image recognition and computer vision can require significant resource capacity, including the ability to access memory with sufficient capacity to store an entire image. Tasks such as generating a bird's eye view (BEV) representation of a scene from captured disparity data can be difficult, if even possible, to perform using limited-capacity resources, such as embedded processors without access to external memory. Further, tasks such as morphological filtering and motion analysis are resource intensive when they must be performed on bird's eye view images, in which objects at different distances can have different levels of quality or amounts of captured information.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

- FIGS. 1A, 1B, 1C, and 1D illustrate image views that can be generated from captured image data, according to at least one embodiment;
- FIG. 2A illustrates an intermediate image that can be generated using captured image data, according to at least one embodiment;
- FIG. 2B illustrates views of similar objects in both a bird's eye view (BEV) or top-down image and an intermediate histogram image, according to at least one embodiment;
- FIG. 3 illustrates corresponding blocks of image data in a disparity image and an intermediate histogram image, according to at least one embodiment;
- FIG. 4 illustrates corresponding blocks of image data in an intermediate histogram image and a bird's eye view image, according to at least one embodiment;
- FIG. 5 illustrates an example process that can be performed to generate a bird's eye view image from disparity image data using an embedded processor, according to at least one embodiment;
- FIG. 6 illustrates an example system including an embedded processor with direct memory access (DMA) functionality, according to at least one embodiment;
- FIG. 7A illustrates a comparative view of the amount of detail captured for objects at different distances from a camera, according to at least one embodiment;
- FIG. 7B illustrates different size filters needed to process the same amount of detail information for objects at different distances in a bird's eye view image, according to at least one embodiment;
- FIG. 8 illustrates a comparison of filter sizes that can be used to process the same amount of detail information for objects at different distances in a bird's eye view image and an intermediate histogram image, according to at least one embodiment;
- FIG. 9 illustrates an example process that can be performed using a single filter size for objects at different distances to perform morphological filtering with respect to an intermediate histogram image, according to at least one embodiment;
- FIG. 10 illustrates components of a distributed system that can be utilized to generate, process, and provide sensor-based content, according to at least one embodiment;
- FIG. 11 illustrates an example computing environment in which one or more devices operate to process data using a SoC, according to at least one embodiment;
- FIG. 12 illustrates an example data center system, according to at least one embodiment;
- FIG. 13 illustrates a computer system, according to at least one embodiment;
- FIG. 14 illustrates a computer system, according to at least one embodiment;
- FIG. 15 illustrates at least portions of a graphics processor, according to one or more embodiments;
- FIG. 16 illustrates at least portions of a graphics processor, according to one or more embodiments;
- FIG. 17A illustrates an example of an autonomous vehicle, according to at least one embodiment;
- FIG. 17B illustrates an example of camera locations and fields of view for the autonomous vehicle of FIG. 17A, according to at least one embodiment;
- FIG. 17C is a block diagram illustrating an example system architecture for the autonomous vehicle of FIG. 17A, according to at least one embodiment; and
- FIG. 17D is a diagram illustrating a system for communication between cloud-based server(s) and the autonomous vehicle of FIG. 17A, according to at least one embodiment.

DETAILED DESCRIPTION

In the following description, various embodiments will be described
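Claims 6, 7, and 20 recite identifying objects in the histogram via connected components, computing per-object centroids and statistics, and transforming that list into the alternate-view coordinate system. The following is a minimal sketch of that stage; the calibration constants (focal length, baseline, bin scales) and the use of the standard stereo relation z = focal x baseline / disparity are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def connected_components(mask):
    """Label 4-connected regions of a boolean mask with a simple
    flood fill; returns (label image, number of components)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                count += 1
                stack = [(i, j)]
                labels[i, j] = count
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            stack.append((ny, nx))
    return labels, count

def centroids_to_bev(labels, count, focal_px=700.0, baseline_m=0.12,
                     col_per_angle_bin=1.0, disp_per_bin=0.5):
    """Compute each object's centroid in (angle bin, disparity bin)
    space and map it to bird's-eye-view (x, z) metric coordinates
    using the stereo relation z = focal * baseline / disparity.
    All calibration parameters here are hypothetical defaults."""
    points = []
    for lbl in range(1, count + 1):
        ys, xs = np.nonzero(labels == lbl)
        col = ys.mean() * col_per_angle_bin        # image column of centroid
        d = max(xs.mean() * disp_per_bin, 1e-6)    # disparity of centroid
        z = focal_px * baseline_m / d              # forward distance (m)
        x = (col - focal_px / 2.0) * z / focal_px  # rough lateral offset (m)
        points.append((x, z))
    return points
```

Transforming only the short list of centroids, rather than every histogram cell, keeps the per-frame data volume small enough to move through a DMA-fed embedded processor without access to the full disparity image in external memory, which matches the constraint the claims describe.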