CN-122002017-A - Processing intermediate representation data of image views generated using stereoscopic disparity data

CN122002017A

Abstract

The present disclosure relates to processing intermediate representation data of image views generated using stereo disparity data. Approaches presented herein provide for generating an alternate view from disparity data captured for one or more objects in a scene. The process may be performed using an embedded processor with direct memory access (DMA) or other limited-capacity hardware. An intermediate representation may be generated, which is a 2D histogram view of the disparity data. The intermediate representation may be transformed into an alternate view image, such as a bird's eye view image, using the embedded processor. Morphological or similar filtering may be performed on one or more objects in the intermediate representation using filters of the same size, regardless of the distance of the objects from the camera plane used to capture the disparity data.
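The 2D histogram view mentioned above can be illustrated with a small sketch. It assumes, for illustration only, that the histogram counts how many pixels in each image column (standing in for viewing angle) take each integer disparity value (standing in for inverse distance). The function name `column_disparity_histogram`, the parameter `num_bins`, and the rule that non-positive or out-of-range disparities are skipped are hypothetical choices, not details taken from the disclosure.

```python
def column_disparity_histogram(disparity, num_bins):
    """Build a 2D histogram: rows are disparity bins, columns are image columns.

    `disparity` is a list of rows of integer disparities. Values <= 0 or
    >= num_bins are treated as invalid and skipped (an assumption).
    """
    width = len(disparity[0])
    hist = [[0] * width for _ in range(num_bins)]
    for row in disparity:
        for u, d in enumerate(row):
            if 0 < d < num_bins:
                hist[d][u] += 1
    return hist


# Tiny 4x6 disparity image: a near object (disparity 3) in columns 1-2
# and a farther object (disparity 1) in column 4.
disparity = [
    [0, 3, 3, 0, 1, 0],
    [0, 3, 3, 0, 1, 0],
    [0, 3, 3, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
hist = column_disparity_histogram(disparity, num_bins=4)
```

Because each image row only adds counts to the histogram, a block of rows can be accumulated at a time in this sketch, so the full disparity image never has to be resident in local memory at once.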

Inventors

  • B. Kisakanin
  • HONG QINGYU

Assignees

  • NVIDIA Corporation

Dates

Publication Date
2026-05-08
Application Date
2025-11-05
Priority Date
2024-11-05

Claims (20)

  1. A system, comprising: at least one embedded processor having direct memory access (DMA) functionality to: generate a two-dimensional (2D) histogram view of one or more objects in an environment based in part on disparity data of the one or more objects, the 2D histogram view being a function of an angle and a distance of at least one camera used to generate the disparity data; select a filter of a single shape and size to be used regardless of the respective distance of the respective object from a camera plane of the at least one camera; perform morphological filtering on the one or more objects in the 2D histogram view using the filter of the selected size; and transform, after the morphological filtering, the 2D histogram view into an alternate view image of the one or more objects.
  2. The system of claim 1, wherein the 2D histogram view is an intermediate representation, and wherein the alternate view image is a bird's eye view image of the one or more objects generated by transforming the intermediate representation.
  3. The system of claim 1, wherein the at least one embedded processor has no access to the full set of disparity data stored in external memory for generating the alternate view image.
  4. The system of claim 1, wherein the system is further to determine the disparity data using image data captured with the at least one camera.
  5. The system of claim 1, wherein the at least one camera comprises at least one of a stereoscopic camera assembly, a pair of matched camera sensors, or a depth sensor.
  6. The system of claim 1, wherein the alternate view image is generated in part by generating a list of object centroids and statistics using the 2D histogram view and transforming the list into a corresponding list in a coordinate system of the alternate view image.
  7. The system of claim 6, wherein the object centroids are calculated by the at least one embedded processor using a connected component algorithm applied to locations in the 2D histogram view identified as being associated with the one or more objects.
  8. The system of claim 1, wherein the at least one embedded processor is further to estimate motion of the one or more objects using the 2D histogram view without determining a distance of the one or more objects from a camera plane of the at least one camera.
  9. The system of claim 8, wherein the motion is estimated using an optical flow map and the 2D histogram view, using information from a camera view used to generate the disparity data.
  10. The system of claim 1, wherein the system comprises at least one of: a system for performing a simulation operation; a system for performing a simulation operation to test or verify an autonomous machine application; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering a graphical output; a system for performing a deep learning operation; a system for performing a generative AI operation using a large language model (LLM); a system for performing a generative AI operation using a vision language model (VLM); a system for performing a generative AI operation using a multimodal language model (MMLM); a system for deploying one or more language models using an operating system (OS) level virtualization container that communicates with the one or more language models using one or more application programming interfaces (APIs); a system implemented using edge devices; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system comprising one or more virtual machines (VMs); a system implemented at least in part in a data center; a system for performing hardware testing using simulation; a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least in part using cloud computing resources.
  11. At least one embedded processor having direct memory access (DMA) functionality to generate an alternate view image by generating an intermediate histogram as a function of angle from disparity data of a scene, filtering one or more objects in the intermediate histogram using a single filter size independent of distance from a camera plane, and transforming the intermediate histogram into the alternate view image.
  12. The at least one embedded processor of claim 11, wherein the at least one embedded processor is further configured to perform a connected component analysis on the intermediate histogram to identify pixel locations associated with the one or more objects.
  13. The at least one embedded processor of claim 12, wherein the at least one embedded processor is further configured to generate a list of object centroids and statistics of the one or more objects using the intermediate histogram and transform the list into a corresponding list in a coordinate system of the bird's eye view image.
  14. The at least one embedded processor of claim 11, wherein the at least one embedded processor has no access to a full set of image data stored in an external memory for generating the intermediate histogram or the bird's eye view image.
  15. The at least one embedded processor of claim 11, wherein the filtering comprises erosion filtering and dilation filtering of representations of the one or more objects in the intermediate histogram.
  16. The at least one embedded processor of claim 11, wherein the at least one embedded processor is included in at least one of: a system for performing a simulation operation; a system for performing a simulation operation to test or verify an autonomous machine application; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering a graphical output; a system for performing a deep learning operation; a system implemented using edge devices; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system comprising one or more virtual machines (VMs); a system implemented at least in part in a data center; a system for performing hardware testing using simulation; a system for synthetic data generation; a system for performing a generative AI operation using a large language model (LLM); a system for performing a generative AI operation using a vision language model (VLM); a system for performing a generative AI operation using a multimodal language model (MMLM); a system for deploying one or more language models using an operating system (OS) level virtualization container that communicates with the one or more language models using one or more application programming interfaces (APIs); a collaborative content creation platform for 3D assets; or a system implemented at least in part using cloud computing resources.
  17. A computer-implemented method, comprising: generating, using an embedded processor with direct memory access (DMA), a two-dimensional (2D) histogram view of one or more objects in an environment based in part on disparity data of the one or more objects, the 2D histogram view being a function of an angle of at least one camera used to generate the disparity data; selecting a filter of a single shape and size to be used regardless of the respective distance of the respective object from a camera plane of the at least one camera; performing morphological filtering on the one or more objects in the 2D histogram view using the filter of the selected size; and transforming, after the morphological filtering and using the embedded processor, the 2D histogram view into an alternate view image of the one or more objects.
  18. The computer-implemented method of claim 17, wherein the embedded processor has no access to external memory used in generating the 2D histogram view or the alternate view image.
  19. The computer-implemented method of claim 17, further comprising: selecting the size of the filter based in part on a data transfer limit of the DMA memory access and a resolution of the disparity data.
  20. The computer-implemented method of claim 17, further comprising: performing, using the embedded processor, a connected component analysis on the intermediate histogram representation to identify the locations associated with the one or more objects; generating, using the embedded processor, a list of object centroids and statistics of the one or more objects from the intermediate histogram representation; and transforming, using the embedded processor, the list into a corresponding list in a coordinate system of the alternate view image.
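The filtering, connected component, and centroid steps recited in the claims can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the 3x3 structuring element, 4-connectivity, the binary mask (rows as disparity bins, columns as image columns), and the pinhole stereo relations Z = f·B/d and X = (u − cx)·Z/f are choices made here, and the names `erode`, `dilate`, `component_centroids`, `to_bev`, `focal_px`, `baseline_m`, and `cx` are hypothetical.

```python
from collections import deque

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(mask):
    """3x3 binary erosion; border pixels are cleared (assumed policy)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = int(all(mask[r + dr][c + dc] for dr, dc in OFFSETS))
    return out


def dilate(mask):
    """3x3 binary dilation; neighbors outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                mask[r + dr][c + dc] for dr, dc in OFFSETS
                if 0 <= r + dr < h and 0 <= c + dc < w))
    return out


def component_centroids(mask):
    """Label 4-connected components; return (mean_row, mean_col, size) each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    results = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            queue, pixels = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < h and 0 <= nc < w
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            n = len(pixels)
            results.append((sum(p[0] for p in pixels) / n,
                            sum(p[1] for p in pixels) / n, n))
    return results


def to_bev(d, u, focal_px, baseline_m, cx):
    """Map a (disparity, column) centroid to (lateral X, forward Z) in meters."""
    z = focal_px * baseline_m / d
    return (u - cx) * z / focal_px, z


# 8x8 mask: a 3x3 object at disparity rows 4-6, columns 2-4, plus one noise pixel.
mask = [[0] * 8 for _ in range(8)]
for r in range(4, 7):
    for c in range(2, 5):
        mask[r][c] = 1
mask[1][6] = 1

opened = dilate(erode(mask))              # same 3x3 filter for every object
centroids = component_centroids(opened)   # list of (disparity, column, size)
bev_points = [to_bev(d, u, focal_px=100.0, baseline_m=0.1, cx=4.0)
              for d, u, _ in centroids]
```

In this sketch only the short centroid list is converted to bird's eye view coordinates, so the per-pixel work stays in the histogram domain, where one filter size is applied to near and far objects alike.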

Description

Processing intermediate representation data of image views generated using stereoscopic disparity data

Technical Field

The present disclosure relates to the transformation of image data between different views or representations and, more particularly, in one or more non-limiting embodiments, to generating an intermediate image representation from a set of disparity data that allows for processing and transformation using limited-capacity resources.

Background

In various computing operations, it is desirable to determine the location of various objects in a scene or geographic area. This may include, for example and without limitation, analyzing captured image information to support tasks such as navigation, localization, controlled interaction, and collision avoidance for robots and autonomous or semi-autonomous vehicles or machines. Performing operations such as those involving image recognition and computer vision requires a significant amount of resource capacity, including the ability to access memory having sufficient capacity to store an entire image. Tasks such as generating a bird's eye view (BEV) representation of a scene from captured disparity data are difficult, if not impossible, to perform using resources of limited capacity, such as an embedded processor that does not access external memory. Moreover, tasks such as morphological filtering and motion analysis are resource intensive when performed on the bird's eye view image, as objects at different distances may have different levels of quality or amounts of captured information.

Drawings

Various embodiments according to the present disclosure will be described with reference to the accompanying drawings, in which:

FIGS. 1A, 1B, 1C, and 1D illustrate image views that may be generated from captured image data in accordance with at least one embodiment;
FIG. 2A illustrates an intermediate image that may be generated using captured image data in accordance with at least one embodiment;
FIG. 2B illustrates views of objects in a bird's eye view (BEV) or similar top-down image and in an intermediate histogram image in accordance with at least one embodiment;
FIG. 3 illustrates corresponding image data blocks in a disparity image and an intermediate histogram image in accordance with at least one embodiment;
FIG. 4 illustrates corresponding image data blocks in an intermediate histogram image and a bird's eye view image in accordance with at least one embodiment;
FIG. 5 illustrates an example process that may be performed using an embedded processor to generate a bird's eye view image from disparity image data in accordance with at least one embodiment;
FIG. 6 illustrates an example system including an embedded processor with direct memory access (DMA) functionality in accordance with at least one embodiment;
FIG. 7A illustrates a comparison of the amount of detail captured for an object at different distances from a camera in accordance with at least one embodiment;
FIG. 7B illustrates the different filter sizes required to process the same amount of detail information for objects at different distances in a bird's eye view image in accordance with at least one embodiment;
FIG. 8 illustrates a comparison of filter sizes that may be used to process the same amount of detail information for objects at different distances in a bird's eye view image and an intermediate histogram image in accordance with at least one embodiment;
FIG. 9 illustrates an example process for performing morphological filtering on an intermediate histogram image using a single filter size for objects at different distances in accordance with at least one embodiment;
FIG. 10 illustrates components of a distributed system that can be used to generate, process, and provide sensor-based content in accordance with at least one embodiment;
FIG. 11 illustrates an example computing environment in which one or more devices operate to process data using a SoC in accordance with at least one embodiment;
FIG. 12 illustrates an example data center system in accordance with at least one embodiment;
FIG. 13 illustrates a computer system in accordance with at least one embodiment;
FIG. 14 illustrates a computer system in accordance with at least one embodiment;
FIG. 15 illustrates at least a portion of a graphics processor in accordance with one or more embodiments;
FIG. 16 illustrates at least a portion of a graphics processor in accordance with one or more embodiments;
FIG. 17A illustrates an example of an autonomous vehicle in accordance with at least one embodiment;
FIG. 17B illustrates an example of camera positions and fields of view of the autonomous vehicle of FIG. 17A in accordance with at least one embodiment;
FIG. 17C is a block diagram illustrating an example system architecture of the autonomous vehicle of FIG. 17A in accordance with at least one embodiment; and
FIG. 17D is a schematic diagram of a system for communication between the cloud-based server of FIG. 17A and