Search

US-12626447-B2 - Pruning ray tracing traversal operations based on local ray parameter value

US12626447B2US 12626447 B2US12626447 B2US 12626447B2US-12626447-B2

Abstract

Techniques are disclosed relating to graphics processors that support ray tracing. In disclosed embodiments, ray intersect circuitry is configured to access a traversal stack used for traversal of multiple levels of a bounding volume hierarchy (BVH) acceleration data structure (ADS) according to a depth-first search to retrieve: coordinates of a first bounding region for a child node and a local ray parameter value that indicates a point along a ray at which an intersection with a second bounding region for the child node's parent node was detected. The accelerator circuitry may compare the local ray parameter value with an end ray parameter value to determine whether to traverse to the child node as part of traversal of the BVH.

Inventors

  • Ali Rabbani Rankouhi
  • Luca O. Iuliano
  • David J. Bermingham
  • Christopher A. Burns

Assignees

  • APPLE INC.

Dates

Publication Date
20260512
Application Date
20231130

Claims (19)

  1. 1 . An apparatus, comprising: ray intersect accelerator circuitry configured to: access a traversal stack that stores data for traversal of multiple levels of a bounding volume hierarchy (BVH) acceleration data structure (ADS) according to a depth-first search to retrieve: coordinates of a first bounding region for a child node; and a first local ray parameter value that indicates a point along a ray at which an intersection with a second bounding region for the child node's parent node was detected; and compare the first local ray parameter value with a first end ray parameter value to determine whether to traverse to the child node as part of traversal of the ADS; wherein: the ADS includes transform nodes, including the child node; the apparatus is configured to transform ray coordinates in conjunction with traversal to transform nodes; and the ray intersect accelerator circuitry is configured: for the transform nodes, to perform comparison of a local ray parameter value with an end ray parameter value to determine whether to traverse to the transform nodes, including the comparison of the first local ray parameter value for the child node; and for non-transform nodes of the ADS, not to perform the comparison.
  2. 2 . The apparatus of claim 1 , wherein the ray intersect accelerator circuitry is further configured to: generate the local ray parameter value based on a determined entry intersection point between the ray and the second bounding region; and write the coordinates and the local ray parameter value to the traversal stack in response to detecting intersections for multiple child nodes of the parent node.
  3. 3 . The apparatus of claim 1 , wherein the ray intersect accelerator circuitry is configured not to traverse to the node corresponding to the child node in response to the local ray parameter value being greater than the end ray parameter value.
  4. 4 . The apparatus of claim 1 , wherein the apparatus is configured to, prior to accessing the traversal stack: perform one or more primitive tests based on traversal past a sibling node of the child node; and update the end ray parameter value based on the one or more primitive tests.
  5. 5 . The apparatus of claim 1 , wherein: the apparatus is configured, in response to determining not to traverse to the child node, not to transform ray coordinates for the child node.
  6. 6 . The apparatus of claim 1 , wherein: the ray intersect accelerator circuitry is configured to generate the local ray parameter value based on an intermediate value that is computed as part of a bounding region intersection test; and the local ray parameter value is encoded using a smaller number of bits than coordinates of the ray.
  7. 7 . The apparatus of claim 1 , further comprising: storage circuitry configured to store the coordinates of the ray and a current end ray parameter value for the ray.
  8. 8 . The apparatus of claim 1 , wherein the apparatus is a computing device that further comprises: a display; a central processing unit; and a network interface.
  9. 9 . A method, comprising: accessing, by a computing system, a traversal stack that stores data for traversal of multiple levels of a bounding volume hierarchy (BVH) acceleration data structure (ADS) according to a depth-first search to retrieve: coordinates of a first bounding region for a child node; and a first local ray parameter value that indicates a point along a ray at which an intersection with a second bounding region for an ancestor node of the child node was detected; comparing, by the computing system, the first local ray parameter value with a first end ray parameter value to determine whether to traverse to the child node as part of traversal of the ADS; and traversing, by the computing system, the ADS based on the determination to generate intersection results for the ray, including: for transform nodes of the ADS, comparing a local ray parameter value with an end ray parameter value to determine whether to traverse to the transform nodes, including the comparing the first local ray parameter value for the child node; and for non-transform nodes of the ADS, traversing to the transform nodes without comparing a local ray parameter value with an end ray parameter value.
  10. 10 . The method of claim 9 , further comprising: generating, by the computing system, the local ray parameter value based on a determined entry intersection point between the ray and the second bounding region; and writing, the computing system, the coordinates and the local ray parameter value to the traversal stack in response to detecting intersections for multiple child nodes of the ancestor node.
  11. 11 . The method of claim 9 , wherein the comparing determines that the local ray parameter value is greater than the end ray parameter value and the traversing does not traverse to the child node as part of traversal of the ADS.
  12. 12 . The method of claim 9 , wherein the comparing is performed in response to determining that the child node is an instance node.
  13. 13 . The method of claim 9 , wherein the local ray parameter value is encoded using a smaller number of bits than coordinates of the ray.
  14. 14 . The method of claim 9 , further comprising: storing, by the computing system in a ray data cache, the coordinates of the ray and a current end ray parameter value for the ray.
  15. 15 . A non-transitory computer-readable medium having instructions of a hardware description programming language stored thereon that, when processed by a computing system, program the computing system to generate a computer simulation model, wherein the model represents a hardware circuit that includes: ray intersect accelerator circuitry configured to: access a traversal stack that stores data for traversal of multiple levels of a bounding volume hierarchy (BVH) acceleration data structure (ADS) according to a depth-first search to retrieve: coordinates of a first bounding region for a child node; and a first local ray parameter value that indicates a point along a ray at which an intersection with a second bounding region for the child node's parent node was detected; and compare the first local ray parameter value with a first end ray parameter value to determine whether to traverse to the child node as part of traversal of the ADS; wherein: the ADS includes transform nodes, including the child node; the circuit is configured to transform ray coordinates in conjunction with traversal to transform nodes; and the ray intersect accelerator circuitry is configured: for the transform nodes, to perform comparison of a local ray parameter value with an end ray parameter value to determine whether to traverse to the transform nodes, including the comparison of the first local ray parameter value for the child node; and for non-transform nodes of the ADS, not to perform the comparison.
  16. 16 . The non-transitory computer-readable medium of claim 15 , wherein the ray intersect accelerator circuitry is further configured to: generate the local ray parameter value based on a determined entry intersection point between the ray and the second bounding region; and write the coordinates and the local ray parameter value to the traversal stack in response to detecting intersections for multiple child nodes of the parent node.
  17. 17 . The non-transitory computer-readable medium of claim 15 , wherein the ray intersect accelerator circuitry is configured not to traverse to the node corresponding to the child node in response to the local ray parameter value being greater than the end ray parameter value.
  18. 18 . The non-transitory computer-readable medium of claim 15 , wherein the circuit is configured to, prior to accessing the traversal stack: perform one or more primitive tests based on traversal past a sibling node of the child node; and update the end ray parameter value based on the one or more primitive tests.
  19. 19 . The non-transitory computer-readable medium of claim 15 , wherein: the circuit is configured to, in response to determining not to traverse to the child node, avoid transforming ray coordinates for the child node.

Description

The present application claims priority to U.S. Provisional App. No. 63/583,920, entitled “Pruning Ray Tracing Traversal Operations based on Local Ray Parameter Value,” filed Sep. 20, 2023, the disclosure of which is incorporated by reference herein in its entirety. BACKGROUND Technical Field This disclosure relates generally to graphics processors and more particularly to ray tracing. Description of Related Art In computer graphics, ray tracing is a rendering technique for generating an image by tracing the path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. Ray tracing may allow resolution of visibility in three dimensions between any two points in the scene, which is also the source of most of its computational expense. A typical ray tracer samples paths of light through the scene in the reverse direction of light propagation, starting from the camera and propagating into the scene, rather than from the light sources (this is sometimes referred to as “backward ray tracing”). Starting from the camera has the benefit of only tracing rays which are visible to the camera. This system can model a rasterizer, in which rays simply stop at the first surface and invoke a shader (analogous to a fragment shader) to compute a color. More commonly, secondary effects—in which the exchange of illumination between scene elements, such as diffuse inter-reflection and transmission—are also modeled. Shaders that evaluate surface reflective properties may invoke further intersection queries (e.g., generate new rays) to capture incoming illumination from other surfaces. This recursive process has many formulations, but is commonly referred to as path tracing. Graphics processors (GPUs) that implement ray tracing typically provide more realistic scenes and lighting effects, relative to traditional rasterization systems. Ray tracing is typically computationally expensive, however. Improvements to ray tracing techniques may improve realism in graphics scenes, improve performance (e.g., allow tracing of more rays per frame, tracing in more complex scenes, or both), reduce power consumption (which may be particularly important in battery-powered devices), etc. In typical ray tracing implementations, the graphics processor traverses a bounding volume hierarchy (BVH) acceleration data structure (ADS) to determine which primitives (e.g., triangles) in a scene are to be tested for intersection with the ray. This substantially reduces the number of ray-primitive intersection tests for a given render (e.g., relative to testing every ray against every primitive). Generally, reductions in the number of bounding region intersection tests or primitive intersection tests for a given ray, while still accurately traversing the ADS, may improve performance, reduce power consumption, or both. BRIEF DESCRIPTION OF DRAWINGS FIG. 1A is a diagram illustrating an overview of example graphics processing operations, according to some embodiments. FIG. 1B is a block diagram illustrating an example graphics unit, according to some embodiments. FIG. 2 is a diagram illustrating an example intersection points for a ray with bounding boxes and primitives, according to some embodiments. FIGS. 3A-3C are diagrams illustrating example changes in a local ray parameter value (tLocal) and a max ray parameter value (tMax) for a ray in the context of potential pruning of traversal operations, according to some embodiments. FIG. 4 is a block diagram illustrating example prune control circuitry, according to some embodiments. FIG. 5A is a diagram illustrating an example portion of the bounding volume hierarchy tree, according to some embodiments. FIG. 5B is a diagram illustrating example contents of a traversal stack during traversal of the tree of FIG. 5A, including tLocal values, according to some embodiments. FIG. 6 is a flow diagram illustrating example prune techniques, according to some embodiments. FIG. 7 is a flow diagram illustrating an example method, according to some embodiments. FIG. 8 is a block diagram illustrating an example computing device, according to some embodiments. FIG. 9 is a diagram illustrating example applications of disclosed systems and devices, according to some embodiments. FIG. 10 is a block diagram illustrating an example computer-readable medium that stores circuit design information, according to some embodiments. DETAILED DESCRIPTION In disclosed embodiments, a GPU may determine to prune portions of an ADS traversal (e.g., pruning one or more bounding region/ray intersect tests). During a typical depth-first traversal of a BVH ADS, when there are hits for multiple child nodes, the traversal proceeds down the tree to one of the nodes and the rest of the child nodes are deferred and pushed on to a traversal stack. During the traversal, the end value (e.g., tMax) of the ray parametric interval may be reduced, e.g., due to a detected triangle intersection before traversal that re