KR-102963769-B1 - Hints for Scheduling Graphic Ray Tracing Tasks

Abstract

Techniques relating to a graphics processor that supports ray tracing are disclosed. In particular, shader circuitry may be configured to adjust the scheduling priority of a single instruction multiple data (SIMD) group of a shader program based on a hint that the SIMD group has an upcoming ray tracing command for ray tracing accelerator circuitry and based on an indication of resource usage from the ray tracing accelerator circuitry. This can advantageously reduce cache thrashing, for example, when shaders allocate memory for ray tracing commands and could otherwise fill a shared cache faster than the ray tracing accelerator circuitry can process the rays.

Inventors

  • 라바니 란쿠히, 알리
  • 테레신, 로만
  • 율리아노, 루카 오.
  • 자야스왈, 쉬남
  • 싱, 로힛 쿠마르

Assignees

  • Apple Inc.

Dates

Publication Date
2026-05-12
Application Date
2024-08-19
Priority Date
2023-11-30

Claims (20)

  1. A device, comprising: ray intersection accelerator circuitry; and shader circuitry coupled to the ray intersection accelerator circuitry, wherein the shader circuitry is configured to: assign a first scheduling priority level to a single-instruction multiple-data (SIMD) group that includes a set of multiple instructions; detect, before the shader circuitry encounters an upcoming ray intersection command, a hint that the SIMD group has the upcoming ray intersection command for the ray intersection accelerator circuitry; receive a resource usage indication from the ray intersection accelerator circuitry; in response to the hint, adjust the scheduling priority level of the SIMD group to a second scheduling priority level based on the resource usage indication, wherein the second scheduling priority level is a higher priority level than the first scheduling priority level and the adjustment is based on the resource usage indication not meeting a threshold usage level; and schedule instructions of the SIMD group for execution by the shader circuitry according to the second scheduling priority level.
  2. The device of claim 1, wherein the shader circuitry is further configured to, in response to a second hint that a second SIMD group has an upcoming ray intersection command, adjust the scheduling priority of the second SIMD group from the second scheduling priority level to the first scheduling priority level based on a second resource usage indication meeting the threshold usage level.
  3. The device of claim 2, wherein the shader circuitry is further configured to: block scheduling of the SIMD group based on the second scheduling priority level and on the resource usage indication meeting the threshold usage level; and allow scheduling of the SIMD group in response to the resource usage indication not meeting the threshold usage level.
  4. The device of claim 1, wherein the shader circuitry is further configured to deactivate the SIMD group from at least a portion of a processing pipeline in which the SIMD group was scheduled prior to detection of the hint.
  5. The device of claim 4, wherein the deactivation is in response to execution of a yield instruction.
  6. The device of claim 1, further comprising cache circuitry configured to: store data for a first memory space, which is accessible to the ray intersection accelerator circuitry and the shader circuitry, and for one or more other memory spaces; and evict data to a higher-level cache or memory according to an eviction policy; wherein the hint is included in the SIMD group prior to one or more instructions in the SIMD group that allocate space in the first memory space for ray data for the ray intersection command.
  7. The device of claim 1, wherein the resource usage indication indicates a number of available entries in a command buffer within the ray intersection accelerator circuitry.
  8. The device of claim 1, wherein the device is configured to track resource usage of the ray intersection accelerator circuitry using credit control circuitry, and wherein the credit control circuitry is configured to: adjust a credit value in a first direction in response to detecting a hint that the SIMD group will include one or more ray intersection commands; and adjust the credit value in a second direction in response to a change in status of a ray intersection command in the ray intersection accelerator circuitry.
  9. The device of claim 1, wherein the hint is a compiler-inserted hint.
  10. The device of claim 1, wherein the device is a computing device that further comprises: a display; a central processing unit; and a network interface.
  11. A non-transitory computer-readable medium storing instructions of a shader program that are executable by a computing device to perform operations comprising: detecting a hint within the shader program that a single-instruction multiple-data (SIMD) group that includes a set of multiple instructions has an upcoming ray intersection command for ray intersection accelerator circuitry, wherein the detecting is performed prior to encountering the upcoming ray intersection command; in response to the hint, adjusting a scheduling priority level for the SIMD group from a first priority level to a second priority level based on a resource usage indication provided by the ray intersection accelerator circuitry, wherein the second priority level is a higher priority level than the first priority level and the adjusting is based on the resource usage indication not meeting a threshold usage level; and scheduling instructions of the SIMD group according to the adjusted scheduling priority level.
  12. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise: in response to a second hint that a second SIMD group has an upcoming ray intersection command, adjusting the scheduling priority of the second SIMD group from the second priority level to the first priority level based on a second resource usage indication meeting the threshold usage level.
  13. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise deactivating the SIMD group from at least a portion of a processing pipeline in which the SIMD group was scheduled prior to detection of the hint.
  14. The non-transitory computer-readable medium of claim 11, wherein the hint is included in the SIMD group prior to one or more instructions in the SIMD group that allocate space for ray data for the ray intersection command.
  15. A method, comprising: detecting, by a computing system, a hint within a shader program indicating that a single-instruction multiple-data (SIMD) group that includes a set of multiple instructions has an upcoming ray intersection command for ray intersection accelerator circuitry, wherein the detecting is performed prior to encountering the upcoming ray intersection command; in response to the hint, adjusting, by the computing system, a scheduling priority level for the SIMD group from a first priority level to a second priority level based on a resource usage indication provided by the ray intersection accelerator circuitry, wherein the second priority level is a higher priority level than the first priority level and the adjusting is based on the resource usage indication not meeting a threshold usage level; and scheduling, by the computing system, instructions of the SIMD group according to the adjusted scheduling priority level.
  16. The method of claim 15, further comprising: in response to a second hint that a second SIMD group has an upcoming ray intersection command, adjusting the scheduling priority of the second SIMD group from the second priority level to the first priority level based on a second resource usage indication meeting the threshold usage level.
  17. The method of claim 16, further comprising: blocking, by the computing system, scheduling of the SIMD group based on the second priority level and on the resource usage indication meeting the threshold usage level; and allowing, by the computing system, scheduling of the SIMD group in response to the resource usage indication not meeting the threshold usage level.
  18. A device, comprising: ray intersection accelerator circuitry; and shader circuitry coupled to the ray intersection accelerator circuitry, wherein the shader circuitry is configured to: assign a first scheduling priority level to a single-instruction multiple-data (SIMD) group that includes a set of multiple instructions; detect, before the shader circuitry encounters an upcoming ray intersection command, a hint that the SIMD group has the upcoming ray intersection command for the ray intersection accelerator circuitry; receive a resource usage indication from the ray intersection accelerator circuitry; in response to the hint, adjust the scheduling priority level of the SIMD group to a second scheduling priority level based on the resource usage indication, wherein the second scheduling priority level is a lower priority level than the first scheduling priority level and the adjustment is based on the resource usage indication meeting a threshold usage level; and schedule instructions of the SIMD group for execution by the shader circuitry according to the second scheduling priority level.
  19. The device of claim 18, wherein the shader circuitry is further configured to block scheduling of the SIMD group based on the second scheduling priority level and on the resource usage indication meeting the threshold usage level.
  20. The device of claim 18, wherein the shader circuitry is further configured to deactivate the SIMD group from at least a portion of a processing pipeline in which the SIMD group was scheduled prior to detection of the hint.
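
The hint-driven priority adjustment and credit tracking recited in the claims can be illustrated with a small software model. The Python sketch below is not the claimed circuitry. The names `CreditTracker`, `SimdGroup`, and `handle_hint`, the two priority values, and the rule that credits are charged after the usage check are all hypothetical choices made for illustration. It assumes one credit per hinted ray intersection command, returned when the accelerator reports that a command has completed.

```python
from dataclasses import dataclass

LOW, HIGH = 0, 1  # hypothetical priority levels; HIGH is scheduled sooner


class CreditTracker:
    """Software stand-in for credit-based tracking of accelerator usage."""

    def __init__(self, total_credits: int, threshold_in_use: int):
        self.total = total_credits
        self.available = total_credits
        self.threshold_in_use = threshold_in_use  # assumed threshold usage level

    def on_hint(self, num_commands: int = 1) -> None:
        # First direction: a hinted SIMD group is expected to consume entries.
        self.available = max(0, self.available - num_commands)

    def on_command_done(self, num_commands: int = 1) -> None:
        # Second direction: the accelerator reports a command status change.
        self.available = min(self.total, self.available + num_commands)

    def usage_meets_threshold(self) -> bool:
        return (self.total - self.available) >= self.threshold_in_use


@dataclass
class SimdGroup:
    name: str
    priority: int = LOW


def handle_hint(group: SimdGroup, credits: CreditTracker) -> None:
    """Adjust priority when a hint is seen, before the intersect command itself."""
    if credits.usage_meets_threshold():
        group.priority = LOW   # accelerator is busy: hold back new ray work
    else:
        group.priority = HIGH  # accelerator has headroom: expedite ray work
    credits.on_hint()          # assumption: charge credits after the check
```

With 8 credits and a threshold of 4 in use, the first four hinted groups in this model are raised to `HIGH` and later ones stay at `LOW` until `on_command_done` returns enough credits.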

Description

Hints for Scheduling Graphic Ray Tracing Tasks

The present disclosure generally relates to computer graphics processors, and more specifically to ray tracing.

In computer graphics, ray tracing is a rendering technique that generates images by tracing the paths of light and simulating the effects of its encounters with virtual objects. Ray tracing can resolve three-dimensional visibility between any two points within a scene, which is also the source of most of its computational cost. A typical ray tracer samples light paths through the scene in the reverse direction of light propagation, starting from the camera rather than from light sources (this is sometimes referred to as "backward ray tracing"). Starting from the camera has the advantage of tracing only the rays visible to the camera. Such a system can model a rasterizer, in which rays simply stop at the first surface and call a shader (similar to a fragment shader) to calculate the color. More generally, secondary effects in which light is exchanged between scene elements, such as diffuse interreflection and transmission, are also modeled. Shaders evaluating surface reflection properties can issue additional intersection queries (e.g., generate new rays) to capture incident light from other surfaces. This recursive process has many formulations, but is commonly referred to as path tracing.

Different graphics processing units (GPUs) can utilize varying degrees of hardware acceleration for ray tracing tasks. For example, ray intersection acceleration circuitry can be configured to traverse an acceleration data structure, testing for intersections with hierarchically arranged bounding volumes (e.g., boxes), in order to determine a subset of primitives (e.g., triangles) within the graphics scene to be tested for intersection. Ray/triangle intersection tests can also be hardware-accelerated.
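
The box test used during traversal can be sketched in software. The Python example below is a minimal slab-method test of a ray against an axis-aligned box, a common technique that the source does not itself specify. The function name `ray_hits_box` and the tuple-based vector representation are assumptions for illustration and do not describe the accelerator's actual implementation.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max, t_max=math.inf):
    """Return True if the ray origin + t*direction (0 <= t <= t_max) hits the box.

    Slab method: intersect the ray with the pair of planes bounding the box on
    each axis and keep the overlap of the resulting parameter intervals.
    """
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it misses unless the origin is inside.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0  # order the entry and exit distances
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False  # the per-axis intervals no longer overlap
    return True
```

A traversal would apply a test like this to each node's bounding box and descend only into children whose boxes are hit, so that only a subset of primitives reaches the ray/triangle test.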
A shader program calling the hardware accelerator circuitry can allocate memory space for ray data prior to the call, and the ray data can be stored in a cache accessible to the accelerator circuitry.

FIG. 1a is a diagram illustrating an overview of exemplary graphics processing operations according to some embodiments.
FIG. 1b is a block diagram illustrating an exemplary graphics unit according to some embodiments.
FIG. 2 is a block diagram illustrating exemplary graphics processor circuitry configured to schedule tasks based on ray tracing hints and on resource usage indicated by a ray intersection accelerator according to some embodiments.
FIG. 3 is a block diagram illustrating exemplary multi-stage scheduling in a GPU using hint-based allocation for one or more stages according to some embodiments.
FIG. 4 is a flowchart illustrating an exemplary technique for scheduling graphics tasks based on ray tracing hints according to some embodiments.
FIG. 5 is a block diagram illustrating exemplary credit control circuitry configured to track resource usage of ray intersection accelerator circuitry according to some embodiments.
FIG. 6 is a flowchart illustrating an exemplary method for inserting a ray intersection hint into a shader program according to some embodiments.
FIG. 7 is a flowchart illustrating an exemplary method according to some embodiments.
FIG. 8 is a block diagram illustrating an exemplary computing device according to some embodiments.
FIG. 9 is a drawing illustrating exemplary applications of the disclosed systems and devices according to some embodiments.
FIG. 10 is a block diagram illustrating an exemplary computer-readable medium for storing circuit design information according to some embodiments.

A graphics processor may include shader cores configured to execute graphics programs. The processor may accelerate certain graphics tasks, such as ray tracing tasks, using specialized circuitry.
For example, a ray intersection accelerator (RIA) may be configured to traverse an acceleration data structure (ADS) for rays in a graphics scene and may include box test circuitry configured to test whether rays intersect bounding boxes represented by the ADS. In this scenario, the shader may include instructions that allocate memory space for rays before calling the ray intersection accelerator (e.g., a store_ray_data command followed by an intersect_ray command). Ray data can be stored in a cache, which may be shared with other types of data (e.g., ray data can be stored in a shader core memory space, but the cache can also store data from other memory spaces, such as thread private spaces, thread group spaces, device spaces, etc.). In some implementations, this introduces the possibility that space is allocated for substantially more rays than the ray intersection accelerator can handle over a given time interval. Ray data may then be cached but subsequently evicted to make room for other data before it is consumed by the ray intersection accelerator circuitry. Generally, caching data long before it is used can lead to cache inefficiencies such as thrashing. Accordingly, in the