CN-121982180-A - Global neural rendering method and system based on a programmable rasterization engine
Abstract
The invention discloses a global neural rendering method and system based on a programmable rasterization engine, belonging to the technical field of computer graphics. The method comprises: at the programmable rasterization engine, parsing a rasterization descriptor according to a rasterization instruction to extract vector micro-operations and control parameters; maintaining a task state machine according to the parameters and distributing control signals; selecting an execution entry from a vector kernel table according to the parameters and instantiating the micro-operations as parallel vector threads; during execution of the vector threads, tracking the data dependencies of the vector registers and the synchronization state of the direct memory access unit; executing vector load/store operations between the vector registers and the on-chip shared memory according to the data dependencies; dynamically scheduling the vector micro-operations to execution units to complete the rasterization computation; and outputting the result to a neural rendering network to complete global neural rendering. The invention can uniformly and efficiently support multi-representation neural rendering workloads on an AI accelerator, significantly reducing memory access overhead and improving computational efficiency.
Inventors
- BAO HUJUN
- YAN XINKAI
- HUO YUCHI
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260403
Claims (10)
- 1. A global neural rendering method based on a programmable rasterization engine, comprising the following steps: at a programmable rasterization engine deployed in a vector core of an AI accelerator, parsing a rasterization descriptor according to a rasterization instruction issued by a control core, so as to extract vector micro-operations and control parameters; maintaining a task state machine according to the control parameters, distributing control signals, selecting an execution entry from a preset vector kernel table according to the control signals, and instantiating the vector micro-operations as parallel vector threads; during execution of the vector threads, tracking the data dependencies of the vector registers and the synchronization state between the vector registers and a direct memory access unit, executing vector load/store operations according to the data dependencies to move data between the registers and an on-chip shared memory, and dynamically dispatching ready vector micro-operations to execution units to complete unified rasterization computation over multiple primitive representations; and outputting the rasterization result to a neural rendering network to complete global neural rendering.
- 2. The global neural rendering method based on a programmable rasterization engine of claim 1, wherein the rasterization instructions comprise a first rasterization acceleration instruction and a second rasterization acceleration instruction, which respectively trigger edge function computation and pixel interpolation or compositing.
- 3. The global neural rendering method based on a programmable rasterization engine of claim 1, wherein the control parameters include a function code, a task type, a primitive count, a batch size, tile parameters, and source/destination addresses and a data stride in the on-chip shared memory, wherein the function code is used to select among different edge function computation modes, pixel interpolation modes, or compositing modes.
- 4. The global neural rendering method based on a programmable rasterization engine of claim 1, wherein the vector kernel table is programmably configurable to support rasterization computation for a plurality of geometric primitives, including at least mesh triangles and 3D Gaussian splats.
- 5. The global neural rendering method based on a programmable rasterization engine of claim 1 or 3, wherein during vector thread execution, vector loads and stores are performed according to the stride in the control parameters so as to move data in batches, and arithmetic and logic operations are performed as single-instruction, multiple-data vector operations.
- 6. The global neural rendering method based on a programmable rasterization engine of claim 2, wherein the vector micro-operation sequences that execute the first rasterization acceleration instruction and the second rasterization acceleration instruction share a double buffer in the on-chip shared memory, so that the two types of operations are executed in an overlapping manner across different tiles or batches.
- 7. The global neural rendering method based on a programmable rasterization engine of claim 1, wherein the programmable rasterization engine is configured to determine, from the screen-space extent of each primitive, the set of tiles that the primitive touches, and to issue lightweight rasterization tasks only to the tiles in that set.
- 8. The global neural rendering method based on a programmable rasterization engine of claim 1, wherein the direct memory access unit is configured to distribute the common features of a primitive to the on-chip shared memory buffers of the corresponding tiles in a single-read, multi-target-write manner according to the tile-to-primitive mapping, so that the common features are reused across tiles.
- 9. The global neural rendering method based on a programmable rasterization engine of claim 1, wherein the AI accelerator comprises at least: a control core for scheduling the rasterization pipeline, generating rasterization instructions and issuing the corresponding rasterization descriptors; a vector core, which contains the programmable rasterization engine and executes the rasterization instructions issued by the control core; an on-chip shared memory in which primitive data, tile lists, edge function coefficients and tile-level intermediate pixel results reside; and a direct memory access unit for moving data between off-chip memory and the on-chip shared memory and for supporting the distribution of the data of a single primitive to the storage areas corresponding to a plurality of screen tiles.
- 10. A global neural rendering system based on a programmable rasterization engine, used for implementing the global neural rendering method based on a programmable rasterization engine according to any one of claims 1-9, characterized by comprising an instruction parsing and parameter extraction module, a kernel control and thread instantiation module, a synchronization tracking and operation scheduling module, and a result output and neural rendering module; the instruction parsing and parameter extraction module is used for parsing a rasterization descriptor, at a programmable rasterization engine deployed in a vector core of an AI accelerator, according to a rasterization instruction issued by a control core, so as to extract vector micro-operations and control parameters; the kernel control and thread instantiation module is used for maintaining a task state machine according to the control parameters, distributing control signals, selecting an execution entry from a preset vector kernel table according to the control signals, and instantiating the vector micro-operations as parallel vector threads; the synchronization tracking and operation scheduling module is used for tracking, during execution of the vector threads, the data dependencies of the vector registers and the synchronization state of the direct memory access unit, executing vector load/store operations according to the synchronization state to move data between the registers and the on-chip shared memory, and dynamically scheduling ready vector micro-operations to execution units to complete unified rasterization computation over multiple primitive representations; the result output and neural rendering module is used for outputting the rasterization result to the neural rendering network so as to complete global neural rendering.
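As an illustration of the control flow recited in claims 1 to 4, the following is a minimal software sketch, not the claimed hardware. It decodes a descriptor, selects an entry from a kernel table by task type and function code, and evaluates each primitive over one tile using strided reads from a flat list that stands in for the on-chip shared memory. The field layout of `RasterDescriptor`, the constants `TASK_EDGE`, `FC_TRIANGLE` and `FC_GAUSSIAN`, and the isotropic Gaussian are all assumptions made for the example; the claims name the control parameters but do not specify an encoding.

```python
from dataclasses import dataclass
import math

# Hypothetical encodings: the claims list these parameters but give no values.
TASK_EDGE = 0      # task triggered by the first acceleration instruction
FC_TRIANGLE = 0    # function code selecting the mesh-triangle kernel
FC_GAUSSIAN = 1    # function code selecting the Gaussian-splat kernel


@dataclass(frozen=True)
class RasterDescriptor:
    """Control parameters named in claim 3 (the layout is an assumption)."""
    func_code: int
    task_type: int
    num_primitives: int
    batch_size: int
    tile_origin: tuple   # (x0, y0) of the tile, in pixels
    tile_size: int       # tile edge length, in pixels
    src_addr: int        # offset of the first primitive in shared memory
    stride: int          # number of values stored per primitive


def edge(ax, ay, bx, by, px, py):
    """Signed edge function: twice the signed area of triangle (A, B, P)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def triangle_kernel(prim, px, py):
    """Coverage test for a triangle (x0, y0, x1, y1, x2, y2) whose three
    edge functions are non-negative for interior points."""
    x0, y0, x1, y1, x2, y2 = prim
    inside = (edge(x0, y0, x1, y1, px, py) >= 0
              and edge(x1, y1, x2, y2, px, py) >= 0
              and edge(x2, y2, x0, y0, px, py) >= 0)
    return 1.0 if inside else 0.0


def gaussian_kernel(prim, px, py):
    """Weight of an isotropic 2D Gaussian (cx, cy, sigma); a simplification
    of the anisotropic footprint of a real 3D Gaussian splat."""
    cx, cy, sigma = prim[:3]
    d2 = (px - cx) ** 2 + (py - cy) ** 2
    return math.exp(-0.5 * d2 / (sigma * sigma))


# Stand-in for the vector kernel table: (task type, function code) -> entry.
KERNEL_TABLE = {
    (TASK_EDGE, FC_TRIANGLE): triangle_kernel,
    (TASK_EDGE, FC_GAUSSIAN): gaussian_kernel,
}


def run_descriptor(desc, shared_mem):
    """Evaluate every primitive of the descriptor over one tile, in batches."""
    kernel = KERNEL_TABLE[(desc.task_type, desc.func_code)]
    x0, y0 = desc.tile_origin
    # Pixel centres of the tile in row-major order.
    pixels = [(x0 + i + 0.5, y0 + j + 0.5)
              for j in range(desc.tile_size) for i in range(desc.tile_size)]
    results = []
    for start in range(0, desc.num_primitives, desc.batch_size):
        stop = min(start + desc.batch_size, desc.num_primitives)
        for k in range(start, stop):
            base = desc.src_addr + k * desc.stride   # strided read
            prim = shared_mem[base:base + desc.stride]
            results.append([kernel(prim, px, py) for px, py in pixels])
    return results
```

For example, `run_descriptor(RasterDescriptor(FC_TRIANGLE, TASK_EDGE, 1, 1, (0, 0), 4, 0, 6), [0, 0, 4, 0, 0, 4])` returns one list of 16 coverage values for a 4×4 tile. Adding a new primitive type only requires a new table entry, which is the property claim 4 attributes to the programmable kernel table.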
Description
Global neural rendering method and system based on a programmable rasterization engine
Technical Field
The invention belongs to the technical field of computer graphics, and particularly relates to a global neural rendering method and system based on a programmable rasterization engine.
Background
In recent years, neural rendering has evolved from the traditional mesh-centric graphics pipeline toward a hybrid paradigm that fuses explicit geometry with learned representations. Explicit neural representations, typified by 3D Gaussian splatting (3D-GS), have become an important route owing to their training and rendering efficiency and their cross-view consistency. However, their runtime dataflow and operator types differ significantly from those of traditional mesh rasterization, which makes it difficult for the two types of workload to coexist efficiently under unified hardware and unified scheduling.
In terms of the per-stage overhead of a typical 3D-GS rendering pipeline, rasterization and pixel-level rendering/compositing together account for about 90% of the total time, while the preprocessing stage accounts for only a small share. This distribution is relatively stable across scenes and resolutions, indicating that the bottleneck consistently lies in the same stages. A further breakdown shows that pixel rendering depends more on FP32 compute throughput, whereas the rasterization stage is more sensitive to memory bandwidth; together they form a typical dual bottleneck split between compute and bandwidth.
To support hybrid mesh and Gaussian rendering on a unified platform, prior work has proposed a generalized three-stage rendering/rasterization pipeline for AI accelerators, consisting of object processing, screen processing and pixel processing.
The screen processing stage handles generalized rasterization and screen-space scheduling (such as binning, routing and reordering). In conventional GPUs these are mostly fixed functions, but AI accelerators often lack corresponding dedicated units, so the functions must be mapped in a programmable manner. Tile-based rendering (TBR) is widely adopted because it fits the on-chip scratchpad memory (SPM) and in-core parallelism of AI accelerators: rasterization and shading of the same batch of primitives can be completed within a single AI core, intermediate results stay resident in SPM, which reduces repeated round trips to the host, and the bidirectional "object → tile" and "tile → primitive" mappings are made explicit, which facilitates data routing and rearrangement in screen space.
Although tile-based rendering relieves memory access pressure, several classes of overhead remain, given the high-overlap nature of 3D-GS and the edge function/interpolation requirements of meshes:
(1) Global sorting and duplication: in 3D-GS, primitive-tile pairs are duplicated and sorted extensively, and the sorting and duplication sub-stages are measured to be bottlenecks.
(2) Excessive Gaussian-tile interaction: the average number of tiles covered by each Gaussian is far larger than the number of Gaussians contained in a single tile, which increases wasted interactions and memory accesses.
(3) Large-tile side effect: oversized tiles greatly increase the total number of 3D-GS primitives to be rendered and amplify the sorting and deduplication cost.
Prior work therefore emphasizes the importance of multi-stage clipping/culling. Early culling in particular can effectively reduce wasted work in memory-bound sub-stages and improve overall resource utilization.
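The duplication and sorting overhead described above can be made concrete with a small sketch of the tile binning scheme commonly used for 3D-GS: each primitive is duplicated once per tile touched by its screen-space bounding box, and the resulting pairs are sorted by tile and then by depth. This is an illustrative reconstruction and not necessarily the scheme used by the invention; the function names, the bounding-box input format and the use of a general-purpose sort are assumptions.

```python
def touched_tiles(bbox, tile_size, tiles_x, tiles_y):
    """Tile coordinates overlapped by a screen-space bounding box.

    bbox is (xmin, ymin, xmax, ymax) in pixels; xmax and ymax are exclusive.
    """
    xmin, ymin, xmax, ymax = bbox
    tx0 = max(0, int(xmin // tile_size))
    ty0 = max(0, int(ymin // tile_size))
    tx1 = min(tiles_x, int(-(-xmax // tile_size)))   # ceiling division
    ty1 = min(tiles_y, int(-(-ymax // tile_size)))
    return [(tx, ty) for ty in range(ty0, ty1) for tx in range(tx0, tx1)]


def build_sorted_pairs(primitives, tile_size, width, height):
    """Duplicate each primitive per touched tile, then sort by (tile, depth).

    primitives is a list of (bbox, depth) tuples; the index in the list is
    used as the primitive id.
    """
    tiles_x = -(-width // tile_size)
    tiles_y = -(-height // tile_size)
    pairs = []
    for prim_id, (bbox, depth) in enumerate(primitives):
        for tx, ty in touched_tiles(bbox, tile_size, tiles_x, tiles_y):
            tile_id = ty * tiles_x + tx
            pairs.append((tile_id, depth, prim_id))
    pairs.sort()   # tile-major order, nearest primitive first within a tile
    return pairs


def tile_ranges(pairs):
    """Map each tile id to the (start, end) slice of the sorted pair list."""
    ranges = {}
    for i, (tile_id, _, _) in enumerate(pairs):
        start, _ = ranges.get(tile_id, (i, i))
        ranges[tile_id] = (start, i + 1)
    return ranges
```

The length of the list returned by `build_sorted_pairs` is the number of primitive-tile pairs, which is the quantity that drives the duplication and sorting cost in item (1). Primitives whose bounding box lies entirely off screen produce no pairs, a simple form of the early culling mentioned above.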
Meanwhile, the global operations of tile creation, duplication and key sorting are moved to an earlier stage and completed at the tile level, which relieves global mutual interference.
AI accelerators, on the other hand, differ from conventional GPUs in their capability configuration. On an AI accelerator, many operations in generalized rasterization are only weakly supported or not supported at all and must be assembled from multiple general-purpose instructions, which inflates the instruction sequence and reduces scheduling efficiency. This directly affects the performance and energy efficiency of key steps such as edge function computation, interpolation, prefix scan and Z-culling. Notably, rendering paths such as NeRF can to some extent bypass the fixed-function raster units in screen processing, but in a unified engine oriented toward hybrid representations (mesh + Gaussian), screen-space scheduling remains an indispensable core. This further underlines the need to provide programmable screen processing and rasterization capability on AI accelerators.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a global neural rendering method and system based on a programmable rasterization engine, by mapping the programmable rasterization engine to a vector core