
EP-3646281-B1 - FLEXIBLE BUFFER SIZING IN GRAPHICS PROCESSORS


Inventors

  • GOULD, JASON MATTHEW
  • NEVRAEV, IVAN

Dates

Publication Date
2026-05-06
Application Date
2018-06-01

Claims (13)

  1. A method of handling data buffer resources in a graphics processor (120), the method comprising, by a buffer service (125) of the graphics processor: establishing a pool (362) of available memory pages tracked by memory pointers for use in a growable data structure (361); responsive to write requests by at least a shader unit (121) of the graphics processor for space in the growable data structure in which to write shader data, processing the indicated data sizes to determine if the shader data can fit into current pages of the growable data structure, and providing to the shader unit at least write pointers (514, 524) to locations within memory pages from the growable data structure in accordance with data sizes indicated in the write requests; based on the shader data not being able to fit into the current pages, providing to the shader unit first pointers indicating start locations in the growable data structure to begin writing the shader data, count information indicating quantities of the shader data able to be written in the current pages, and second pointers indicating at least one further page in the growable data structure into which the shader data can be spanned from the current pages; and receiving write completion messaging from the shader unit indicating quantities of data written into the at least one further page, and responsively updating corresponding pointers indicating availability of the data within the growable data structure for future read requests; wherein the buffer service is configured to track the amount of data written into the growable data structure and, as well as growing the growable data structure when the shader data is not able to fit into the current pages, also to automatically grow the growable data structure when a threshold fullness of the growable data structure is reached, the method comprising, responsive to reaching the threshold fullness of the growable data structure, pre-fetching an address of at least one further memory page from the pool of available memory pages for inclusion in the growable data structure.
  2. The method of claim 1, further comprising: responsive to indications by the shader unit of consumption of the shader data in ones of the memory pages, returning the ones of the memory pages into the pool of available memory pages.
  3. The method of claim 1, further comprising: linking ones of the memory pages to each other in the growable data structure using page pointers included in a reserved portion of each of ones of the memory pages, wherein the page pointers each indicate at least a next page in the growable data structure; and wherein the memory pointers in the pool of available memory pages are tracked using at least a ring buffer configured to store pointers to the available memory pages not yet included in one or more growable data structures associated with the pool.
  4. The method of claim 1, further comprising: maintaining a write pointer into the growable data structure that indicates a location within the growable data structure to begin write operations for the shader data; and maintaining a read pointer into the growable data structure that indicates a location within the growable data structure to begin read operations for the shader data.
  5. The method of claim 1, further comprising: receiving combined requests from the shader unit that group requests for an associated plurality of shader threads of at least a wave of threads or threadgroup of the shader unit; for write space requests among the combined requests, determining a quantity of requested space for writing into the growable data structure by the associated plurality of shader threads, and based at least on the quantity of requested space, providing to the shader unit a write pointer indicating a start location in the growable data structure to begin writing corresponding shader data for the associated plurality of shader threads, write count information indicating quantities of the shader data able to be written in an initial write page, and an additional write pointer indicating at least one further write page in the growable data structure into which the corresponding shader data can be spanned from the initial write page; and for read requests among the combined requests, determining a quantity of requested read data to be retrieved from the growable data structure by the associated plurality of shader threads, providing to the shader unit a read pointer indicating a read start location for an initial read page in the growable data structure to begin retrieving the read data by the associated plurality of shader threads, read count information indicating quantities of valid read data in the initial read page, and an additional read pointer indicating at least one further read page in the growable data structure for which the read data has been spanned from the initial page.
  6. The method of claim 1, further comprising: monitoring status of the shader data written into the growable data structure; based at least on a fullness status of the growable data structure, prioritizing portions of the shader data to be read from the growable data structure; and indicating to the shader unit one or more prioritized portions of the shader data to responsively prompt corresponding read operations to be issued by the shader unit for the one or more prioritized portions.
  7. A data buffer management system, comprising: a pool manager configured to manage a pool of available memory pages tracked by memory pointers for use in one or more buffer data structures that are growable; and a buffer manager configured to receive write space requests issued by shader units of a graphics processor, to process the data sizes indicated by the write space requests to determine if associated shader data can fit into current pages of the buffer data structures, and responsively provide to the shader units at least write pointers to locations within memory pages of the buffer data structures in accordance with data sizes indicated in the write space requests; wherein, based on the shader data sizes exceeding available space in the current pages, the buffer manager is configured to provide to the shader units first pointers indicating start locations in the buffer data structures to begin writing the associated shader data, count information indicating quantities of the associated shader data able to be written in the current pages, and second pointers indicating at least one further page in the buffer data structures into which the associated shader data can be spanned from the current pages; the buffer manager is further configured to receive write completion messaging from the shader units indicating quantities of data written into the at least one further page, and responsively update corresponding pointers indicating availability of the data within the buffer data structures for future read requests; and wherein the buffer manager is configured to track the amount of data written into the buffer data structures and, as well as growing the buffer data structures when the shader data is not able to fit into the current pages, also to automatically grow the buffer data structures when a threshold fullness of the buffer data structures is reached, and, when the threshold fullness of the buffer data structures is reached, to pre-fetch an address of at least one further memory page from the pool of available memory pages for inclusion in the buffer data structures.
  8. The data buffer management system of claim 7, comprising: responsive to indications by the shader unit of consumption of the shader data in ones of the memory pages, the pool manager configured to return the ones of the memory pages into the pool of available memory pages.
  9. The data buffer management system of claim 7, comprising: the buffer manager configured to link ones of the memory pages to each other in the buffer data structures using page pointers included in a reserved portion of each of the ones of the memory pages, wherein the page pointers each indicate at least a next page in the buffer data structures; and wherein the memory pointers in the pool of available memory pages are tracked using at least a ring buffer configured to store pointers to the available memory pages not yet included in the buffer data structures.
  10. The data buffer management system of claim 7, comprising: the buffer manager configured to maintain write pointers into the buffer data structures that indicate locations within the buffer data structures to begin write operations for shader data; and the buffer manager configured to maintain read pointers into the buffer data structures that indicate locations within the buffer data structures to begin read operations for shader data.
  11. The data buffer management system of claim 7, comprising: the buffer manager configured to receive combined requests from the shader units that group requests for an associated plurality of shader threads of at least a wave of threads or threadgroup of the shader unit; for each write space request among the combined requests, the buffer manager configured to determine a quantity of requested space for writing into the buffer data structures by the associated plurality of shader threads, and based at least on the quantity of requested space, the buffer manager configured to provide to the shader unit at least a write pointer indicating a start location in the buffer data structures to begin writing corresponding shader data for the associated plurality of shader threads, write count information indicating quantities of the shader data able to be written in an initial write page, and at least an additional write pointer indicating at least one further write page in the buffer data structures into which the corresponding shader data can be spanned from the initial write page; and for each read request among the combined requests, the buffer manager configured to determine a quantity of requested read data to be retrieved from the buffer data structures by the associated plurality of shader threads, and to provide to the shader unit a read pointer indicating a read start location for an initial read page in the buffer data structures to begin retrieving the read data by the associated plurality of shader threads, read count information indicating quantities of valid read data in the initial read page, and an additional read pointer indicating at least one further read page in the buffer data structures for which the read data has been spanned from the initial page.
  12. The data buffer management system of claim 7, further comprising: a buffer summarizer element configured to: monitor status of shader data written into the buffer data structures; based at least on a fullness status of each of the buffer data structures, prioritize which of the buffer data structures should be read; and indicate at least to a launcher associated with the shader unit one or more prioritized portions of the shader data to responsively prompt corresponding read operations to be issued by one or more of the shader units for the one or more prioritized portions.
  13. A graphics processing apparatus, comprising: the data buffer management system of any of claims 7 to 12; and the shader units, being configured to process shader data; the data buffer management system being configured to receive data handling requests from the shader units, including said write space requests.
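The mechanism recited in claims 1, 3 and 7 — a ring buffer of free-page pointers, a growable structure of linked pages, writes that span from a current page into a further page, and threshold-triggered pre-fetching of the next page address — can be sketched as a minimal software model. The sketch below is illustrative only: the class names (`PagePool`, `GrowableBuffer`), the 64-byte page size, and the 75% fullness threshold are assumptions for demonstration, not details taken from the patent, which concerns a hardware buffer service.

```python
PAGE_SIZE = 64          # payload bytes per page (example value, not from the patent)
GROW_THRESHOLD = 0.75   # fullness ratio that triggers pre-fetching a page address

class PagePool:
    """Pool of available memory pages, tracked as a ring buffer of page pointers
    (modeled here as integer page ids), per claim 3."""
    def __init__(self, num_pages):
        self.ring = list(range(num_pages))  # pointers to free pages
        self.head = 0
        self.count = num_pages

    def acquire(self):
        if self.count == 0:
            raise MemoryError("pool exhausted")
        page = self.ring[self.head]
        self.head = (self.head + 1) % len(self.ring)
        self.count -= 1
        return page

    def release(self, page):
        # Pages consumed by readers return to the pool (claim 2 / claim 8).
        tail = (self.head + self.count) % len(self.ring)
        self.ring[tail] = page
        self.count += 1

class GrowableBuffer:
    """Growable data structure: pages linked in order, with writes that
    overflow the current page spanning into a further page."""
    def __init__(self, pool):
        self.pool = pool
        self.pages = [pool.acquire()]   # linked list of pages in write order
        self.write_off = 0              # write offset within the current page
        self.prefetched = None          # page address fetched ahead of need

    def _next_page(self):
        # Prefer the pre-fetched page address so growth does not wait on the pool.
        page = self.prefetched if self.prefetched is not None else self.pool.acquire()
        self.prefetched = None
        self.pages.append(page)
        return page

    def request_write(self, size):
        """Service one write space request.

        Returns (first_ptr, count, second_ptr): the start location, the
        quantity writable in the current page, and the further page the
        data spans into (None when it fits)."""
        assert size <= 2 * PAGE_SIZE  # example limit: span at most one page
        first = (self.pages[-1], self.write_off)
        room = PAGE_SIZE - self.write_off
        if size <= room:
            count, second = size, None
            self.write_off += size
        else:
            count = room                    # portion that fits the current page
            second = (self._next_page(), 0) # remainder spans into a new page
            self.write_off = size - count
        # Threshold-based growth (claim 1): once the current page passes the
        # fullness threshold, pre-fetch the next page address from the pool.
        if self.write_off / PAGE_SIZE >= GROW_THRESHOLD and self.prefetched is None:
            self.prefetched = self.pool.acquire()
        return first, count, second
```

In use, a first 40-byte request to a fresh buffer fits entirely in page 0, while a second 40-byte request receives a first pointer at offset 40, a count of 24 (the space remaining in page 0), and a second pointer into a newly linked page, mirroring the spanning behavior the claims describe.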

Description

BACKGROUND

Computing systems, such as personal computers, portable computing platforms, gaming systems, and servers, can include graphics processors along with main/central processors. These graphics processors, sometimes referred to as graphics processing units (GPUs), can be integrated into the central processors or provided discretely on separate add-in cards, among other configurations. User applications, operating systems, video games, or other software elements can interface with GPUs using various application programming interfaces (APIs) that provide standardized software/logical interfaces between the software elements and various GPU hardware elements. Most GPUs have specialized roles for rendering both two-dimensional (2D) and three-dimensional (3D) graphics data for display, such as graphics data from operating systems, productivity applications, entertainment media, scientific analysis, gaming software, or other graphics data sources. GPUs can also be employed in general-purpose processing environments, such as artificial intelligence, machine learning, neural nets, statistical analysis, and cryptocurrency mining. Within a GPU, various internal stages process graphics data into rendered images for display on a suitable display device. In many GPUs, these internal stages comprise a graphics pipeline that takes representations of scenes or user interfaces and renders them into images for output to various display devices. Among these GPU stages are shader stages, along with other stages and functions that provide graphical details, surface texture mapping, colors, shadows, or other elements for portions of rendered images.

US 7533237 B1 discloses systems and methods for dynamically allocating memory for thread processing that may reduce memory requirements while maintaining thread processing parallelism. A memory pool is allocated to store data for processing multiple threads, and need not be large enough to dedicate a fixed-size portion of the memory pool to each thread that may be processed in parallel. US 2013/305009 A1 discloses a technique for dynamically allocating memory during multi-threaded program execution for a coprocessor that does not support dynamic memory allocation, memory paging, or memory swapping. US 7102646 B1 discloses a memory system and methods of operating the same. In a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the invention dynamically allocates varying amounts of memory per tile depending on demand. In one embodiment, all or a portion of the available memory is divided into smaller pages that are preferably equal in size. US 9569348 B1 discloses a technique for compressing page table entries (PTEs) prior to storing the PTEs in a translation look-aside buffer (TLB). A page table entry (PTE) request is received for a PTE that is not stored in the TLB. The PTE, as well as a plurality of PTEs adjacent to it, are retrieved from a memory. The PTE and the plurality of PTEs are compressed and then stored in the TLB.

OVERVIEW

Presented herein is a method of handling a growable data structure in a graphics processor. Implementations may provide multiple growable data queue and stack structures for use in highly parallelized processing environments. According to one aspect of the present disclosure, a method of handling data buffer resources in a graphics processor is performed by a buffer service of the graphics processor. The method includes establishing a pool of available memory pages tracked by memory pointers for use in a growable data structure.
Responsive to write requests by at least a shader unit of the graphics processor for space in the growable data structure in which to write shader data, the method includes processing the data sizes to determine if the shader data can fit into current pages of the growable data structure, and providing to the shader unit at least write pointers to locations within memory pages from the growable data structure in accordance with data sizes indicated in the write requests. Based on the shader data not being able to fit into the current pages, the method includes providing to the shader unit first pointers indicating start locations in the growable data structure to begin writing the shader data, count information indicating quantities of the shader data able to be written in the current pages, and second pointers indicating at least one further page in the growable data structure into which the shader data can be spanned from the current pages. The method further includes receiving write completion messaging from the shader unit indicating quantities of data written into the at least one further page, and responsively updating corresponding pointers indicating availability of the data within the growable data structure for future read requests. The buffer service is configured to track the amount of data written into the growable data structure.