EP-4363986-B1 - A DATA PROCESSING APPARATUS AND METHOD FOR HANDLING STALLED DATA
Inventors
- EYOLE, MBOU
- GABRIELLI, GIACOMO
- VENU, BALAJI
Dates
- Publication Date
- 20260506
- Application Date
- 20220621
Claims (15)
- A data processing apparatus (20, 100) comprising a plurality of processing elements (22, 102) connected via a network (108) arranged on a single chip to form a spatial architecture, each processing element of the plurality of processing elements comprising: processing circuitry (26, 114) to perform processing operations; memory control circuitry (24, 104) to perform data transfer operations associated with the processing element and to issue data transfer requests for requested data to the network, wherein the memory control circuitry is configured to monitor the network for the requested data and, in response to detecting the requested data, to retrieve the requested data from the network; local storage circuitry (110) to store data associated with the processing operations, the local storage circuitry comprising a plurality of local storage sectors (34, 112); and auxiliary memory control circuitry (28, 106) configured to monitor the network to detect stalled data, required by the processing element, that is stuck behind data that has been requested but that is not yet required and, in response to detecting the stalled data, to transfer the stalled data from the network to an auxiliary storage buffer dynamically selected from amongst the plurality of local storage sectors.
- The data processing apparatus of claim 1, wherein: the processing operations and the data transfer operations are triggered operations; each triggered operation is performed in response to trigger data meeting a corresponding processing trigger condition, and the processing element is configured to set further trigger data in response to completion of the triggered operation.
- The data processing apparatus of claim 1 or claim 2, wherein: each data transfer request specifies a data request tag; and the auxiliary memory control circuitry comprises auxiliary table storage to store an auxiliary memory table (30) associating a corresponding data request tag of the stalled data and a corresponding location in the auxiliary storage buffer.
- The data processing apparatus of claim 3, wherein the auxiliary memory control circuitry is configured to perform, in response to an indication that the memory control circuitry requires the stalled data, a lookup in the auxiliary memory table based on the data request tag specified by the data transfer request.
- The data processing apparatus of claim 4, wherein: the auxiliary memory control circuitry is configured to, when the lookup hits in the auxiliary memory table, provide a location associated with the data request tag to the memory control circuitry; and the memory control circuitry is configured to retrieve the stalled data from the location.
- The data processing apparatus of any preceding claim, wherein the auxiliary memory control circuitry is configured to select the auxiliary storage buffer dynamically from amongst the plurality of local storage sectors based on a usage metric of each of the local storage sectors by the memory control circuitry.
- The data processing apparatus of claim 6, wherein: each processing element further comprises a plurality of counters (32) defining the usage metric, each counter indicative of a number of times an associated local storage sector has been accessed by the memory control circuitry; and the auxiliary memory control circuitry is configured to select, as the auxiliary storage buffer, a particular local storage sector corresponding to a counter of the plurality of counters indicating fewest accesses.
- The data processing apparatus of claim 7, wherein the local storage circuitry is configured to allow the memory control circuitry and the auxiliary memory control circuitry to access different sectors in parallel.
- The data processing apparatus of claim 8, wherein the local storage circuitry is configured to, in response to parallel accesses to a same sector by the memory control circuitry and the auxiliary memory control circuitry, prioritise the auxiliary memory control circuitry.
- The data processing apparatus of any preceding claim, wherein: the memory control circuitry is configured to, when retrieving the data from the network, modify a dequeue signal to indicate that the data has been removed from the network; and the auxiliary memory control circuitry is configured to, when monitoring the network to detect the stalled data: periodically monitor the dequeue signal; and determine that the queued data comprises stalled data when the dequeue signal indicates that the data remains on the network.
- The data processing apparatus of any preceding claim, wherein: each processing element further comprises a plurality of interface channels (38, 42) to store queued data to be transferred between the processing element and the network; and the memory control circuitry is configured to monitor the network for the requested data by monitoring the plurality of interface channels.
- The data processing apparatus of claim 11, wherein the auxiliary memory control circuitry is configured to, when monitoring the network to detect stalled data: capture first queue data indicative of the queued data in the plurality of interface channels at the start of a predetermined time period; capture second queue data indicative of the queued data in the plurality of interface channels at the end of the predetermined time period; and determine that the queued data comprises stalled data when the first queue data is the same as the second queue data.
- The data processing apparatus of any preceding claim, wherein the data transfer request is one of: an inter-processing element data transfer request specifying data to be transferred from another processing element; and a memory request specifying data to be transferred from a memory location in main memory.
- A non-transitory computer-readable medium storing computer-readable code to fabricate the data processing apparatus of claim 1.
- A method of operating a data processing apparatus (20, 100) comprising a plurality of processing elements (22, 102) connected via a network (108) arranged on a single chip to form a spatial architecture, each processing element comprising processing circuitry, memory control circuitry, local storage circuitry (110) comprising a plurality of local storage sectors (34, 112) and auxiliary memory control circuitry (28, 106), the method comprising: performing processing operations using the processing circuitry; storing data associated with the processing operations in the local storage circuitry; with the memory control circuitry, performing data transfer operations associated with the processing element, issuing data transfer requests for requested data to the network, monitoring the network for the requested data and, in response to detecting the requested data in one of the plurality of interface channels, retrieving the requested data from the network; and with the auxiliary memory control circuitry monitoring the network to detect stalled data, required by the processing element, that is stuck behind data that has been requested but that is not yet required and, in response to detecting the stalled data, transferring the stalled data from the network to an auxiliary storage buffer dynamically selected from amongst the plurality of local storage sectors.
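The behaviour claimed for the auxiliary memory control circuitry can be illustrated with a minimal Python sketch. It is a hypothetical software model, not the claimed hardware. It covers stall detection by comparing channel snapshots (claim 12), selection of the least-accessed local storage sector as the auxiliary storage buffer (claims 6 and 7), and the tag-indexed auxiliary memory table with its lookup (claims 3 to 5). The sketch rests on several assumptions that the claims do not state: interface channels are FIFOs of (tag, data) pairs, a free slot in a sector is marked `None`, empty channels are never reported as stalled, and a stalled channel is drained completely. The names `LocalStorage`, `AuxiliaryMemoryController` and `memory_control_fetch` are illustrative only.

```python
from collections import deque


class LocalStorage:
    """Hypothetical model of local storage circuitry: equal-sized sectors,
    each paired with an access counter (cf. the counters of claim 7)."""

    def __init__(self, num_sectors: int, sector_size: int):
        # None marks a free slot (assumption of this sketch).
        self.sectors = [[None] * sector_size for _ in range(num_sectors)]
        self.access_counts = [0] * num_sectors

    def record_access(self, sector: int) -> None:
        # Incremented whenever the memory control circuitry uses a sector.
        self.access_counts[sector] += 1


class AuxiliaryMemoryController:
    """Sketch of the auxiliary memory control circuitry."""

    def __init__(self, storage: LocalStorage):
        self.storage = storage
        # Auxiliary memory table: data request tag -> (sector, slot).
        self.table: dict[str, tuple[int, int]] = {}

    @staticmethod
    def snapshot(channels):
        # "Queue data" indicative of what is queued in each interface channel.
        return tuple(tuple(ch) for ch in channels)

    @staticmethod
    def is_stalled(first, second) -> bool:
        # Claim 12: stalled when the snapshots taken at the start and end of
        # the period match. Ignoring all-empty channels is an assumption.
        return first == second and any(first)

    def select_buffer(self) -> int:
        # Claim 7: pick the sector whose counter shows the fewest accesses.
        # Considering only sectors with a free slot is an assumption.
        free = [s for s, sec in enumerate(self.storage.sectors) if None in sec]
        if not free:
            raise RuntimeError("no free local storage for auxiliary buffering")
        return min(free, key=lambda s: self.storage.access_counts[s])

    def divert(self, channel: deque) -> None:
        # Move every queued entry off the stalled channel into local storage,
        # recording where each tag was placed.
        while channel:
            sector = self.select_buffer()  # select first, so nothing is lost
            tag, data = channel.popleft()
            slot = self.storage.sectors[sector].index(None)
            self.storage.sectors[sector][slot] = data
            self.table[tag] = (sector, slot)

    def lookup(self, tag: str):
        # Claims 4 and 5: a hit returns the location, a miss returns None.
        return self.table.get(tag)


def memory_control_fetch(aux: AuxiliaryMemoryController, tag: str):
    """Hypothetical memory-control side of claim 5: read from the location
    returned by the lookup, free the slot and count the sector access."""
    loc = aux.lookup(tag)
    if loc is None:
        return None
    sector, slot = loc
    data = aux.storage.sectors[sector][slot]
    aux.storage.sectors[sector][slot] = None
    del aux.table[tag]
    aux.storage.record_access(sector)
    return data
```

For example, with three two-slot sectors whose counters read 4, 0 and 2, diverting a stalled channel holding tags A, B and C places A and B in sector 1 (the least accessed) and C in sector 2 once sector 1 is full; a later `memory_control_fetch(aux, "B")` hits in the table and returns B's data. Comparing full channel contents is only one reading of "queue data": hardware might compare head pointers or occupancy counts instead, and a channel whose contents change and return to an identical state within the period would be misreported as stalled.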
Description
Some data processing apparatuses are provided with a plurality of processing elements arranged to form a spatial architecture, with each of the processing elements connected via a network. The processing elements perform processing operations and data transfer operations, including issuing data transfer requests to request data from the network. Typically, the processing elements of the spatial architecture are arranged to monitor data on the network to identify the requested data and to retrieve it from the network when it is detected. However, if requested data items for a processing element are returned in an order different from the order expected by the processing element, then the returned data can stall, with the required data stuck behind other requested data. This stalled data can cause the spatial architecture to stall. EP 3 719 654 A1 discloses systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA). US 6 192 465 B1 discloses a microprocessor capable of out-of-order instruction decoding and in-order dependency checking. US2018/052690 discloses various embodiments of a microprocessor that include a scoreboard implementation directing the microprocessor to the location of data values. The invention is set out in the appended claims.
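The stall described above is a form of head-of-line blocking. In the minimal Python sketch below, an interface channel is assumed to be a FIFO of (tag, data) pairs, and the processing element is assumed to be able to dequeue only the entry at its head. The function name `consume_in_order` and its three outcome labels are hypothetical and serve only to illustrate the problem.

```python
from collections import deque


def consume_in_order(channel: deque, required_order: list) -> tuple:
    """Model a processing element that can only dequeue the head of an
    interface channel. Returns (consumed tags, outcome), where outcome is
    "done", "stalled" (the required entry is queued but behind another
    entry) or "waiting" (the required entry has not arrived yet)."""
    consumed = []
    for tag in required_order:
        if channel and channel[0][0] == tag:
            consumed.append(channel.popleft()[0])
        elif any(queued_tag == tag for queued_tag, _ in channel):
            # Head-of-line blocking: the entry needed now is stuck behind
            # requested-but-not-yet-required data, so no progress is possible.
            return consumed, "stalled"
        else:
            return consumed, "waiting"
    return consumed, "done"
```

If data tagged A and B is returned in the required order, both entries are consumed. If B is returned before A, the processing element cannot reach A and the channel makes no further progress, even though all of the requested data has arrived.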
The present techniques will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:
- Figure 1 schematically illustrates a data processing apparatus arranged as a spatial architecture according to various examples of the present techniques;
- Figure 2 schematically illustrates an alternative method for detecting stalls in a data processing apparatus according to various examples of the present techniques;
- Figure 3 schematically illustrates a frequency of data requests typically observed in a data processing apparatus according to various examples of the present techniques;
- Figure 4 schematically illustrates details of a processing element according to various examples of the present techniques;
- Figure 5 schematically illustrates details of a processing element according to various examples of the present techniques;
- Figure 6 schematically illustrates a sequence of steps taken by memory control circuitry according to various examples of the present techniques;
- Figure 7 schematically illustrates a sequence of steps taken by auxiliary memory control circuitry according to various examples of the present techniques;
- Figure 8 schematically illustrates a sequence of steps taken by auxiliary memory control circuitry in order to detect stalled data according to various examples of the present techniques;
- Figure 9 schematically illustrates a sequence of steps taken by auxiliary memory control circuitry in response to a data transfer request according to various examples of the present techniques; and
- Figure 10 schematically illustrates a sequence of steps taken by local storage circuitry according to various examples of the present techniques.

In some example configurations there is provided a data processing apparatus comprising a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture.
Each processing element of the plurality of processing elements comprises processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations associated with the processing element. The memory control circuitry is arranged to issue data transfer requests for requested data to the network and is configured to monitor the network for the requested data. The memory control circuitry is configured to retrieve the requested data from the network in response to detecting the requested data. Each processing element of the data processing apparatus is also provided with local storage circuitry to store data associated with the processing operations, the local storage circuitry comprising a plurality of local storage sectors, and auxiliary memory control circuitry configured to monitor the network to detect stalled data associated with the processing element. In response to detecting the stalled data, the auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer, which is dynamically selected from amongst the plurality of local storage sectors.

Spatial architectures are arrangements of data processing elements distributed in space, allowing a number of computations to be executed in parallel rather than sequentially. Typical spatial architectures enable different instructions to be applied in parallel to different data, or the same instruction to be applied to different data, during the same instruction cycle. The processing elements of the spatial architecture comprise memory control circuitry and processing circuitry which can be provided