US-12619539-B2 - Technique for efficiently operating a data processing unit when in a data retention mode

Abstract

A processing unit has processing circuitry for performing data processing operations on data accessible in memory, and cache storage to store a subset of the data accessible in the memory, for access by the processing circuitry. Memory control circuitry controls access to a given portion of the memory associated with the processing unit, the given portion of memory comprising at least a shareable memory region to store data that is also accessible by at least one external processing element. Snoop circuitry, responsive to a snoop request identifying a memory address, performs a coherency action in respect of data stored in the cache storage for the identified memory address. Power control circuitry enables the processing unit to be transitioned between a plurality of operating modes, including a data retention mode where the data in the cache storage is retained whilst the processing circuitry and the snoop circuitry are inactive. When transitioning the processing unit into the data retention mode, a given invalidation coherency action is performed in order to invalidate each cache entry storing data associated with a memory address in the shareable memory region. Whilst the processing unit is in the data retention mode, the memory control circuitry allows a write operation instigated by the at least one external processing element, that specifies a given memory address within the shareable memory region, to be performed without waking up the snoop circuitry to process a corresponding snoop request for the given memory address.
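The behaviour summarised in the abstract can be sketched as a small software model. This is only an illustrative sketch under stated assumptions, not the patented hardware: the class `ProcessingUnitModel`, its method names, the dictionary-based cache and memory, and the `snoops_processed` counter are all hypothetical names introduced here. It shows the two points the abstract turns on: shareable entries are invalidated on entry to the data retention mode, and an external write in that mode completes without any snoop being processed.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    RETENTION = "retention"


class ProcessingUnitModel:
    """Toy model; every name here is hypothetical, not taken from the patent."""

    def __init__(self, shareable_range):
        self.shareable = shareable_range  # addresses in the shareable region
        self.memory = {}                  # given portion of memory: addr -> value
        self.cache = {}                   # valid cache entries: addr -> value
        self.mode = Mode.NORMAL
        self.snoops_processed = 0         # how often the snoop circuitry ran

    def cpu_read(self, addr):
        """Read by the processing circuitry; fills the cache on a miss."""
        if self.mode is not Mode.NORMAL:
            raise RuntimeError("processing circuitry is inactive")
        if addr not in self.cache:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def enter_retention(self):
        """Invalidate every cached shareable entry, then go inactive."""
        for addr in [a for a in self.cache if a in self.shareable]:
            del self.cache[addr]
        self.mode = Mode.RETENTION

    def exit_retention(self):
        self.mode = Mode.NORMAL

    def external_write(self, addr, value):
        """Write to the shareable region by an external processing element."""
        if addr not in self.shareable:
            raise ValueError("external elements only access shareable data")
        self.memory[addr] = value
        if self.mode is Mode.NORMAL:
            # Snoop circuitry is active: invalidate any cached copy.
            self.snoops_processed += 1
            self.cache.pop(addr, None)
        # In retention mode no snoop is issued, because no shareable
        # entry can still be valid in the cache.


pu = ProcessingUnitModel(range(0x100, 0x200))
pu.memory[0x100] = 1      # shareable data
pu.memory[0x300] = 7      # non-shareable data
pu.cpu_read(0x100)
pu.cpu_read(0x300)
pu.enter_retention()
pu.external_write(0x100, 2)   # completes without waking the snoop logic
pu.exit_retention()
```

In this model the non-shareable entry at `0x300` survives the retention period, while a read of `0x100` after wake-up misses in the cache and fetches the value written by the external element.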

Inventors

  • Yue Zhao
  • Rong Zhang

Assignees

  • ARM LIMITED

Dates

Publication Date: 2026-05-05
Application Date: 2024-10-02

Claims (19)

  1. A processing unit comprising: processing circuitry configured to perform data processing operations on data accessible in memory; cache storage having a plurality of cache entries configured to store a subset of the data accessible in the memory, for access by the processing circuitry; memory control circuitry configured to control access to a given portion of the memory associated with the processing unit, the given portion of memory comprising at least a shareable memory region to store data that is also accessible by at least one external processing element external to the processing unit; snoop circuitry, responsive to a snoop request identifying a memory address, to perform a coherency action in respect of data stored in the cache storage for the identified memory address; and power control circuitry configured to enable the processing unit to be transitioned between a plurality of operating modes, including a data retention mode where the data in the cache storage is retained whilst the processing circuitry and the snoop circuitry are configured to be inactive; wherein: the power control circuitry is configured, when transitioning the processing unit into the data retention mode, to cause a given invalidation coherency action to be performed in order to invalidate each cache entry storing data associated with a memory address in the shareable memory region; and the memory control circuitry is configured, whilst the processing unit is in the data retention mode, to allow a write operation instigated by the at least one external processing element, that specifies a given memory address within the shareable memory region, to be performed without waking up the snoop circuitry to process a corresponding snoop request for the given memory address.
  2. A processing unit as claimed in claim 1, wherein: the plurality of operating modes further comprises a normal mode where the processing circuitry and the snoop circuitry are configured to be active; and the memory control circuitry is configured, whilst the processing unit is in the normal mode, to be responsive to the write operation instigated by the at least one external processing element to allow the write operation to be performed but in addition to cause the corresponding snoop request to be issued to the snoop circuitry to cause the coherency action to be performed in the event that data for the given memory address is stored within the cache storage.
  3. A processing unit as claimed in claim 1, wherein the given portion of memory further comprises a non-shareable memory region to store data inaccessible to the at least one external processing element.
  4. A processing unit as claimed in claim 1, wherein the cache storage is configured to treat any cacheable data associated with a memory address in the shareable memory region as write through, and the given invalidation coherency action comprises marking as invalid each cache entry storing data associated with a memory address in the shareable memory region.
  5. A processing unit as claimed in claim 1, further comprising: register storage associated with the cache storage, the register storage providing a plurality of register entries, wherein each register entry is arranged to store state information for an associated cache entry of the cache storage, and the state information is sufficient to enable identification of the cache entries that have stored therein data associated with a memory address in the shareable memory region; and the power control circuitry is arranged to cause the given invalidation coherency action to be performed by referencing the state information in the register storage without requiring a lookup in the cache storage to be performed.
  6. A processing unit as claimed in claim 5, wherein performance of the given invalidation coherency action comprises identifying each register entry whose state information indicates that the associated cache entry has stored therein data associated with a memory address in the shareable memory region, and for each identified register entry updating the state information to indicate that the associated cache entry is in an invalid state.
  7. A processing unit as claimed in claim 1, wherein the memory comprises a further portion that is accessible to the processing unit via interconnect circuitry, the further portion comprising a further shareable memory region to store data that is also accessible by the at least one external processing element, and a further non-shareable memory region to store data inaccessible to the at least one external processing element.
  8. A processing unit as claimed in claim 5, wherein: each register entry is arranged to identify, for the associated cache entry, one of the following states: a first state indicating that the associated cache entry is invalid; a second state indicating that the associated cache entry stores data that is clean and in a shareable memory region of the memory; a third state indicating that the associated cache entry stores data that is clean and in a non-shareable memory region of the memory; a fourth state indicating that the associated cache entry stores data that is dirty and in the non-shareable memory region of the memory.
  9. A processing unit as claimed in claim 7, wherein the processing unit is arranged to treat any data stored in the further shareable memory region as non-cacheable so that the cache storage is arranged, when caching data from the further portion of the memory, to only cache data from the further non-shareable memory region.
  10. A processing unit as claimed in claim 8, wherein: the memory comprises a further portion that is accessible to the processing unit via interconnect circuitry, the further portion comprising a further shareable memory region to store data that is also accessible by the at least one external processing element, and a further non-shareable memory region to store data inaccessible to the at least one external processing element; the processing unit is arranged to treat any data stored in the further shareable memory region as non-cacheable so that the cache storage is arranged, when caching data from the further portion of the memory, to only cache data from the further non-shareable memory region; and the second state indicates that the associated cache entry stores data that is clean and in the shareable memory region of the given portion of the memory, since the cache storage is not configured to cache data from the further shareable memory region.
  11. A processing unit as claimed in claim 1, wherein: each cache entry comprises a data entry for storing data and a TAG entry for storing address indicating information used to identify the memory address associated with that data; and the snoop circuitry is arranged when processing a snoop request identifying a given memory address, to reference at least a subset of the TAG entries to determine whether one of the cache entries stores data for the given memory address, and in that event to perform the coherency action.
  12. A processing unit as claimed in claim 11, wherein the coherency action comprises invalidating said one of the cache entries when the data for the given memory address is being treated as write through.
  13. A processing unit as claimed in claim 1, wherein for a given item of cacheable data in a non-shareable memory region of the memory, the cache storage is configured to identify whether that given item of cacheable data, when stored in the cache storage, is to be treated as write through data or write back data.
  14. A processing unit as claimed in claim 1, wherein the at least one external processing element comprises at least an accelerator device configured to perform tasks on behalf of the processing unit.
  15. A processing unit as claimed in claim 1, wherein: the memory comprises a further portion that is accessible to the processing unit via interconnect circuitry; and the given portion is configured to provide the processing unit with lower latency access to data than when accessing data in the further portion.
  16. A method of operating a processing unit, comprising: performing, using processing circuitry, data processing operations on data accessible in memory; storing, within a cache storage having a plurality of cache entries, a subset of the data accessible in the memory, for access by the processing circuitry; controlling, using memory control circuitry, access to a given portion of the memory associated with the processing unit, the given portion of memory comprising at least a shareable memory region to store data that is also accessible by at least one external processing element external to the processing unit; responsive to a snoop request identifying a memory address, performing using snoop circuitry a coherency action in respect of data stored in the cache storage for the identified memory address; transitioning the processing unit between a plurality of operating modes, including a data retention mode where the data in the cache storage is retained whilst the processing circuitry and the snoop circuitry are configured to be inactive; when transitioning the processing unit into the data retention mode, causing a given invalidation coherency action to be performed in order to invalidate each cache entry storing data associated with a memory address in the shareable memory region; and whilst the processing unit is in the data retention mode, allowing a write operation instigated by the at least one external processing element, that specifies a given memory address within the shareable memory region, to be performed without waking up the snoop circuitry to process a corresponding snoop request for the given memory address.
  17. A system comprising: the processing unit of claim 1, implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board.
  18. A chip-containing product comprising the system of claim 17, wherein the system is assembled on a further board with at least one other product component.
  19. A non-transitory, computer-readable storage medium storing computer-readable code for fabrication of the processing unit of claim 1.
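The register-based invalidation of claims 5, 6 and 8 can be pictured with a short sketch. It is a software analogy only, assuming one state value per cache entry held in a Python list; the names `EntryState` and `invalidate_shareable` are hypothetical and do not appear in the patent. The four enum members mirror the four states listed in claim 8. There is no dirty shareable state because, per claim 4, cached shareable data is treated as write through, so invalidation needs no write-back.

```python
from enum import Enum


class EntryState(Enum):
    INVALID = 0               # first state: entry holds no valid data
    CLEAN_SHAREABLE = 1       # second state: clean, shareable region
    CLEAN_NON_SHAREABLE = 2   # third state: clean, non-shareable region
    DIRTY_NON_SHAREABLE = 3   # fourth state: dirty, non-shareable region


def invalidate_shareable(state_regs):
    """Mark every clean-shareable entry invalid, in place.

    Only the per-entry state registers are scanned; the TAG and data
    arrays of the cache are never consulted. Returns the number of
    entries whose state changed.
    """
    changed = 0
    for index, state in enumerate(state_regs):
        if state is EntryState.CLEAN_SHAREABLE:
            state_regs[index] = EntryState.INVALID
            changed += 1
    return changed


# Hypothetical register contents for a five-entry cache.
regs = [
    EntryState.CLEAN_SHAREABLE,
    EntryState.DIRTY_NON_SHAREABLE,
    EntryState.INVALID,
    EntryState.CLEAN_NON_SHAREABLE,
    EntryState.CLEAN_SHAREABLE,
]
count = invalidate_shareable(regs)
```

After the call, the two clean-shareable entries read as invalid while the non-shareable entries, including the dirty one, keep their state and so remain usable when the processing unit leaves the data retention mode.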

Description

BACKGROUND

The present technique relates to the field of data processing, and more particularly to techniques for efficiently operating a data processing unit when in a data retention mode.

In order to seek to improve energy efficiency/reduce power consumption, a modern data processing unit is often able to be transitioned between a number of different operating modes. For example, in addition to a normal mode of operation where the processing unit's circuits are active and able to perform data processing operations, one or more lower power modes of operation may be supported to seek to reduce power consumption. For instance, a data retention mode may be supported that is used to retain the data held in certain data structures of the processing unit, whilst various of the circuits within the processing unit are placed into an inactive state in order to reduce power consumption. This can be useful, for example, during periods of time where the processing unit does not have active tasks to perform, but where it is expected that when such tasks resume the processing unit will require access to the data held in the data structures.

One such data structure in which it may be desired to retain data during the data retention mode is a data cache used to store a subset of the data accessible in memory, for access by processing circuitry of the processing unit. One or more regions of memory may be shareable memory regions used to store data that is accessible not only by the processing unit, but also by at least one processing element external to the processing unit (such data being referred to herein as shareable data). For such shareable data it is known to provide a coherency mechanism to seek to ensure that each entity that can access the shareable data will see a coherent view of that data.

However, in situations where the processing unit may cache shareable data in its data cache, this can cause issues when seeking to maintain coherency of that data when the processing unit is in the data retention mode. In particular, when an external processing element seeks to update an item of shareable data, then in order to implement coherency it may be required to cause the processing unit to exit the data retention mode in order to allow a lookup to be performed within the data cache to determine whether the processing unit has a cached copy of the data in question. This can significantly impact the power consumption benefits that might otherwise be achieved by using the data retention mode.

SUMMARY

In accordance with a first example arrangement, there is provided a processing unit comprising: processing circuitry configured to perform data processing operations on data accessible in memory; cache storage having a plurality of cache entries configured to store a subset of the data accessible in the memory, for access by the processing circuitry; memory control circuitry configured to control access to a given portion of the memory associated with the processing unit, the given portion of memory comprising at least a shareable memory region to store data that is also accessible by at least one external processing element external to the processing unit; snoop circuitry, responsive to a snoop request identifying a memory address, to perform a coherency action in respect of data stored in the cache storage for the identified memory address; and power control circuitry configured to enable the processing unit to be transitioned between a plurality of operating modes, including a data retention mode where the data in the cache storage is retained whilst the processing circuitry and the snoop circuitry are configured to be inactive; wherein: the power control circuitry is configured, when transitioning the processing unit into the data retention mode, to cause a given invalidation coherency action to be performed in order to invalidate each cache entry storing data associated with a memory address in the shareable memory region; and the memory control circuitry is configured, whilst the processing unit is in the data retention mode, to allow a write operation instigated by the at least one external processing element, that specifies a given memory address within the shareable memory region, to be performed without waking up the snoop circuitry to process a corresponding snoop request for the given memory address.

In accordance with another example arrangement, there is provided a method of operating a processing unit, comprising: performing, using processing circuitry, data processing operations on data accessible in memory; storing, within a cache storage having a plurality of cache entries, a subset of the data accessible in the memory, for access by the processing circuitry; controlling, using memory control circuitry, access to a given portion of the memory associated with the processing unit, the given portion of memory comprising at least a shareable memory region to store data that is also accessible by at least one e