
JP-2026076290-A - Forcibly terminating and resuming prefetching in the instruction cache.


Abstract

[Problem] To provide a device that efficiently controls the cache during instruction cache prefetching. [Solution] In a processor 100 including a hierarchical cache, the memory controller subsystem determines by inference whether a virtual address is a hit or a miss in the first memory cache, translates the virtual address (200h to 380h) to a physical address (220, 225) by inference, configures a status to the enabled state in association with the hit or miss status and the physical address, reconfigures the status to the disabled state in response to receiving a first indication from the CPU core that no program instructions related to the virtual address are needed, and reconfigures the status back to the enabled state in response to receiving a second indication from the CPU core that a program instruction related to the virtual address is needed. [Selection Diagram] Figure 2

Inventors

  • Bipin Prasad Heremagalur Ramaprasad
  • David Matthew Thompson
  • Abhijeet Ashok Chachad
  • Hung Ong

Assignees

  • Texas Instruments Incorporated

Dates

Publication Date: 2026-05-11
Application Date: 2026-02-06
Priority Date: 2018-08-14

Claims (20)

  1. A device comprising: a central processing unit (CPU) core; a first memory cache for storing instructions to be executed by the CPU core; a second memory cache for storing instructions to be executed by the CPU core, the second memory cache being accessible in response to a miss in the first memory cache; and a memory controller subsystem coupled to the CPU core and to the first and second memory caches, wherein the memory controller subsystem is configured to: determine whether a first virtual address received from the CPU core is a miss or a hit in the first memory cache; generate a second virtual address based on the first virtual address; determine whether the second virtual address is a miss or a hit in the first memory cache; translate the second virtual address to a physical address; set a status bit, associated with the physical address and with the hit or miss determination for the second virtual address, to an enabled state; in response to receiving a count value of zero from the CPU core, change the status bit to a disabled state; and in response to receiving a restart indication from the CPU core, return the status bit to the enabled state.
  2. The device according to claim 1, wherein the memory controller subsystem is configured to retrieve program instructions from the second memory cache using the physical address translated from the second virtual address.
  3. The device according to claim 1, wherein the count value is received after the second virtual address has been translated to the physical address.
  4. The device according to claim 1, wherein the count value is received from the CPU core before the restart indication is received from the CPU core.
  5. The device according to claim 1, further comprising a register in which the physical address and the status bit are stored.
  6. The device according to claim 5, wherein the hit or miss indication for the second virtual address in the first memory cache is stored in the register together with the physical address and the status bit.
  7. The device according to claim 1, wherein the first memory cache stores program instructions and not data.
  8. A device comprising: a central processing unit (CPU) core; a first memory cache for storing instructions to be executed by the CPU core; a second memory cache for storing instructions to be executed by the CPU core, the second memory cache being accessed to retrieve instructions in response to a miss in the first memory cache; and a memory controller subsystem coupled to the CPU core and to the first and second memory caches, wherein the memory controller subsystem is configured to: determine, by inference, a hit or miss status of a first virtual address in the first memory cache; translate, by inference, the first virtual address to a physical address; configure a status to an enabled state in association with the hit or miss status and the physical address; in response to receiving a first indication from the CPU core that no program instructions related to the first virtual address are required, reconfigure the status to a disabled state; and in response to receiving a second indication from the CPU core that a program instruction related to the first virtual address is required, reconfigure the status to the enabled state.
  9. The device according to claim 8, wherein the memory controller subsystem is configured to infer the first virtual address from a second virtual address transmitted from the CPU core to the memory controller subsystem.
  10. The device according to claim 8, wherein the first indication from the CPU core that no program instructions related to the first virtual address are required includes a count value, and the count value is zero.
  11. The device according to claim 8, wherein the second indication from the CPU core that a program instruction related to the first virtual address is required includes a signal instructing the memory controller subsystem to continue retrieving program instructions beginning at the first virtual address.
  12. The device according to claim 11, wherein, upon receiving the second indication, the memory controller subsystem is configured to continue retrieving program instructions beginning at the first virtual address without re-determining the hit or miss status of the first virtual address in the first memory cache.
  13. The device according to claim 12, wherein, upon receiving the second indication, the memory controller subsystem is configured to continue retrieving program instructions beginning at the first virtual address without translating the first virtual address to the physical address again.
  14. The device according to claim 8, wherein the CPU core is configured to provide the second indication without providing the first virtual address to the memory controller subsystem.
  15. The device according to claim 8, wherein the first indication is received after the determination by inference of the hit or miss status and the translation by inference of the first virtual address to the physical address.
  16. A system-on-a-chip (SoC) comprising: an input/output device; and a processor coupled to the input/output device, wherein the processor includes a central processing unit (CPU) core, a first memory cache for storing instructions to be executed by the CPU core, a second memory cache, and a memory controller subsystem coupled to the CPU core and to the first and second memory caches, and wherein the memory controller subsystem is configured to: determine, by inference, a hit or miss status of a first virtual address in the first memory cache; translate, by inference, the first virtual address to a physical address; configure a status to an enabled state in association with the hit or miss status and the physical address; in response to receiving a first indication from the CPU core that no program instructions related to the first virtual address are required, reconfigure the status to a disabled state; and in response to receiving a second indication from the CPU core that a program instruction related to the first virtual address is required, reconfigure the status to the enabled state.
  17. The SoC according to claim 16, wherein the memory controller subsystem is configured to infer the first virtual address from a second virtual address transmitted from the CPU core to the memory controller subsystem.
  18. The SoC according to claim 16, wherein the first indication from the CPU core that no program instructions related to the first virtual address are required includes a count value, and the count value is zero.
  19. The SoC according to claim 16, wherein the CPU core is configured to provide the second indication without providing the first virtual address to the memory controller subsystem.
  20. The SoC according to claim 16, wherein, upon receiving the second indication, the memory controller subsystem is configured to continue retrieving program instructions beginning at the first virtual address without re-determining the hit or miss status of the first virtual address in the first memory cache and without translating the first virtual address to the physical address again.
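
The enable, disable, and re-enable sequence recited in the claims can be pictured with a small software model. The following is a minimal sketch under stated assumptions, not the claimed hardware: the names `PrefetchEntry`, `PrefetchController`, `speculate`, `on_count`, and `on_resume` are hypothetical, and the cache lookup and address translation are passed in as placeholder functions. It only illustrates that a resume can reuse the stored hit or miss result and physical address without repeating the lookup or the translation.

```python
from dataclasses import dataclass


@dataclass
class PrefetchEntry:
    """Hypothetical register contents: physical address, hit/miss result, status."""
    physical_address: int
    hit: bool
    enabled: bool = True


class PrefetchController:
    """Toy model of the status handling; all names here are illustrative assumptions."""

    def __init__(self, lookup_hit, translate):
        self._lookup_hit = lookup_hit    # placeholder for the first-cache tag lookup
        self._translate = translate      # placeholder for virtual-to-physical translation
        self.lookups = 0                 # counts tag lookups performed
        self.translations = 0            # counts address translations performed
        self.entry = None

    def speculate(self, virtual_address):
        """Determine hit/miss and translate once, then mark the entry enabled."""
        self.lookups += 1
        self.translations += 1
        self.entry = PrefetchEntry(
            physical_address=self._translate(virtual_address),
            hit=self._lookup_hit(virtual_address),
        )
        return self.entry

    def on_count(self, count):
        """A count value of zero means no further instructions are needed."""
        if count == 0 and self.entry is not None:
            self.entry.enabled = False

    def on_resume(self):
        """Re-enable the stored entry without a new lookup or translation."""
        if self.entry is not None:
            self.entry.enabled = True
        return self.entry
```

In this sketch the counters stay at one after a disable followed by a resume, which mirrors the claimed behavior of continuing without re-determining the hit or miss status and without translating the address again.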

Description

Some memory systems include a multi-level cache system. When the memory controller receives a request for a specific memory address from the processor core, it determines whether the data associated with that memory address is present in the first-level cache (L1). If the data is present in the L1 cache, it is returned from the L1 cache. If the data is not present in the L1 cache, the memory controller accesses the second-level cache (L2), which is larger than the L1 cache and can therefore hold more data. If the data is present in the L2 cache, it is returned from the L2 cache to the processor core, and a copy is also stored in the L1 cache in case the same data is requested again. Additional levels of the memory hierarchy are also possible.

In one example, a system includes a processor that includes a CPU core, first and second memory caches, and a memory controller subsystem. The memory controller subsystem determines by inference whether a virtual address is a hit or a miss in the first memory cache and translates the virtual address to a physical address by inference. In association with the hit or miss status and the physical address, the memory controller subsystem configures a status to the enabled state. In response to receiving a first indication from the CPU core that no program instructions related to the virtual address are required, the memory controller subsystem reconfigures the status to the disabled state. In response to receiving a second indication from the CPU core that a program instruction related to the virtual address is required, the memory controller subsystem reconfigures the status to the enabled state without additional access to the TAGRAM or the address translation logic.

The drawings illustrate the following: a processor in accordance with one example; the promotion of an L1 memory cache access to a full L2 cache line access in accordance with one example; a flowchart of a performance improvement in accordance with one example; another flowchart of a different performance improvement in accordance with one example; and a system that includes the processor of Figure 1.

Figure 1 shows an example of a processor 100 including a hierarchical cache subsystem. In this example, the processor 100 includes a central processing unit (CPU) core 102, a memory controller subsystem 101, an L1 data cache (L1D) 115, an L1 program cache (L1P) 130, and an L2 memory cache 155. The memory controller subsystem 101 includes a data memory controller (DMC) 110, a program memory controller (PMC) 120, and a unified memory controller (UMC) 150. At the L1 cache level, data and program instructions are kept in separate caches. Instructions to be executed by the CPU core 102 are stored in the L1P 130 and then provided to the CPU core 102 for execution, while data is stored in the L1D 115. The CPU core 102 can read data from and write data to the L1D 115, and has read access, but no write access, to the L1P 130. The L2 memory cache 155 can store both data and program instructions.

The sizes of the L1D 115, the L1P 130, and the L2 memory cache 155 may vary depending on the implementation, but in one example the L2 memory cache 155 is larger than either the L1D 115 or the L1P 130. For example, the L1D 115 is 32 kilobytes and the L1P 130 is also 32 kilobytes, while the L2 memory cache 155 can be 64 kilobytes to 4 megabytes. Furthermore, the cache line size of the L1D 115 is the same as the cache line size of the L2 memory cache 155 (e.g., 128 bytes), while the cache line size of the L1P 130 is smaller (e.g., 64 bytes).

When data is required by the CPU core 102, the DMC 110 receives an access request for the target data from the CPU core 102. The access request may include an address (e.g., a virtual address) from the CPU core 102. The DMC 110 determines whether the target data is present in the L1D 115. If the data is present in the L1D 115, it is returned to the CPU core 102.
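
The lookup order described above (L1 first, then L2, with a copy kept in L1 after an L2 hit) and the example sizes can be illustrated with a short software model. This is a minimal sketch under stated assumptions rather than a description of the DMC or UMC logic: `TwoLevelCache`, `read`, and `lines_in_cache` are hypothetical names, dictionaries stand in for the caches, and filling both cache levels after an access to backing memory is an assumption the text does not state.

```python
from dataclasses import dataclass, field


def lines_in_cache(size_bytes, line_bytes):
    """Number of cache lines for a given cache size and line size."""
    return size_bytes // line_bytes


@dataclass
class TwoLevelCache:
    """Toy model of an L1 -> L2 lookup; dictionaries stand in for the cache arrays."""
    l1: dict = field(default_factory=dict)
    l2: dict = field(default_factory=dict)
    backing: dict = field(default_factory=dict)  # stands in for further memory levels

    def read(self, address):
        """Return (value, level) where level names where the value was found."""
        if address in self.l1:
            return self.l1[address], "L1"
        if address in self.l2:
            value = self.l2[address]
            self.l1[address] = value  # keep a copy in L1 in case it is requested again
            return value, "L2"
        value = self.backing[address]
        # Assumption: fill both levels on the way back from backing memory.
        self.l2[address] = value
        self.l1[address] = value
        return value, "memory"
```

With the example figures from the text, a 32-kilobyte L1P with 64-byte lines holds `lines_in_cache(32 * 1024, 64)` = 512 lines, and a 32-kilobyte L1D with 128-byte lines holds 256 lines.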
However, if the data requested by the CPU core 102 is not present in the L1D 115, the DMC 110 provides an access request to the UMC 150. This access request may include a physical address generated by the DMC 110 based on the virtual address (VA) provided by the CPU core 102. The UMC 150 determines whether the data at the physical address provided by the DMC 110 is present in the L2 memory cache 155. If the data is present in the L2 memory cache 155, it is returned from the L2 memory cache 155 to the CPU core 102, and a copy is stored in the L1D 115.

Additional levels may be present in the cache subsystem. For example, an L3 memory cache or system memory may be available for access. Therefore, if the data requested by the CPU core 102 is not present in either the L1D 115 or the L2 memory cache 155, the data may be accessed at an additional cache level.

Regarding program instructions, when the CPU core 102 requires additional instructions to execute, it provides the VA 103 to the PMC 120. The PMC responds to the VA 103 provided by the C