DE-112024002978-T5 - TRACE CACHE TECHNIQUES BASED ON BIASED CONTROL TRANSFER INSTRUCTIONS

DE 112024002978 T5

Abstract

The disclosure relates to trace cache circuitry configured to identify and cache traces that meet certain criteria. Prediction circuitry can track the directions of executed control transfer instructions, including a first category of control transfer instructions that meet a first bias threshold in a given direction (which may be described as "stable") and a second category of control transfer instructions that do not meet the first bias threshold (which may be described as "unstable"). The trace cache circuitry can identify traces of instructions that meet a set of criteria, including: only first-category control transfer instructions are permitted as internal control transfer instructions, and a second-category control transfer instruction is permitted only at the end of a given trace. Disclosed techniques may provide the performance and energy benefits of cached traces with reduced complexity relative to certain conventional trace caches.
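The stable/unstable split described above can be illustrated with a minimal sketch (not taken from the patent; the threshold value and names are assumptions for illustration): a branch whose observed taken ratio is strongly biased toward either direction meets the bias threshold and is classified "stable"; anything else is "unstable".

```python
# Hypothetical illustration of bias-threshold classification of control
# transfer instructions. BIAS_THRESHOLD is an assumed value, not from
# the patent.

BIAS_THRESHOLD = 0.95

def classify(history):
    """history: list of booleans for one branch, True = taken.

    Returns "stable" if the branch meets the bias threshold in either
    direction, otherwise "unstable".
    """
    if not history:
        return "unstable"
    taken_ratio = sum(history) / len(history)
    # Stable if strongly biased toward taken OR toward not-taken.
    if taken_ratio >= BIAS_THRESHOLD or taken_ratio <= 1 - BIAS_THRESHOLD:
        return "stable"
    return "unstable"
```

A claim-2-style "consistent over an interval" definition would correspond to a ratio of exactly 1.0 (or 0.0); the proportional definition of claim 3 corresponds to the threshold comparison shown here.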

Inventors

  • Niket K. Choudhary
  • Muawya M. Al-Otoom
  • Pruthivi Vuyyuru
  • Andrew H. Lin
  • Ilhyun Kim
  • Douglas C. Holman
  • Samir Dutt
  • Ronald P. Hall

Assignees

  • APPLE INC.

Dates

Publication Date
2026-05-13
Application Date
2024-07-01
Priority Date
2023-07-14

Claims (20)

  1. An apparatus, comprising: processor circuitry configured to execute control transfer instructions; prediction circuitry configured to track directions of executed control transfer instructions, including: a first category of control transfer instructions that meet a first bias threshold in a given direction; and a second category of control transfer instructions that do not meet the first bias threshold; and trace cache circuitry configured to: identify traces of instructions that satisfy a set of criteria, including: a given trace including at least one internal control transfer instruction; a conditional control transfer instruction being permitted as an internal trace instruction only if it meets the bias threshold; and a control transfer instruction in the second category being permitted only at the end of a given trace; and store one or more identified traces.
  2. The apparatus of claim 1, wherein the first category includes only control transfer instructions that have consistently taken the given direction when executed over an interval.
  3. The apparatus of claim 1, wherein the first category includes control transfer instructions that have taken the given direction at least a threshold proportion of the time over an interval.
  4. The apparatus of claim 1, further comprising: filter control circuitry configured to process a third category of control transfer instructions corresponding to a subset of the first category of control transfer instructions that do not meet a second bias threshold, including to: for a given third-category internal control transfer instruction of a stored trace, adjust a counter in a first direction in response to the given control transfer instruction being executed in the direction of the next portion of the stored trace, and adjust the counter in a second direction in response to the given control transfer instruction being executed away from the next portion of the stored trace; and prevent inclusion of the given control transfer instruction as an internal instruction of traces in the trace cache in response to the counter reaching a threshold in the second direction.
  5. The apparatus of claim 1, wherein the prediction circuitry includes a bias predictor configured to generate predictions for the first category based on a bias table.
  6. The apparatus of claim 1, further comprising: build queue circuitry configured to assemble traces for storage in the trace cache circuitry from multiple fetch groups fetched from an instruction cache.
  7. The apparatus of claim 1, further comprising: training circuitry for the trace cache circuitry, wherein the training circuitry is configured to: buffer traces until they reach a utility threshold; and allocate entries in the trace cache circuitry for traces that satisfy one or more criteria and the utility threshold.
  8. The apparatus of claim 7, wherein the training circuitry is further configured to: implement sliding-window buffering in which multiple partially-overlapping traces are buffered; and select one of the partially-overlapping traces for allocation in the trace cache circuitry based on utility values of the partially-overlapping traces.
  9. The apparatus of claim 8, wherein the training circuitry is configured to buffer fetch groups fetched from an instruction cache and is configured not to buffer fetch groups fetched from the trace cache circuitry.
  10. The apparatus of claim 1, wherein the trace cache circuitry is configured to identify traces of instructions that satisfy the set of criteria after a decode stage of the processor circuitry.
  11. The apparatus of claim 1, further comprising: trace cache control circuitry configured to: evict a trace from the trace cache circuitry; and replace a trace in the trace cache circuitry with another trace according to a replacement policy.
  12. The apparatus of claim 1, further comprising: single-cycle next-fetch prediction circuitry configured to predict a next fetch address following a trace in response to a hit in the trace cache circuitry.
  13. The apparatus of claim 1, wherein the prediction circuitry includes: shared table circuitry configured to store a prediction table; CTI prediction pipeline circuitry configured to access the prediction table to predict directions of multiple control transfer instructions in a given fetch group; and trace prediction pipeline circuitry configured to access the prediction table to predict a direction of a trace-terminating control transfer instruction of a trace cached by the trace cache circuitry.
  14. The apparatus of claim 1, wherein the apparatus is a computing device that further includes: a central processing unit; a display; and network interface circuitry.
  15. A method, comprising: tracking, by a processor, directions of executed control transfer instructions, including: a first category of control transfer instructions that meet a first bias threshold in a given direction; and a second category of control transfer instructions that do not meet the first bias threshold; identifying, by the processor, traces of instructions that satisfy a set of criteria, including: a given trace including at least one internal control transfer instruction; a conditional control transfer instruction being permitted as an internal trace instruction only if it meets the bias threshold; and a control transfer instruction in the second category being permitted only at the end of a given trace; and storing, by the processor, one or more identified traces.
  16. The method of claim 15, wherein the first category includes only control transfer instructions that have consistently taken the given direction when executed over an interval.
  17. The method of claim 15, further comprising: buffering traces until they reach a utility threshold; and allocating entries in trace cache circuitry for traces that satisfy one or more criteria and the utility threshold.
  18. The method of claim 15, wherein the identifying includes assembling traces for storage in trace cache circuitry from multiple fetch groups fetched from an instruction cache.
  19. A non-transitory computer-readable medium having instructions stored thereon that are expressed in a hardware description language and that, when processed by a computing system, program the computing system to generate a computing simulation model of a hardware circuit that includes: processor circuitry configured to execute control transfer instructions; prediction circuitry configured to track directions of executed control transfer instructions, including: a first category of control transfer instructions that meet a first bias threshold in a given direction; and a second category of control transfer instructions that do not meet the first bias threshold; and trace cache circuitry configured to: identify traces of instructions that satisfy a set of criteria, including: a given trace including at least one internal control transfer instruction; a conditional control transfer instruction being permitted as an internal trace instruction only if it meets the bias threshold; and a control transfer instruction in the second category being permitted only at the end of a given trace; and store one or more identified traces.
  20. The non-transitory computer-readable medium of claim 19, wherein the first category includes only control transfer instructions that have consistently taken the given direction when executed over an interval.
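The counter mechanism of claim 4 can be sketched in software as follows. This is an illustrative model only, not the claimed circuitry; the class name, the dictionary-based state, and the ban threshold value are all assumptions made for the example.

```python
class LBEFilter:
    """Illustrative model of the claim-4 filter: a per-branch counter
    moves in one direction when a third-category internal control
    transfer instruction follows the stored trace, and in the other
    direction when it exits the trace. When the counter reaches a
    threshold in the "exits trace" direction, the branch is prevented
    from being included as an internal trace instruction."""

    def __init__(self, ban_threshold=-4):
        self.counters = {}           # branch PC -> counter value
        self.banned = set()          # PCs banned as internal instructions
        self.ban_threshold = ban_threshold

    def observe(self, pc, followed_trace):
        """Record one execution of the branch at `pc`."""
        if pc in self.banned:
            return
        delta = 1 if followed_trace else -1
        count = self.counters.get(pc, 0) + delta
        self.counters[pc] = count
        if count <= self.ban_threshold:
            # Threshold reached in the second direction: prevent
            # inclusion as an internal instruction of future traces.
            self.banned.add(pc)

    def allowed_internal(self, pc):
        return pc not in self.banned
```

In hardware the counter would typically be a small saturating counter rather than an unbounded integer; the sketch omits saturation for brevity.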

Description

BACKGROUND

Technical Field

This disclosure relates generally to computer processor architecture and, in particular, to trace cache circuitry.

Description of the Related Art

Trace caches store traces of instructions, with each trace typically containing at least one internal branch and potentially ending with a branch. Trace caches can substantially improve performance by increasing the bandwidth available for instruction fetch (relative to multiple fetches from an instruction cache) and can also reduce fetch power. However, a branch that unexpectedly exits a trace in the trace cache can be costly, and a trace cache and its control logic can consume a substantial portion of processor area and power. These trade-offs have made practical implementation of trace caches difficult in conventional processor designs.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example processor pipeline that includes a trace cache, according to some embodiments.
FIG. 2A is a diagram illustrating example branch stability categories, according to some embodiments.
FIG. 2B is a diagram illustrating examples of stable/unstable fetch-group positions that are permissible in a trace, according to some embodiments.
FIG. 3 is a diagram illustrating example fields of a trace cache entry, according to some embodiments.
FIG. 4 is a block diagram illustrating a more detailed example pipeline, according to some embodiments.
FIG. 5 is a block diagram illustrating an example trace next-fetch predictor, according to some embodiments.
FIG. 6 is a block diagram illustrating an example pre-hashed branch history token, according to some embodiments.
FIG. 7 is a block diagram illustrating an example trace prediction pipeline for a branch predictor with shared resources, according to some embodiments.
FIG. 8 is a block diagram illustrating an example trace prediction pipeline for a tagged geometric length (TAGE) predictor, according to some embodiments.
FIG. 9 is a diagram illustrating example relaxed conditions that allow a less-stable control transfer instruction before the end of a trace, according to some embodiments.
FIG. 10A is a diagram illustrating an example filter-based "less-stable branch before end" (LBE) trace cache technique, according to some embodiments.
FIG. 10B is a diagram illustrating another example of trace cache entry fields, according to some embodiments.
FIG. 10C is a diagram illustrating example fields of a next-fetch predictor entry, according to some embodiments.
FIG. 11 is a block diagram illustrating an example pipeline that includes training filter circuitry to prevent certain control transfer instructions from being included before the end of a trace, according to some embodiments.
FIG. 12 is a diagram illustrating example control circuitry that allows sharing a branch predictor table entry for both instruction cache and trace cache hits, according to some embodiments.
FIG. 13 is a flow diagram illustrating an example method for operating a trace cache, according to some embodiments.
FIG. 14 is a flow diagram illustrating an example method that uses a less-stable branch before the end (LBE) of a trace, according to some embodiments.
FIG. 15 is a flow diagram illustrating an example method for operating a trace next-fetch predictor, according to some embodiments.
FIG. 16 is a flow diagram illustrating an example method for operating a processor with a trace cache branch prediction pipeline, according to some embodiments.
FIG. 17 is a block diagram illustrating an example computing device, according to some embodiments.
FIG. 18 is a diagram illustrating example applications of disclosed systems and devices, according to some embodiments.
FIG. 19 is a block diagram illustrating an example computer-readable medium that stores circuit design information, according to some embodiments.
DETAILED DESCRIPTION

Overview and Roadmap of the Disclosure

In various embodiments, explained in detail below, trace cache circuitry is configured to assemble traces of instructions that meet certain criteria. For example, the control circuitry may restrict internal conditional branches to "stable" branches that satisfy a bias threshold in one direction. "Internal" branches refer to branches that are not the last instruction in a trace. The following provides a brief roadmap of the disclosure relating to a stable trace cache. FIGS. 1-4 show examples of trace cache implementations. FIGS. 5-6 relate to a next-fetch predictor configured to operate in conjunction with trace cache hits. FIGS. 7-8 illustrate circuitry for an additional trace prediction pipeline for a branch predictor. FIGS. 9-12 relate to allowing less-stable internal branches within a trace in certain scenarios. The remaining figures show example methods, devices, and systems. In disclosed embodiments, the trace cache circuitry is configured to cache only
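The eligibility restriction described in the overview (stable branches only in internal positions, a less-stable branch permitted only as the trace terminator) can be summarized in a short illustrative check. This is a sketch of the criteria as stated, not the patented circuitry; the tuple representation is an assumption made for the example.

```python
# Illustrative check of trace eligibility: every internal control
# transfer instruction (CTI) must be "stable"; an "unstable" CTI may
# appear only as the final, trace-terminating instruction.

def trace_eligible(instructions):
    """instructions: list of (is_cti, stability) tuples, where
    stability is "stable" or "unstable" for CTIs and None otherwise."""
    for i, (is_cti, stability) in enumerate(instructions):
        is_last = (i == len(instructions) - 1)
        if is_cti and stability == "unstable" and not is_last:
            return False  # unstable CTIs are only allowed at trace end
    return True
```

A trace that ends in an unstable branch passes the check, because a misprediction there exits the trace at its natural boundary; an unstable branch in an internal position would risk a costly mid-trace exit, which is exactly what the criteria exclude.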