DE-112024002981-T5 - Trace cache with filter for including internal control transmissions
Abstract
The disclosed techniques relate to trace caches. Trace cache circuitry can identify traces of instructions that meet one or more criteria. Generally, the internal branches of a trace should meet a threshold for bias in a particular direction. To achieve this, the processor can initially assume that branches meet the threshold, track their usefulness within the context of the trace over time, and prevent the inclusion of branches that fall below a usefulness threshold (indicating that those branches are not sufficiently biased). Branches that fall below the threshold can, for example, be added to a Bloom filter. Usefulness can be tracked during trace training, while a trace is valid in the trace cache, or both.
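As a rough software model of the mechanism described above, the sketch below tracks a per-branch usefulness counter that starts out assuming the branch is sufficiently biased, and bars a branch from serving as an internal trace branch once its counter falls to a floor. The class name, counter values, and the use of a plain set in place of the hardware Bloom filter are illustrative assumptions, not details from the patent; the asymmetric adjustment (stronger in the "second direction") follows the claim that the counter may be adjusted more strongly when a branch leaves the trace.

```python
# Hypothetical software model of the usefulness-tracking scheme; names
# and thresholds are illustrative, not taken from the patent.

class BranchUsefulnessFilter:
    """Tracks how useful each internal branch is to its trace and
    excludes branches that prove insufficiently biased."""

    def __init__(self, initial=3, floor=0):
        self.counters = {}     # branch PC -> usefulness counter
        self.excluded = set()  # stands in for the hardware Bloom filter
        self.initial = initial # branches initially assumed useful
        self.floor = floor     # threshold in the "second direction"

    def record_execution(self, pc, stayed_in_trace):
        """Update usefulness after the branch at `pc` executes."""
        if pc in self.excluded:
            return
        c = self.counters.setdefault(pc, self.initial)
        if stayed_in_trace:
            self.counters[pc] = c + 1  # first direction: toward next segment
        else:
            self.counters[pc] = c - 2  # second direction: stronger adjustment
        if self.counters[pc] <= self.floor:
            # Not sufficiently biased: bar it from being an internal
            # branch of future traces.
            self.excluded.add(pc)

    def may_be_internal(self, pc):
        return pc not in self.excluded
```

A branch that repeatedly exits its trace is excluded after a few executions, while a branch that stays on the trace remains eligible.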
Inventors
- Ilhyun Kim
- Niket K. Choudhary
- Muawya M. Al-Otoom
- Pruthivi Vuyyuru
- Ronald P. Hall
- Andrew H. Lin
- Douglas C. Holman
- Samir Dutt
Assignees
- APPLE INC.
Dates
- Publication Date
- 20260513
- Application Date
- 20240708
- Priority Date
- 20230714
Claims (20)
- An apparatus, comprising: processor circuitry configured to execute control transfer instructions; prediction circuitry configured to predict directions of control transfer instructions; trace cache circuitry configured to identify and store traces of instructions that meet one or more criteria, including that a given trace contains at least one internal control transfer instruction; and filter control circuitry configured to: for a given internal control transfer instruction in a trace stored by the trace cache circuitry, adjust a counter in a first direction in response to execution of the given control transfer instruction proceeding toward the next portion of the stored trace, adjust the counter in a second direction in response to execution of the given control transfer instruction proceeding away from the next portion of the stored trace, and prevent inclusion of the given control transfer instruction as an internal instruction of traces in the trace cache in response to the counter reaching a threshold in the second direction.
- The apparatus of Claim 1 , further comprising control circuitry configured to: in response to a particular direction for a non-terminating control transfer instruction leaving a trace, discard one or more subsequent instructions of a fetch group that included the non-terminating control transfer instruction and redirect front-end circuitry of the processor circuitry to fetch other instructions.
- The apparatus of Claim 1 , wherein the one or more criteria further include: that only first-category control transfer instructions are permitted at one or more positions within a given trace, wherein first-category control transfer instructions meet a first bias threshold in a given direction and second-category control transfer instructions do not meet the first bias threshold; and that second-category control transfer instructions are permitted only at one or more other positions within the given trace.
- The apparatus of Claim 3 , wherein a given entry in the trace cache circuitry includes a field indicating whether the trace has a non-terminating control transfer instruction of the second category.
- The apparatus of Claim 4 , wherein a given entry in the trace cache circuitry includes a field that specifies a position within the trace of the non-terminating control transfer instruction of the second category.
- The apparatus of Claim 1 , wherein: the filter circuitry is configured to adjust the counter more strongly in the second direction than in the first direction.
- The apparatus of Claim 1 , further comprising: an instruction cache; wherein: the prediction circuitry is configured to share prediction table entries and prediction line circuitry for instruction cache and trace cache hits; and the apparatus includes selection circuitry configured to select a prediction line output from the prediction circuitry based on the position of a corresponding control transfer instruction within a trace.
- The apparatus of Claim 1 , wherein: to prevent the inclusion of the given control transfer instruction, the filter circuitry is configured to access a Bloom filter using a program counter associated with the given control transfer instruction; and the filter circuitry is configured to periodically clear the Bloom filter.
- The apparatus of Claim 1 , wherein the apparatus is a computing device that further includes: a central processing unit; a display; and network interface circuitry.
- A method, comprising: executing, by a computing device, control transfer instructions; predicting directions of the control transfer instructions executed by the computing device; identifying and storing in a trace cache, by the computing device, traces of instructions that satisfy one or more criteria, including that a given trace contains at least one internal control transfer instruction; for a given internal control transfer instruction in a trace stored in the trace cache, adjusting, by the computing device, a counter in a first direction in response to the given control transfer instruction executing toward the next segment of the stored trace, and adjusting the counter in a second direction in response to the given control transfer instruction executing away from the next segment of the stored trace; and preventing, by the computing device, inclusion of the given control transfer instruction as an internal instruction of traces in the trace cache when the counter reaches a threshold in the second direction.
- The method of Claim 10 , further comprising: in response to a particular direction for a non-terminating control transfer instruction leaving a trace, discarding one or more subsequent instructions of a fetch group that included the non-terminating control transfer instruction, and redirecting front-end circuitry of the computing device to fetch other instructions.
- The method of Claim 10 , wherein the one or more criteria include: that only first-category control transfer instructions are permitted at one or more positions within a given trace, wherein first-category control transfer instructions meet a first bias threshold in a given direction and second-category control transfer instructions do not meet the first bias threshold; and that second-category control transfer instructions are permitted only at one or more other positions within the given trace.
- The method of Claim 12 , wherein a particular entry in the trace cache includes a field indicating whether the trace contains a non-terminating control transfer instruction of the second category.
- The method of Claim 13 , wherein a particular entry in the trace cache includes a field that specifies a position within the trace of the non-terminating control transfer instruction of the second category.
- The method of Claim 10 , wherein the counter is adjusted by a larger amount in the first direction than the counter is adjusted in the second direction.
- The method of Claim 10 , further comprising: sharing a prediction table entry in a branch predictor for instruction cache and trace cache hits.
- The method of Claim 10 , wherein the preventing includes accessing a Bloom filter using a program counter associated with the given control transfer instruction.
- A non-transitory computer-readable medium storing hardware description language instructions that, when processed by a computer system, program the computer system to generate a computer simulation model, wherein the model represents a hardware circuit that includes: processor circuitry configured to execute control transfer instructions; prediction circuitry configured to predict directions of control transfer instructions; trace cache circuitry configured to identify and store traces of instructions that meet one or more criteria, including that a given trace contains at least one internal control transfer instruction; and filter control circuitry configured to: for a given internal control transfer instruction in a trace stored by the trace cache circuitry, adjust a counter in a first direction in response to execution of the given control transfer instruction toward the next portion of the stored trace, adjust the counter in a second direction in response to execution of the given control transfer instruction away from the next portion of the stored trace, and prevent inclusion of the given control transfer instruction as an internal instruction of traces in the trace cache in response to the counter reaching a threshold in the second direction.
- The non-transitory computer-readable medium of Claim 18 , wherein the one or more criteria further include: that only first-category control transfer instructions are permitted at one or more positions within a given trace, wherein first-category control transfer instructions meet a first bias threshold in a given direction and second-category control transfer instructions do not meet the first bias threshold; and that second-category control transfer instructions are permitted only at one or more other positions within the given trace.
- The non-transitory computer-readable medium of Claim 18 , wherein: to prevent the inclusion of the given control transfer instruction, the filter circuitry is configured to access a Bloom filter using a program counter associated with the given control transfer instruction; and the filter circuitry is configured to periodically clear the Bloom filter.
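The Bloom-filter limitations recited in the claims above combine three operations: inserting the program counter of an insufficiently biased branch, testing membership when forming traces, and periodically clearing the filter. The sketch below models those operations in software; the bit-array size, hash construction, and clearing interval are illustrative assumptions, not values from the patent.

```python
# Minimal software sketch of the Bloom-filter exclusion mechanism:
# falling-below-threshold branch PCs are inserted, membership tests gate
# trace formation, and the filter is periodically cleared so excluded
# branches can be re-evaluated. Sizes and hashes are assumptions.
import hashlib

class ExclusionBloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3, clear_interval=100_000):
        self.bits = [False] * num_bits
        self.num_hashes = num_hashes
        self.clear_interval = clear_interval
        self.ticks = 0

    def _positions(self, pc):
        # Derive k bit positions from the program counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{pc}:{i}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % len(self.bits)

    def exclude(self, pc):
        for pos in self._positions(pc):
            self.bits[pos] = True

    def is_excluded(self, pc):
        # May report false positives, never false negatives.
        return all(self.bits[pos] for pos in self._positions(pc))

    def tick(self):
        # Periodic clearing bounds the false-positive rate and lets
        # previously excluded branches prove themselves again.
        self.ticks += 1
        if self.ticks % self.clear_interval == 0:
            self.bits = [False] * len(self.bits)
```

A Bloom filter suits this role because it is compact in hardware and its only error mode (a false positive) merely excludes a branch conservatively, which the periodic clear undoes.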
Description
BACKGROUND
Technical Field
This disclosure relates generally to computer processor architecture and, in particular, to trace cache circuitry.
Description of the Related Art
Trace caches store traces of instructions, with each trace typically containing at least one internal branch and potentially ending with a branch. Trace caches can significantly improve performance by increasing fetch bandwidth (compared to multiple fetches from a single instruction cache) and can also reduce fetch power consumption. However, a branch that unexpectedly exits a trace in the trace cache can be costly, and a trace cache and its control logic can consume a significant portion of processor area and power. These trade-offs have made practical implementation of trace caches in conventional processor designs difficult.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating an example processor pipeline that includes a trace cache, according to some embodiments.
FIG. 2A is a diagram illustrating examples of branch stability categories, according to some embodiments.
FIG. 2B is a diagram illustrating examples of stable/unstable fetch-group positions permissible in a trace, according to some embodiments.
FIG. 3 is a diagram illustrating example fields of a trace cache entry, according to some embodiments.
FIG. 4 is a block diagram illustrating a more detailed example of a pipeline, according to some embodiments.
FIG. 5 is a block diagram illustrating an example next-fetch trace predictor, according to some embodiments.
FIG. 6 is a block diagram illustrating an example pre-hashed branch history token, according to some embodiments.
FIG. 7 is a block diagram illustrating an example trace prediction line for a branch predictor with shared resources, according to some embodiments.
FIG. 8 is a block diagram illustrating an example trace prediction line for a TAgged GEometric length predictor (TAGE), according to some embodiments.
FIG. 9 is a diagram illustrating examples of relaxed conditions that allow a less stable control transfer instruction before the end of a trace, according to some embodiments.
FIG. 10A is a diagram illustrating an example of a filter-based "less stable before end" (LBE) trace cache technique, according to some embodiments.
FIG. 10B is a diagram illustrating another example of trace cache entry fields, according to some embodiments.
FIG. 10C is a diagram illustrating example fields of a next-fetch predictor entry, according to some embodiments.
FIG. 11 is a block diagram illustrating an example pipeline that includes filter circuitry for training to prevent certain control transfer instructions from being included before the end of a trace, according to some embodiments.
FIG. 12 is a diagram illustrating example control circuitry that allows sharing of a branch predictor table entry for both instruction cache and trace cache hits, according to some embodiments.
FIG. 13 is a flowchart illustrating an example method for operating a trace cache, according to some embodiments.
FIG. 14 is a flowchart illustrating an example method that uses a less stable branch before the end (LBE) of a trace, according to some embodiments.
FIG. 15 is a flowchart illustrating an example method for operating a next-fetch trace predictor, according to some embodiments.
FIG. 16 is a flowchart illustrating an example method for operating a processor with a trace cache branch prediction line, according to some embodiments.
FIG. 17 is a block diagram illustrating an example computing device, according to some embodiments.
FIG. 18 is a diagram illustrating example applications of disclosed systems and devices, according to some embodiments.
FIG. 19 is a block diagram illustrating an example computer-readable medium that stores circuit design information, according to some embodiments.
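Claims 4 and 5 describe trace cache entries carrying a flag for a non-terminating second-category (less stable) branch and a field for its position within the trace. The record below sketches how such an entry might look in a software model; all field names are hypothetical, chosen for illustration rather than drawn from the patent.

```python
# Illustrative sketch of trace cache entry fields suggested by claims 4
# and 5; field names are assumptions, not the patent's own terminology.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TraceCacheEntry:
    start_pc: int                        # tag: address of the trace's first instruction
    instructions: List[int] = field(default_factory=list)
    has_lbe_branch: bool = False         # trace contains a non-terminating
                                         # second-category (less stable) branch
    lbe_position: Optional[int] = None   # its position within the trace, if present
```

Keeping the position explicit lets the front end know exactly where a trace is most likely to be exited early, without scanning the stored instructions.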
DETAILED DESCRIPTION
Overview and Introduction to the Disclosure
In various embodiments, discussed in detail below, trace cache control circuitry is configured to assemble traces of instructions that meet certain criteria. For example, the control circuitry can restrict internal conditional branches to "stable" branches that satisfy a bias threshold in one direction. "Internal" branches refer to branches that are not the last instruction in a trace. The following provides a brief roadmap of various disclosures relating to a stable trace cache. FIGS. 1-4 show examples of trace cache implementations. FIGS. 5-6 relate to a next-fetch predictor configured to operate in conjunction with trace cache hits. FIGS. 7-8 illustrate logic for an additional trace cache prediction line for a branch predictor. FIGS. 9-12 relate to allowing less stable internal branches within a trace in certain scenarios. The remaining figures show example methods, devices, and systems. In the disclosed embodiments, the trace cache circuitry is configured to cache only