CN-121996300-A - Method for processing hard-to-predict branches, computer device, medium and product
Abstract
The application discloses a method, a computer device, a medium, and a product for processing hard-to-predict (H2P) branches. The method comprises: converting a code segment containing an H2P branch into a plurality of basic blocks and writing them into a preset cache; obtaining decoded micro-instructions and determining a target micro-instruction group; obtaining a physical address when the cache does not contain the micro-instructions; setting a hit entry in the cache to a first value when the micro-instruction group contains that address and the program end address matches the offset of the micro-instruction group's access address; when all valid entries are the first value and the program address hits in the cache, triggering the program address to access the preset cache and a preset predictor to obtain a prediction direction; and setting the next sequential program address of the H2P branch to the predicted target address corresponding to the direction opposite to the prediction direction. The method suits deep-pipeline, high-density H2P scenarios, can process multiple H2P branches, and improves overall performance.
Inventors
- ZHAO CHUNYAO
Assignees
- 海光信息技术股份有限公司 (Hygon Information Technology Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-27
Claims (18)
- 1. A method of processing a hard-to-predict branch, comprising: acquiring a code segment containing a hard-to-predict branch in the original code of a program; converting the code segment into a plurality of basic blocks, and writing the basic blocks into a first preset cache when a first condition is met; processing a program address in the program original code to obtain decoded micro-instructions and determining a target micro-instruction group; translating a program address in the first preset cache to obtain a physical address when the first preset cache does not contain the decoded micro-instructions; triggering execution of a first policy when the target micro-instruction group contains the physical address and the program end address in the first preset cache matches the offset of the last byte address of the target micro-instruction group; triggering execution of a second policy when the first preset cache meets the first condition, obtaining alternate-path branch prediction information and stopping execution of the program address in a branch prediction unit; and determining the next sequential program address of the corresponding hard-to-predict branch in the first preset cache according to the alternate-path branch prediction information.
- 2. The method of claim 1, wherein acquiring the code segment containing the hard-to-predict branch in the program original code comprises: identifying program original code that contains hard-to-predict branches and is repeatedly executed, and taking it as the target code to be processed; and analyzing the target code through a software program to obtain the code segment containing hard-to-predict branches; wherein a hard-to-predict branch is a branch for which the accuracy of the branch-direction prediction made by the branch prediction circuit is below a preset threshold.
- 3. The method of claim 2, wherein converting the code segment into a plurality of basic blocks, writing the basic blocks into the first preset cache when the first condition is satisfied, and generating a writing result comprises: converting the code segment into a plurality of basic blocks and storing all the basic blocks into a preset memory; in response to a successful-store signal, generating an instruction to be written; and, when the program is run again, writing all the basic blocks in the preset memory into the first preset cache according to the instruction to be written, and generating a writing result; wherein each basic block includes a program address, a program end address, a target address, a branch type, a hit entry, and a valid entry.
- 4. The method of claim 3, wherein processing the program address in the program original code to obtain decoded micro-instructions and determining the target micro-instruction group comprises: in response to an instruction indicating that the writing result is successful, running the program original code again and identifying a program address in the program original code; calling a branch prediction unit to analyze the program address to obtain a branch prediction result and storing it into a prediction queue, wherein the branch prediction result comprises a prediction direction and a predicted target address; passing the program address sequentially through an instruction cache and a decoding unit to obtain decoded micro-instructions and storing them in a micro-instruction queue; and storing a plurality of target micro-instruction groups determined from all the decoded micro-instructions into a micro-instruction cache.
- 5. The method of claim 4, wherein, if the first preset cache contains the decoded micro-instructions, the decoded micro-instructions are stored in a second preset cache.
- 6. The method of claim 1, wherein triggering execution of the first policy comprises setting a corresponding hit entry of the first preset cache to a first value.
- 7. The method of claim 6, wherein triggering execution of the second policy when the first preset cache satisfies the first condition, obtaining the alternate-path branch prediction information, and stopping execution of the program address in the branch prediction unit comprises: when all the valid entries in the first preset cache are the first value, triggering the program address to access the first preset cache and a preset predictor to obtain the alternate-path branch prediction information, and stopping execution of the program address in the branch prediction unit.
- 8. The method of claim 7, wherein determining the next sequential program address of the corresponding hard-to-predict branch in the first preset cache according to the alternate-path branch prediction information comprises: storing, into a preset queue, the associated information of the basic blocks in the first preset cache whose branch type is hard-to-predict, and setting the corresponding next sequential program address of the hard-to-predict branch as an alternate-path PC; wherein the alternate-path PC is the predicted target address corresponding to the direction opposite to the prediction direction in the alternate-path branch prediction information.
- 9. The method of claim 1, wherein storing, into the preset queue, the associated information of the basic blocks in the first preset cache whose branch type is hard-to-predict comprises: acquiring all basic blocks in the first preset cache whose branch type is hard-to-predict and marking them as target blocks; acquiring target associated information of each target block, wherein the target associated information comprises a program counter address, an end-address offset, a current-path ID, a next entry ID of the current path, a branch history, and a next entry ID of the alternate path; and storing the target associated information into the preset queue.
- 10. The method of claim 1, wherein, during the process of the program address accessing the first preset cache, the method further comprises: acquiring all basic blocks in the first preset cache whose branch type is not hard-to-predict and marking them as target blocks; acquiring target information of each target block, wherein the target information comprises a program counter address, an end-address offset, a current-path ID, a next entry ID of the current path, and a branch history; and storing the target information into a preset queue.
- 11. The method of claim 1, wherein, in the next clock cycle, the alternate-path PC accesses an alternate-path branch prediction unit simultaneously with the next-path PC of the corresponding current path, the alternate-path branch prediction unit comprising the first preset cache and the preset predictor.
- 12. The method of claim 11, wherein, when the prediction direction in the alternate-path branch prediction information differs from the corresponding execution result, flushing all entries under the alternate-path pipeline, the entries of the original path younger than the hard-to-predict branch in the preset queue, and their sub-paths, and selecting the latest alternate-path PC corresponding to the hard-to-predict branch to access the alternate-path branch prediction unit.
- 13. The method of claim 12, wherein, when the prediction direction in the alternate-path branch prediction information is the same as the corresponding execution result, all entries under the alternate path and its sub-paths younger than the hard-to-predict branch in the preset queue are flushed.
- 14. The method of claim 1, further comprising continuing to acquire a new alternate-path PC when the number of hard-to-predict branches in the preset queue that have not finished executing is below a preset threshold.
- 15. The method of claim 1, wherein, when a program address in the program original code is a revisited program address and is already present in the micro-instruction cache, the corresponding program address enters the micro-instruction queue through the micro-instruction cache.
- 16. A computer apparatus, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing a hard-to-predict branch of any one of claims 1-15.
- 17. A computer-readable storage medium storing computer instructions for causing a computer to perform the method of processing a hard-to-predict branch according to any one of claims 1-15.
- 18. A computer program product comprising computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 15.
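The basic-block entry of claim 3 and the alternate-path PC rule of claim 8 can be sketched as follows. This is an illustrative Python model, not the patent's hardware encoding; the field names and the "+1 byte fall-through" assumption are mine.

```python
# Illustrative model of the claim 3 basic-block entry and the claim 8 rule:
# the next sequential PC of an H2P branch is set to the target of the
# direction OPPOSITE to the predicted one (the alternate path).

from dataclasses import dataclass

@dataclass
class BasicBlock:
    program_address: int      # start PC of the block
    program_end_address: int  # address of the block's last byte
    target_address: int       # taken-branch target
    branch_type: str          # e.g. "h2p" or "regular" (assumed encoding)
    hit: int = 0              # hit entry (set to a first value by the first policy)
    valid: int = 0            # valid entry

def alternate_path_pc(block: BasicBlock, predicted_taken: bool) -> int:
    """Return the PC of the path opposite to the predicted direction.

    If the predictor says taken, the alternate path is the fall-through
    (assumed here to be end address + 1); if it says not taken, the
    alternate path is the branch target.
    """
    if predicted_taken:
        return block.program_end_address + 1  # fall-through PC
    return block.target_address

blk = BasicBlock(0x1000, 0x100F, 0x2000, "h2p", valid=1)
print(hex(alternate_path_pc(blk, predicted_taken=True)))   # 0x1010
print(hex(alternate_path_pc(blk, predicted_taken=False)))  # 0x2000
```

In this reading, both directions of the H2P branch always have a ready successor PC, which is what lets the claims steer fetch down the alternate path without re-consulting the main predictor.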
Description
Method for processing hard-to-predict branches, computer device, medium and product

Technical Field

The disclosure relates to the field of computer technology, and in particular to a method, a computer device, a medium, and a product for processing hard-to-predict branches.

Background

Branch predictors in modern high-performance processor cores are highly accurate for regular loops and repetitive control flow, with a branch MPKI (mispredictions per kilo-instructions, i.e. per thousand instructions) generally below 5. However, when a program exhibits highly irregular control flow (no obvious pattern), input sensitivity (the branch direction is determined only at run time), non-repetition (one-time initialization or exception paths), or complex cross-dependences (the outcome depends on instructions far upstream), the predictor keeps mispredicting because its history tables cannot converge and its training window is limited, and the MPKI can exceed 100. Such branches are referred to as H2P (Hard-to-Predict) branches.

To eliminate the misprediction cost of H2P branches, the prior art mainly proposes two kinds of pure-hardware remedies. The first stores both the 'taken' and 'not-taken' instruction streams once an H2P branch is detected; if the prediction later proves wrong, execution switches immediately to the prepared reverse stream. To save storage, recording stops as soon as any non-biased conditional branch is encountered in the reverse stream, so the standby path is too short: in a deep-pipeline scenario instructions still need to be re-fetched and the performance benefit is limited. Moreover, H2P detection relies on run-time statistics, so there are training cold-start and misjudgment costs.
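A minimal sketch (not part of the patent) of the MPKI criterion described above, using the thresholds the background cites (roughly 5 MPKI for well-predicted branches, 100+ for H2P branches):

```python
# Hypothetical H2P classifier based on mispredictions-per-kilo-instructions.
# The 100-MPKI threshold is taken from the background's description, not
# from any claimed hardware mechanism.

def branch_mpki(mispredictions: int, instructions_retired: int) -> float:
    """MPKI = mispredictions per 1000 retired instructions."""
    return mispredictions * 1000.0 / instructions_retired

def is_h2p(mispredictions: int, instructions_retired: int,
           threshold: float = 100.0) -> bool:
    """Flag a branch as hard-to-predict when its MPKI reaches the threshold."""
    return branch_mpki(mispredictions, instructions_retired) >= threshold

# A regular loop branch: 2,000 mispredictions over 1,000,000 instructions.
print(branch_mpki(2_000, 1_000_000))   # 2.0 -> well predicted
print(is_h2p(120_000, 1_000_000))      # True -> 120 MPKI, H2P
```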
The other adds a complete front-end pipeline dedicated to pre-executing the reverse stream of the current H2P branch, buffering its micro-instructions in a queue and switching to them seamlessly on misprediction. However, the chip provides only a single additional pipeline, so when a code segment contains consecutive or nested H2P branches, the second and subsequent H2P branches cannot be pre-executed simultaneously, creating a resource bottleneck; and because H2P branches are identified only after hardware training, the optimization window lags.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide a method, a computer device, a medium, and a product for processing hard-to-predict branches, which can solve the problem that a sufficient number of multi-directional standby instruction streams cannot be provided in time in deep-pipeline, high-density H2P scenarios, resulting in a high branch-misprediction penalty.

In a first aspect, an embodiment of the present disclosure provides a method for processing a hard-to-predict branch, comprising: acquiring a code segment containing a hard-to-predict branch in the original code of a program; converting the code segment into a plurality of basic blocks, and writing the basic blocks into a first preset cache when a first condition is met; processing a program address in the program original code to obtain decoded micro-instructions and determining a target micro-instruction group; translating a program address in the first preset cache to obtain a physical address when the first preset cache does not contain the decoded micro-instructions; triggering execution of a first policy when the target micro-instruction group contains the physical address and the program end address in the first preset cache matches the offset of the last byte address of the target micro-instruction group; triggering execution of a second policy when the first preset cache meets the first condition, obtaining alternate-path branch prediction information and stopping execution of the program address in a branch prediction unit; and determining the next sequential program address of the corresponding hard-to-predict branch in the first preset cache according to the alternate-path branch prediction information.

In a second aspect, an embodiment of the present disclosure further provides a computer apparatus adopting the following technical scheme. The computer apparatus includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing a hard-to-predict branch as described above.

In a third aspect, the disclosed embodiments also provide a computer-readable storage medium storing computer instructions for causing a computer to perform any of the above methods of processing a hard-to-predict branch.

In a fourth aspect, the disclosed embodiments also provide a computer program product comprising a computer program/instruction which, when executed by a processor, implements the steps of the method of processing a hard-to-predict branch as described above.
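The recovery behavior of claims 12 and 13, where preset-queue entries younger than a resolved H2P branch are flushed and only the validated path survives, can be sketched as follows. The queue layout and the per-entry "path" tag are illustrative assumptions of mine, not the patent's encoding.

```python
# Hypothetical sketch of the claim 12/13 flush rule: entries at or before
# the resolved H2P branch always survive; among younger entries, those on
# the flushed path are discarded.

def flush_younger(queue, h2p_index, paths_to_flush):
    """Return the preset queue after flushing younger entries on the given paths."""
    kept = list(queue[: h2p_index + 1])
    kept += [e for e in queue[h2p_index + 1:] if e["path"] not in paths_to_flush]
    return kept

q = [{"id": 0, "path": "original"},
     {"id": 1, "path": "original"},   # the resolved H2P branch itself
     {"id": 2, "path": "original"},
     {"id": 3, "path": "alternate"}]

# Claim 12 case (prediction differed from the execution result): flush the
# original-path entries younger than the H2P branch, keep the alternate path.
print([e["id"] for e in flush_younger(q, 1, {"original"})])   # [0, 1, 3]

# Claim 13 case (prediction matched the execution result): flush the
# alternate path younger than the H2P branch instead.
print([e["id"] for e in flush_younger(q, 1, {"alternate"})])  # [0, 1, 2]
```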