CN-121996294-A - Program flow control device, chip product, processor, apparatus and method
Abstract
A program flow control device, a chip product, a processor, an apparatus and a method, relating to the technical field of computers. The device comprises a branch processing unit, a push logic unit and a stack. The branch processing unit is used for determining respective valid signals of a plurality of instruction branches in a first program, where the first program is executed by a plurality of threads and the valid signal of an instruction branch indicates at least one thread, among the plurality of threads, that executes that instruction branch. The push logic unit is used for executing one push operation for the stack when a first instruction branch is preferentially executed, generating a first entry in the stack. The first entry comprises first type information, used for determining address information and valid signals of the instruction branches other than the first instruction branch, and second type information, used for determining address information and state information of a convergence point instruction. The state information of the convergence point instruction is used for determining whether the conditions for executing the convergence point instruction are met. The application can reduce the power consumption required to process branch structures in a program.
Inventors
- REN ZIMU
Assignees
- 腾讯科技(深圳)有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20241105
Claims (16)
- 1. A program flow control device, characterized in that the device comprises a branch processing unit, a push logic unit and a stack; the branch processing unit is configured to determine respective valid signals of a plurality of instruction branches in a first program, where the first program is executed by a plurality of threads, the valid signal of an instruction branch is used to indicate at least one thread, among the plurality of threads, for executing the instruction branch, and each instruction branch includes at least one computer instruction; the push logic unit is configured to execute a push operation for the stack when a first instruction branch of the plurality of instruction branches is preferentially executed, generating a first entry in the stack, where the first entry includes first type information and second type information, the first type information is used to determine address information and valid signals of the instruction branches of the plurality of instruction branches other than the first instruction branch, the second type information is used to determine address information and state information of a convergence point instruction, the convergence point instruction is a computer instruction executed after execution of the plurality of instruction branches is completed, and the state information of the convergence point instruction is used to determine whether a condition for executing the convergence point instruction is satisfied.
- 2. The device of claim 1, wherein the first type information comprises address information of a start instruction in the other instruction branches and valid signals of the other instruction branches, the second type information comprises address information of the convergence point instruction and state information of the convergence point instruction, and each entry in the stack is allocated at least four consecutive fields; the push logic unit is configured to push the address information of the convergence point instruction from the top of the stack into a first field of the first entry, push the state information of the convergence point instruction from the top of the stack into a second field of the first entry, push the address information of the start instruction in the other instruction branches from the top of the stack into a third field of the first entry, and push the valid signals of the other instruction branches from the top of the stack into a fourth field of the first entry.
- 3. The device according to claim 1 or 2, wherein the branch processing unit is further configured to send the valid signals of the other instruction branches to the push logic unit; the push logic unit is further configured to determine the state information of the convergence point instruction based on the number of valid signals of the other instruction branches.
- 4. The device according to any one of claims 1 to 3, further comprising a read logic unit, a first operation unit and a second operation unit; the read logic unit is configured to read the address information of the convergence point instruction and the state information of the convergence point instruction from the first entry, send the address information of the convergence point instruction to the first operation unit, and send the state information of the convergence point instruction to the second operation unit; the first operation unit is configured to acquire first address information, where the first address information is address information of the computer instruction currently being executed, and to generate an intermediate signal based on the first address information and the address information of the convergence point instruction, where the intermediate signal is used to indicate whether execution of the first instruction branch is completed; the second operation unit is configured to generate a jump enable signal based on the intermediate signal and the state information of the convergence point instruction, where the jump enable signal is used to indicate whether to jump to execute a computer instruction in the first program.
- 5. The device of claim 4, further comprising a first storage unit for storing the valid signal of the currently executing instruction branch; the read logic unit is further configured to: read address information of a start instruction in a second instruction branch and a valid signal of the second instruction branch from the first entry in a case where the jump enable signal is a first operation value, where the second instruction branch is an instruction branch of the plurality of instruction branches other than the first instruction branch, and the jump enable signal, when it is the first operation value, instructs a jump to execute a computer instruction in the first program; update the valid signal of the second instruction branch into the first storage unit; and output the address information of the start instruction in the second instruction branch as a jump address signal, where the jump address signal is used to indicate a jump to execute the second instruction branch.
- 6. The device according to claim 4 or 5, wherein the first operation unit is configured to: output the intermediate signal as a first value in a case where the first address information is the same as the address information of the convergence point instruction; or output the intermediate signal as a second value in a case where the first address information differs from the address information of the convergence point instruction; wherein the intermediate signal indicates that execution of the first instruction branch is completed when it is the first value, and indicates that execution of the first instruction branch is not completed when it is the second value.
- 7. The device of any of claims 1 to 6, wherein the push logic unit is further configured to update the state information of the convergence point instruction in the first entry after each instruction branch is executed; the state information of the convergence point instruction, when it is a third value, indicates that no unexecuted instruction branch exists among the plurality of instruction branches.
- 8. The device of claim 7, further comprising a read logic unit, a first storage unit, and a second operation unit, the first storage unit being used to store the valid signal of the currently executing instruction branch; the read logic unit is configured to read, when execution of a third instruction branch of the plurality of instruction branches is completed and the state information of the convergence point instruction is the third value, a valid signal of the convergence point instruction from the first entry, where the valid signal of the convergence point instruction is used to activate the plurality of threads, and the first entry is deleted from the top of the stack after the read logic unit reads the valid signal of the convergence point instruction; the read logic unit is further configured to update the valid signal of the convergence point instruction into the first storage unit; the second operation unit is configured to output the jump enable signal as a second operation value when execution of the third instruction branch is completed and the state information of the convergence point instruction is the third value, where the jump enable signal, when it is the second operation value, indicates that no jump is made to another computer instruction in the first program, so that execution continues with the convergence point instruction.
- 9. The device of claim 8, further comprising a third operation unit; the third operation unit is configured to perform a logic operation on the valid signals of the plurality of instruction branches to obtain the valid signal of the convergence point instruction, and to update the valid signal of the convergence point instruction into the first entry; or the push logic unit is further configured to push the valid signal of the convergence point instruction into the first entry.
- 10. The device according to any one of claims 1 to 9, further comprising a stack control unit and a second storage unit; the push logic unit is further configured to, in a case where the first entry is generated, control the stack control unit to update a value in the second storage unit, where the value in the second storage unit is used to indicate the entry located at the top of the stack.
- 11. The device according to any one of claims 1 to 10, wherein the branch processing unit is further configured to: in a case where a vector branch instruction corresponding to the plurality of instruction branches is acquired, execute the vector branch instruction and determine respective jump directions of the plurality of threads, where the jump direction of a thread is used to indicate the instruction branch executed by that thread, and the vector branch instruction is used to indicate a rule for determining the jump directions of the plurality of threads; determine the respective valid signals of the plurality of instruction branches according to the respective jump directions of the plurality of threads in a case where the jump directions of the plurality of threads differ; and update the valid signal of the first instruction branch into a first storage unit, where the first storage unit is used to store the valid signal of the currently executing instruction branch.
- 12. The device according to any one of claims 1 to 11, wherein the branch processing unit is further configured to, in a case where a sub-vector branch instruction in the first instruction branch is acquired, execute the sub-vector branch instruction and determine valid signals of a plurality of sub-instruction branches corresponding to the sub-vector branch instruction, where the valid signal of a sub-instruction branch is used to indicate at least one thread, among the plurality of threads, for executing the sub-instruction branch; the push logic unit is further configured to perform a push operation for the stack in a case where a first sub-instruction branch of the plurality of sub-instruction branches is preferentially executed, generating a second entry in the stack, where the second entry includes third type information and fourth type information, the third type information is used to determine address information and valid signals of the sub-instruction branches of the plurality of sub-instruction branches other than the first sub-instruction branch, the fourth type information is used to determine address information and state information of a sub-convergence point instruction, the sub-convergence point instruction is a computer instruction executed after execution of the plurality of sub-instruction branches is completed, and the state information of the sub-convergence point instruction is used to determine whether a condition for executing the sub-convergence point instruction is satisfied.
- 13. A chip product, characterized in that it comprises a program flow control device according to any of claims 1 to 12.
- 14. A processor comprising a program flow control device as claimed in any one of claims 1 to 12.
- 15. A computer device comprising a processor, the processor comprising the program flow control device of any of claims 1 to 12.
- 16. A program flow control method, the method comprising: a branch processing unit determines respective valid signals of a plurality of instruction branches in a first program, the first program being executed by a plurality of threads, the valid signal of an instruction branch indicating at least one thread, among the plurality of threads, for executing the instruction branch, and each instruction branch comprising at least one computer instruction; a push logic unit performs a push operation for a stack in a case where a first instruction branch of the plurality of instruction branches is preferentially executed, and generates a first entry in the stack, where the first entry includes first type information and second type information, the first type information is used for determining address information and valid signals of the instruction branches of the plurality of instruction branches other than the first instruction branch, the second type information is used for determining address information and state information of a convergence point instruction, the convergence point instruction is a computer instruction executed after execution of the plurality of instruction branches is completed, and the state information of the convergence point instruction is used for determining whether a condition for executing the convergence point instruction is satisfied.
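As an illustration of the mechanism recited in claims 1 to 9, the following Python sketch models a two-branch case in software. It is not taken from the patent and is only a simplified model under stated assumptions: the names `Entry`, `FlowControl`, `diverge` and `step` are hypothetical, valid signals are modelled as integer bitmasks (bit i set means thread i is active), the state information is modelled as a count of branches not yet started (so the "third value" is taken to be 0), and the convergence point's valid signal is modelled as the bitwise OR of the branch masks.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    converge_addr: int  # field 1: address of the convergence point instruction
    pending: int        # field 2: state info, modelled as count of branches not yet started
    other_addr: int     # field 3: start address of the other (later) branch
    other_mask: int     # field 4: valid signal of the other branch
    converge_mask: int  # valid signal restored at the convergence point (assumed OR of masks)


class FlowControl:
    """Hypothetical software model of the single-entry scheme (two branches only)."""

    def __init__(self):
        self.stack = []
        self.active_mask = 0  # plays the role of the "first storage unit"
        self.pushes = 0
        self.pops = 0

    def diverge(self, first_mask, other_addr, other_mask, converge_addr):
        # One push records both the other branch and the convergence point.
        entry = Entry(converge_addr, 1, other_addr, other_mask,
                      first_mask | other_mask)
        self.stack.append(entry)
        self.pushes += 1
        self.active_mask = first_mask  # the first branch runs first

    def step(self, pc):
        """Return (jump_enable, jump_addr) for the instruction address pc."""
        if not self.stack:
            return False, None
        top = self.stack[-1]
        # Intermediate signal: the current branch is done once pc reaches
        # the convergence point address.
        branch_done = pc == top.converge_addr
        if not branch_done:
            return False, None
        if top.pending > 0:
            # A branch is still waiting: switch masks and jump to its start.
            top.pending -= 1
            self.active_mask = top.other_mask
            return True, top.other_addr
        # No branch left: restore all threads, drop the entry, do not jump.
        self.active_mask = top.converge_mask
        self.stack.pop()
        self.pops += 1
        return False, None
```

In this model a whole if-else structure costs one push and one pop. Calling `diverge` again while an entry is on the stack models the nested case of claim 12, since the inner entry sits on top and is resolved first.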
Description
Program flow control device, chip product, processor, apparatus and method
Technical Field
The embodiments of the application relate to the technical field of computers, and in particular to a program flow control device, a chip product, a processor, an apparatus and a method.
Background
In the field of computer technology, a program is an ordered set of computer instructions, a process is an execution instance of a program, and a process may be further divided into threads, where each thread is an execution path of the computer instructions in the program. A processor may start multiple threads to execute the same program so as to process multiple tasks in parallel. When a branch structure is encountered in the program, different threads may execute different instruction branches because their tasks differ. A branch structure executes different instruction branches according to different judgment conditions. For example, in a two-branch structure formed by an if-else statement, the code block of the if branch may be compiled into at least one computer instruction to form one instruction branch in the program, and the code block of the else branch may be compiled into at least one computer instruction to form another instruction branch. In this case, the related art pushes related information of the convergence point instruction corresponding to the branch structure onto the stack from the top of the stack to form a first entry, and then pushes related information of the instruction branch to be processed later onto the stack from the top of the stack to form a second entry, where the convergence point instruction is the computer instruction executed by the processor when the multiple threads converge again.
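The two-entry bookkeeping of the related art can be sketched in Python as follows. This is a minimal illustration rather than anything specified in the source: the names `split_masks`, `RelatedArtStack` and `push_divergent_branch` are hypothetical, thread state is assumed to be encoded as an integer bitmask (bit i set means thread i is active), and each entry is assumed to hold just an address and a mask.

```python
from dataclasses import dataclass, field


def split_masks(conditions):
    """Return (if_mask, else_mask); bit i is set when thread i takes that branch."""
    if_mask = 0
    else_mask = 0
    for thread_id, taken in enumerate(conditions):
        if taken:
            if_mask |= 1 << thread_id
        else:
            else_mask |= 1 << thread_id
    return if_mask, else_mask


@dataclass
class RelatedArtStack:
    """Hypothetical model of the related-art stack; counts push operations."""
    entries: list = field(default_factory=list)
    pushes: int = 0

    def push(self, address, mask):
        self.entries.append((address, mask))
        self.pushes += 1


def push_divergent_branch(stack, conditions, else_addr, converge_addr):
    """Record an if-else split; the if branch is assumed to be processed first."""
    if_mask, else_mask = split_masks(conditions)
    # First entry: convergence point, with the state of all reconverging threads.
    stack.push(converge_addr, if_mask | else_mask)
    # Second entry: the branch processed later (here, the else branch).
    stack.push(else_addr, else_mask)
    return if_mask  # threads that execute the branch processed first
```

For four threads with per-thread conditions `[True, False, True, False]`, this sketch performs two push operations for a single if-else structure, one per entry.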
After the processor has executed the instruction branch processed first, the related art acquires the related information of the later-processed instruction branch from the second entry at the top of the stack to instruct the processor to execute that branch, and then pops the second entry from the top of the stack so that the first entry becomes accessible at the top of the stack. After the processor has executed the later-processed instruction branch, the related art acquires the related information of the convergence point instruction from the first entry at the top of the stack to instruct the processor to execute the convergence point instruction, where the related information includes information indicating the states of the plurality of threads, so that all threads execute correctly. However, each time a branch structure is encountered, the related art needs to perform two push operations and two pop operations on the stack, which consumes considerable power.
Disclosure of Invention
The embodiments of the application provide a program flow control device, a chip product, a processor, an apparatus and a method.
The technical scheme is as follows:
According to an aspect of an embodiment of the present application, there is provided a program flow control device including a branch processing unit, a push logic unit, and a stack; the branch processing unit is configured to determine respective valid signals of a plurality of instruction branches in a first program, where the first program is executed by a plurality of threads, the valid signal of an instruction branch is used to indicate at least one thread, among the plurality of threads, for executing the instruction branch, and each instruction branch includes at least one computer instruction; the push logic unit is configured to execute a push operation for the stack when a first instruction branch of the plurality of instruction branches is preferentially executed, generating a first entry in the stack, where the first entry includes first type information and second type information, the first type information is used to determine address information and valid signals of the instruction branches of the plurality of instruction branches other than the first instruction branch, the second type information is used to determine address information and state information of a convergence point instruction, the convergence point instruction is a computer instruction executed after execution of the plurality of instruction branches is completed, and the state information of the convergence point instruction is used to determine whether a condition for executing the convergence point instruction is satisfied.
According to an aspect of an embodiment of the present application, there is provided a chip product including the program flow control device described above.
According to an aspect of an embodiment of the present application, there is provided a processor including the program flow control device described above.
Ac