EP-4258110-B1 - METHODS FOR COMBINING INSTRUCTIONS AND APPARATUSES HAVING MULTIPLE DATA PIPES


Inventors

ZHANG, Huaisheng
HONG, Zhou
QI, Heng

Dates

Publication Date: 20260513
Application Date: 20151013

Claims (9)

A method for combining instructions, performed by a compiler, the method comprising: obtaining a plurality of first instructions, wherein each first instruction performs one of a calculation operation, a comparison operation, a logic operation, a selection operation, a branching operation, a LD/ST, Load/Store, operation, a SMP, sampling, operation and a complicated mathematics operation; combining the first instructions as one combined instruction according to data dependencies between the first instructions; and sending the combined instruction to a SP, Stream Processor, wherein the first instructions are combined according to the following rules: ALG+CMP+SEL; ALG+CMP+SEL+SFU/LS/SMP; ALG+CMP+Branch; ALG+LGC+SEL; ALG+LGC+SEL+SFU/LS/SMP; or ALG+LGC+Branch, ALG indicates a calculation instruction, CMP indicates a comparison instruction, LGC indicates a logic instruction, SEL indicates a selection instruction, Branch indicates a branching instruction, SFU indicates a mathematics computation instruction, LS indicates a Load/Store instruction and SMP indicates a sampling instruction; wherein the SP comprises: a DF, Data Fetch, unit (130); a bypass-pipe (180), coupled to a CR, Common Register, a CB, Constant Buffer, and the DF unit (130); and a main-pipe, coupled to the DF unit (130) and the bypass-pipe (180), comprising an ALG, Algorithm, unit (140), a comparison/logic unit (150) and a post-PROC, Process, unit (160), wherein the ALG, comparison/logic and post-PROC units (140, 150, 160) are coupled in series and each of the ALG, comparison/logic and post-PROC units (140, 150, 160) is coupled to the bypass-pipe (180); a determination unit of the post-PROC unit (160) writes the data back to the CR or outputs an operation result to the post-processing unit according to a result generated by the comparison/logic unit (150).
The method of the previous claim, wherein a first computation unit of the ALG unit (140) is coupled to the bypass-pipe (180) and the DF unit (130) for obtaining operands from the bypass-pipe and/or the DF unit (130); a second computation unit of the comparison/logic unit (150) is coupled to the bypass-pipe (180) and a first output of the first computation unit for obtaining operands from the bypass-pipe (180) and/or the first output; and a third computation unit of the post-PROC unit (160) is coupled to the bypass-pipe (180), the first output of the first computation unit and a second output of the second computation unit for obtaining operands from the bypass-pipe (180), the first output and/or the second output.
An apparatus having a plurality of data pipes, comprising: a SP, Stream Processor, wherein the SP comprises: a DF, Data Fetch, unit (130); a bypass-pipe (180), coupled to a CR, Common Register, a CB, Constant Buffer, and the DF unit (130); a main-pipe, coupled to the DF unit (130) and the bypass-pipe (180), comprising an ALG, Algorithm, unit (140), a comparison/logic unit (150) and a post-PROC, Process, unit (160), wherein the ALG, comparison/logic and post-PROC units (140, 150, 160) are coupled in series and each of the ALG, comparison/logic and post-PROC units (140, 150, 160) is coupled to the bypass-pipe (180); and a compiler configured to obtain a plurality of first instructions, wherein each first instruction performs one of a calculation operation, a comparison operation, a logic operation, a selection operation, a branching operation, a LD/ST, Load/Store, operation, a SMP, sampling, operation and a complicated mathematics operation; combine the first instructions as one combined instruction according to data dependencies between the first instructions; and send the combined instruction to the SP; wherein the first instructions are combined according to the following rules: ALG+CMP+SEL; ALG+CMP+SEL+SFU/LS/SMP; ALG+CMP+Branch; ALG+LGC+SEL; ALG+LGC+SEL+SFU/LS/SMP; or ALG+LGC+Branch, ALG indicates a calculation instruction, CMP indicates a comparison instruction, LGC indicates a logic instruction, SEL indicates a selection instruction, Branch indicates a branching instruction, SFU indicates a mathematics computation instruction, LS indicates a Load/Store instruction and SMP indicates a sampling instruction; wherein a determination unit of the post-PROC unit (160) writes the data back to the CR or outputs an operation result to the post-processing unit according to a result generated by the comparison/logic unit (150).
The apparatus of the previous claim, wherein a first computation unit of the ALG unit (140) is coupled to the bypass-pipe (180) and the DF unit (130) for obtaining operands from the bypass-pipe (180) and/or the DF unit (130); a second computation unit of the comparison/logic unit (150) is coupled to the bypass-pipe (180) and a first output of the first computation unit for obtaining operands from the bypass-pipe (180) and/or the first output; and a third computation unit of the post-PROC unit (160) is coupled to the bypass-pipe (180), the first output of the first computation unit and a second output of the second computation unit for obtaining operands from the bypass-pipe (180), the first output and/or the second output.
The apparatus of the previous claim, wherein the third computation unit is coupled to a LD/ST, Load/Store, unit (171), a SMP, Sampling, unit (173) and a SFU, Special Function, unit (175) for outputting an operation result.
The apparatus of the previous claim, wherein the LD/ST unit (171) performs a loading or storing instruction, the SMP unit (173) performs a texture sampling instruction and the SFU unit (175) performs a mathematics computation instruction.
The apparatus of the previous claim, wherein the main-pipe performs a main-pipe instruction and the bypass-pipe (180) performs a bypass-pipe instruction.
The apparatus of the previous claim, wherein an operation of the main-pipe instruction is performed in parallel with an operation of the bypass-pipe instruction.
The apparatus of the previous claim, wherein the main-pipe instruction is one of a calculation instruction, a comparison instruction, a logic instruction, a selection instruction and a branching instruction and the bypass-pipe instruction is a moving instruction.
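
As an illustration only, the six combination rules recited in the claims can be sketched as a small pattern check. The Python sketch below rests on stated assumptions: the Instr record, the register names, the helper names match_rule, is_chain and try_combine, and the test that each instruction must read the result of the one before it are all hypothetical. This excerpt of the patent says only that instructions are combined "according to data dependencies" and does not give the exact test.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Instruction classes named in the claims; the string tags are illustrative.
MIDDLE = {"CMP", "LGC"}        # second slot: comparison or logic
TAIL = {"SFU", "LS", "SMP"}    # optional fourth slot after SEL


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record, not the patent's encoding."""
    kind: str                  # "ALG", "CMP", "LGC", "SEL", "Branch", "SFU", "LS", "SMP"
    dst: Optional[str]         # destination register, or None if nothing is written
    srcs: Tuple[str, ...]      # source registers or constants


def match_rule(kinds: List[str]) -> Optional[str]:
    """Return the claimed rule that the sequence of kinds matches, or None."""
    if len(kinds) < 3 or kinds[0] != "ALG" or kinds[1] not in MIDDLE:
        return None
    head = f"ALG+{kinds[1]}"
    if len(kinds) == 3 and kinds[2] == "Branch":
        return head + "+Branch"
    if kinds[2] != "SEL":
        return None
    if len(kinds) == 3:
        return head + "+SEL"
    if len(kinds) == 4 and kinds[3] in TAIL:
        return head + "+SEL+SFU/LS/SMP"
    return None


def is_chain(instrs: List[Instr]) -> bool:
    """Assumed dependency test: each instruction reads its predecessor's result."""
    return all(prev.dst is not None and prev.dst in cur.srcs
               for prev, cur in zip(instrs, instrs[1:]))


def try_combine(instrs: List[Instr]) -> Optional[str]:
    """Return the matching rule if the sequence may be combined, else None."""
    rule = match_rule([i.kind for i in instrs])
    if rule is None or not is_chain(instrs):
        return None
    return rule


# Example: r2 = r0 op r1; p0 = compare(r2, c0); r3 = select(p0, r2, r4)
seq = [
    Instr("ALG", "r2", ("r0", "r1")),
    Instr("CMP", "p0", ("r2", "c0")),
    Instr("SEL", "r3", ("p0", "r2", "r4")),
]
print(try_combine(seq))  # ALG+CMP+SEL
```

Keeping the shape check (match_rule) separate from the dependency check (is_chain) means the six claimed patterns stay fixed while the assumed dependency test can be swapped for whatever analysis a real compiler uses.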

Description

BACKGROUND

Technical Field

The present invention relates to graphics processing, and in particular, it relates to methods for combining instructions and apparatuses having multiple data pipes.

Description of the Related Art

A GPU (Graphics Processing Unit) architecture typically has hundreds of basic shader processing units, referred to as SPs (Stream Processors). Each SP may process one instruction of one SIMD (Single Instruction Multiple Data) thread per cycle, and then switch to another SIMD thread in the next cycle. The performance of a GPU is affected by two important factors: the total number of SPs and the capability of each SP. Thus, methods for combining instructions and apparatuses having multiple data pipes are introduced to improve the capability of each SP.

US 6237086 B1 describes an execution unit for a stack based computing system that can combine instructions into instruction groups for concurrent execution. The execution unit includes an instruction folding unit configured to combine the instructions into instruction groups and an instruction pipeline configured to execute the instructions and the instruction groups.

US 2007/277021 A1 describes an instruction decoder which allows the folding away of JAVA virtual machine instructions that push an operand onto the top of a stack merely as a precursor to a second JAVA virtual machine instruction which operates on the top-of-stack operand.

BRIEF SUMMARY

The invention is defined by the features of the independent claims. Preferred embodiments are defined by the features of the dependent claims.

An embodiment of a method for combining instructions, performed by a compiler, contains at least the following steps. First instructions are obtained, where each performs one of a calculation operation, a comparison operation, a logic operation, a selection operation, a branching operation, an LD/ST (Load/Store) operation, an SMP (sampling) operation and a complicated mathematics operation.
The first instructions are combined into one combined instruction according to data dependencies between the first instructions. The combined instruction is sent to an SP (Stream Processor).

An embodiment of an apparatus having multiple data pipes contains at least a DF (Data Fetch) unit, a bypass-pipe and a main-pipe. The bypass-pipe is coupled to a CR (Common Register), a CB (Constant Buffer) and the DF unit. The main-pipe, coupled to the DF unit and the bypass-pipe, comprises an ALG (Algorithm) unit, a comparison/logic unit and a post-PROC (Process) unit. The ALG, comparison/logic and post-PROC units are coupled in series and each of the ALG, comparison/logic and post-PROC units is coupled to the bypass-pipe.

A detailed description is given in the following embodiments with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention can be fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:

FIG. 1 is the system architecture of a 3D (three-dimensional) graphics processing apparatus according to an embodiment of the invention;
FIG. 2 is a flowchart illustrating a method for combining instructions according to an embodiment of the invention;
FIGS. 3A and 3B illustrate the system architecture of a 3D graphics processing apparatus according to an embodiment of the invention;
FIG. 4 is a flowchart illustrating a method for combining instructions according to an embodiment of the invention; and
FIG. 5 illustrates the system architecture of a 3D graphics processing apparatus according to an embodiment of the invention.

DETAILED DESCRIPTION

The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense.
The scope of the invention is best determined by reference to the appended claims.

The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto and is only limited by the claims. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Use of ordinal terms such as "first", "second", "third", etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term) to distinguish the claim element