CA-3240618-C - VECTOR SHUFFLING METHOD, PROCESSOR AND ELECTRONIC DEVICE
Abstract
A vector shuffling method, a processor, and an electronic device. The method includes: receiving an instruction, where the instruction includes a register identifier and a shuffling parameter, and the register identifier includes a source register identifier characterizing a source register and a destination register identifier characterizing a destination register; the source register stores a source element that is operated on when a vector shuffling operation is performed; the destination register stores a target element obtained after the operation is performed; and the shuffling parameter indicates a parameter according to which the operation is performed on the source element; executing the instruction to perform the operation on the source element obtained from the source register according to the shuffling parameter, and obtaining the target element after performing the operation; and writing the target element into the destination register. The vector shuffling operation for a specific function is thereby implemented with one instruction, improving execution efficiency of the specific function.
Inventors
- Wenxiang Wang
Assignees
- LOONGSON TECHNOLOGY CORPORATION LIMITED
Dates
- Publication Date: 20260505
- Application Date: 20221208
- Priority Date: 20211210
Claims (20)
- 1. A vector shuffling method, used for implementing a vector shuffling operation through one shuffling instruction and without accessing a memory, and comprising: receiving, by a processor, an instruction, the instruction comprising a register identifier and a shuffling parameter; wherein the register identifier comprises a source register identifier and a destination register identifier; the source register identifier is used to characterize a source register, the source register is a register storing a source element that is operated on when the vector shuffling operation is performed; the destination register identifier is used to characterize a destination register, and the destination register is the register storing a target element that is obtained after the vector shuffling operation is performed; and the shuffling parameter is used to indicate a parameter according to which the vector shuffling operation is performed on the source element; executing, by the processor, the instruction to determine position information of the source element required for the vector shuffling operation in the source register and the number of source elements according to the shuffling parameter; wherein, the number of selected source elements is one or more, the shuffling parameter comprises an index value and an opcode; the index value is used to indicate the position information of each source element required for the vector shuffling operation in the source register; and the opcode is used to characterize an operation performed on the source register and the destination register; when the number of index values is different from the number of source elements, determining, by the processor, a grouping method for the source element according to the number of index values, and determining, by the processor, a selection rule according to the grouping method and the opcode; obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule; determining, by the processor, all selected source elements as the target element; and writing, by the processor, the target element into the destination register.
- 2. The method according to claim 1, wherein the method further comprises: when the number of index values is the same as the number of source elements, determining, by the processor, a selection rule according to the opcode.
- 3. The method according to claim 2, wherein the opcode is a first opcode, and the number of index values is different from the number of source elements; the obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule, comprises: forming, by the processor, an element group for every N1 adjacent elements in the source register; wherein a data type of the element is any one of byte, half word, or word; N1 is a positive integer greater than 0; determining, by the processor, the element in each element group as an initial source element; obtaining, by the processor, the source element indicated by each index value respectively from the initial source element; and the number of source elements selected from each element group is n1.
- 4. The method according to claim 3, wherein the adjacent elements are elements with sequentially adjacent positions in the source register, and element addresses of adjacent multiple element groups are partially identical or completely different; wherein, the data types of the elements included in each element group are the same; and the data types of elements included in different element groups are the same or different.
- 5. The method according to claim 2, wherein the opcode is a second opcode, and the number of index values is the same as the number of source elements; the obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule, comprises: respectively obtaining, by the processor, the source element indicated by each index value from MN2 elements in each N2 bits in the source register; wherein, a data type of the element is doubleword; the number of source elements selected from the MN2 elements in each N2 bits is n2, and N2, MN2 and n2 are all positive integers greater than 0.
- 6. The method according to any one of claims 3-5, wherein before the determining, by the processor, the grouping method for the source element according to the number of index values, the method further comprises: creating, by the processor, an intermediate vector; the intermediate vector comprises at least one intermediate vector parameter, and when there is the element group, the number of intermediate vector parameters is equal to the number of element groups; when there is no element group, the number of intermediate vector parameters is equal to the number of source elements; the obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule, comprises: storing, by the processor, each of the selected source elements respectively in a corresponding intermediate vector parameter in the intermediate vector; wherein, there is a one-to-one correspondence between the intermediate vector parameters and the selected source elements; the writing, by the processor, the target element into the destination register, comprises: writing, by the processor, content of each intermediate vector parameter to a corresponding position in the destination register according to the shuffling parameter.
- 7. The method according to claim 2, wherein the opcode is a third opcode; the index value comprises a first index value, a second index value, a third index value, and a fourth index value; the first index value, the second index value, the third index value, and the fourth index value index different positions respectively; and the source register comprises a first source register and a second source register; the obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule, comprises: respectively obtaining, by the processor, source elements indicated by the first index value and the second index value from MN3 elements in each N3 bits in the first source register; and respectively obtaining, by the processor, source elements indicated by the third index value and the fourth index value from the MN3 elements in each N3 bits in the second source register; wherein, a data type of the element is word; the number of source elements selected from the MN3 elements in each N3 bits is n3, and N3, MN3 and n3 are all positive integers greater than 0; the writing, by the processor, the target element into the destination register comprises: determining, by the processor, the source element indicated by the first index value as a first target element, and determining, by the processor, the source element indicated by the second index value as a second target element; and determining, by the processor, the source element indicated by the third index value as a third target element, and determining, by the processor, the source element indicated by the fourth index value as a fourth target element; writing, by the processor, the first target element and the second target element to a first position in the destination register; and writing, by the processor, the third target element and the fourth target element to a second position in the destination register.
- 8. The method according to claim 2, wherein the opcode is a fourth opcode; the obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule, comprises: obtaining, by the processor, the source element indicated by each index value from MN4 elements in the source register; wherein, a data type of the element is doubleword; the number of selected source elements is n4, and both MN4 and n4 are positive integers greater than 0.
- 9. The method according to claim 2, wherein the opcode is a fifth opcode; the index value comprises a first index value and a third index value, and the first index value and the third index value index different positions respectively; the source register comprises a first source register and a second source register; the obtaining, by the processor, the source element indicated by each index value respectively from the source register according to the selection rule, comprises: obtaining, by the processor, a first source element indicated by the first index value from MN5 elements in the first source register; and obtaining, by the processor, a second source element indicated by the third index value from the MN5 elements in the second source register; wherein, a data type of the element is quad word; the number of selected source elements is n5, wherein n5 is a positive integer greater than 0; the writing, by the processor, the target element into the destination register comprises: determining, by the processor, the first source element and the second source element as the target elements respectively, and writing them, by the processor, into corresponding positions of the destination register.
- 10. The method according to any one of claims 1-5 or 7-9, wherein the number of source registers is one or more, and the number of destination registers is one; when the number of source registers is one, the source register identifier is different from the destination register identifier; when the number of source registers is more than one, each source register identifier in all the source registers is different from the destination register identifier; alternatively, when the number of source registers is more than one, there exists one source register identifier among all the source registers that is the same as the destination register identifier.
- 11. A processor, configured to implement a vector shuffling operation through one shuffling instruction and without accessing a memory, and comprising: multiple vector registers, wherein the multiple vector registers comprise a source register and a destination register, and the source register is configured to store a data element; a decoding unit, configured to decode a vector shuffling instruction; wherein, the vector shuffling instruction comprises: a register identifier and a shuffling parameter, and the register identifier comprises a source register identifier and a destination register identifier; an executing unit, configured to perform the vector shuffling operation on a source element obtained from the source register according to the shuffling parameter in response to the vector shuffling instruction, obtain a target element after the vector shuffling operation, and write the target element into the destination register; the executing unit is configured to determine position information of the source element in the source register and the number of source elements according to the shuffling parameter; wherein, the number of selected source elements is one or more; select the source element from the source register according to the determined position information and the number of source elements; and determine all the selected source elements as the target element; wherein the shuffling parameter comprises an index value and an opcode; the index value is used to indicate the position information of each source element required for the vector shuffling operation in the source register; and the opcode is used to characterize an operation performed on the source register and the destination register; the executing unit is configured to determine a selection rule for obtaining the source element according to the index value and the opcode; and obtain the source element indicated by each index value respectively from the source register according to the selection rule; the executing unit is configured to determine a grouping method for the source element according to the number of index values when the number of index values is different from the number of source elements, and determine the selection rule according to the grouping method and the opcode.
- 12. The processor according to claim 11, wherein, the executing unit is configured to determine the selection rule according to the opcode when the number of index values is the same as the number of source elements.
- 13. The processor according to claim 12, wherein the opcode is a first opcode, and the number of index values is different from the number of source elements; the executing unit is configured to form an element group for every N1 adjacent elements in the source register; wherein a data type of the element is any one of byte, half word, or word, and N1 is a positive integer greater than 0; determine the element in each element group as an initial source element; and obtain the source element indicated by each index value respectively from the initial source element; wherein the number of source elements selected from each element group is n1.
- 14. The processor according to claim 13, wherein the adjacent elements are elements with sequentially adjacent positions in the source register, and element addresses of adjacent multiple element groups are partially identical or completely different; wherein, the data types of the elements included in each element group are the same; and the data types of elements included in different element groups are the same or different.
- 15. The processor according to claim 12, wherein the opcode is a second opcode, and the number of index values is the same as the number of source elements; the executing unit is configured to respectively obtain the source element indicated by each index value from MN2 elements in each N2 bits in the source register; wherein, a data type of the element is doubleword; the number of source elements selected from the MN2 elements in each N2 bits is n2, and N2, MN2 and n2 are all positive integers greater than 0.
- 16. The processor according to any one of claims 13-15, wherein, the executing unit is configured to create an intermediate vector; the intermediate vector comprises at least one intermediate vector parameter, and when there is the element group, the number of intermediate vector parameters is equal to the number of element groups; when there is no element group, the number of intermediate vector parameters is equal to the number of source elements; store each of the selected source elements respectively in a corresponding intermediate vector parameter in the intermediate vector; wherein, there is a one-to-one correspondence between the intermediate vector parameters and the selected source elements; and write content of each intermediate vector parameter to a corresponding position in the destination register according to the shuffling parameter.
- 17. The processor according to claim 12, wherein the opcode is a third opcode; the index value comprises a first index value, a second index value, a third index value, and a fourth index value, and the first index value, the second index value, the third index value, and the fourth index value index different positions respectively; the source register comprises a first source register and a second source register; the executing unit is configured to respectively obtain source elements indicated by the first index value and the second index value from MN3 elements in each N3 bits in the source register; and respectively obtain source elements indicated by the third index value and the fourth index value from the MN3 elements in each N3 bits in the second source register; wherein, a data type of the element is word; the number of source elements selected from the MN3 elements in each N3 bits is n3, and N3, MN3, and n3 are all positive integers greater than 0; determine the source element indicated by the first index value as a first target element, determine the source element indicated by the second index value as a second target element; determine the source element indicated by the third index value as a third target element, and determine the source element indicated by the fourth index value as a fourth target element; write the first target element and the second target element to a first position in the destination register; and write the third target element and the fourth target element to a second position in the destination register.
- 18. The processor according to claim 12, wherein the opcode is a fourth opcode; the executing unit is configured to obtain the source element indicated by each index value from Mn4 elements in the source register; wherein, a data type of the element is doubleword; the number of selected source elements is n4, and both Mn4 and n4 are positive integers greater than 0.
- 19. The processor according to claim 12, wherein the opcode is a fifth opcode; the index value comprises a first index value and a third index value, wherein the first index value and the third index value index different positions respectively; and the source register comprises a first source register and a second source register; the executing unit is configured to obtain a first source element indicated by the first index value from Mn5 elements in the first source register; and obtain the second source element indicated by the third index value from the Mn5 elements in the first source register; wherein, a data type of the element is quad word; the number of selected source elements is n5, wherein n5 is a positive integer greater than 0; determine the first source element and the second source element as the target elements respectively, and write them into corresponding positions of the destination register.
- 20. An electronic device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory, and configured to enable one or more processors to execute one or more of the vector shuffling method according to any one of claims 1-10.
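The grouped selection described in claims 3 and 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `shuffle_within_groups`, the use of Python lists to stand in for vector registers, and the assumption that the group size N1 equals the number of index values (so that every group reuses the same index values and n1 = N1 elements are selected per group) are illustrative choices that the claims do not state.

```python
def shuffle_within_groups(source, index_values):
    """Select elements inside each group of adjacent source elements.

    Assumption (not stated in the claims): the group size N1 equals the
    number of index values, and every group reuses the same index values.
    """
    n1 = len(index_values)
    if n1 == 0 or len(source) % n1 != 0:
        raise ValueError("source length must be a positive multiple of "
                         "the number of index values")
    if any(not 0 <= i < n1 for i in index_values):
        raise ValueError("each index value must address a position "
                         "inside a group")

    # Intermediate vector: filled first and only then handed back,
    # loosely mirroring the intermediate vector of claim 6.
    intermediate = []
    for start in range(0, len(source), n1):
        group = source[start:start + n1]  # initial source elements
        intermediate.extend(group[i] for i in index_values)
    return intermediate  # content to be written to the destination


src = [10, 11, 12, 13, 20, 21, 22, 23]
dst = shuffle_within_groups(src, [3, 2, 1, 0])
print(dst)  # prints [13, 12, 11, 10, 23, 22, 21, 20]
```

In this sketch four index values applied to eight source elements reverse each group of four independently, which is one way the number of index values can differ from the number of source elements.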
Description
VECTOR SHUFFLING METHOD, PROCESSOR AND ELECTRONIC DEVICE
TECHNICAL FIELD
[0001] The present application relates to the field of computer technology and, in particular, to a vector shuffling method, a processor and an electronic device.
BACKGROUND
[0002] With the development of multimedia applications, more and more computing tasks for a processor come from the field of digital image processing. Image-based applications have become a non-negligible workload in servers, desktop computers, and personal mobile devices (i.e., embedded devices). Given the actual situation of digital image processing software, updating an instruction set architecture and adding instruction support in the processor for the commonly used operations of such applications is a major direction for developing the processor, and it is also a simple and effective way for the processor to improve performance for specific applications. Therefore, a single instruction multiple data (SIMD) structure is added to more and more processors, so as to support the same type of operation on a regular dataset.
[0003] At present, shuffle instructions are widely used in SIMD processors, and different shuffle instructions can meet different requirements. However, in existing technical solutions, implementing a vector shuffling operation for a specific function requires multiple instructions that carry out a series of operations; the operation method is therefore more complex, and the execution efficiency of the specific function is reduced.
SUMMARY
[0004] The present application provides a vector shuffling method, a processor, and an electronic device, so as to solve the problem that, in existing technology, multiple instructions are required to implement a series of operations, the operation method is more complex, and the execution efficiency of specific functions is reduced.
[0005] To address the above issue, the present application discloses a vector shuffling method, including: receiving an instruction, where the instruction includes a register identifier and a shuffling parameter; the register identifier includes a source register identifier and a destination register identifier; the source register identifier is used to characterize a source register, and the source register is a register storing a source element that is operated on when a vector shuffling operation is performed; the destination register identifier is used to characterize a destination register, and the destination register is the register storing a target element that is obtained after the vector shuffling operation is performed; and the shuffling parameter is used to indicate a parameter according to which the vector shuffling operation is performed on the source element; executing the instruction to perform the vector shuffling operation on the source element obtained from the source register according to the shuffling parameter, and obtaining the target element after performing the vector shuffling operation; and writing the target element into the destination register.
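As a rough illustration of the receive, execute, and write steps in paragraph [0005], the following Python sketch models an instruction as a small record and the vector registers as lists. The names `ShuffleInstruction` and `execute_shuffle`, the field layout, and the opcode string "shuf" are hypothetical and do not come from the application. The sketch also assumes the simple case in which there is one index value per element of the source register.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ShuffleInstruction:
    """Hypothetical instruction layout; field names are illustrative only."""
    opcode: str                    # characterizes the operation to perform
    dest_id: int                   # destination register identifier
    src_id: int                    # source register identifier
    index_values: Tuple[int, ...]  # position of each required source element


def execute_shuffle(regs: List[List[int]], ins: ShuffleInstruction) -> None:
    """Perform the shuffle on the register file only, with no memory access."""
    source = regs[ins.src_id]
    if len(ins.index_values) != len(source):
        raise ValueError("this sketch assumes one index value per element")
    if any(not 0 <= i < len(source) for i in ins.index_values):
        raise ValueError("index value outside the source register")
    # Execute: gather the source element indicated by each index value.
    target = [source[i] for i in ins.index_values]
    # Write: store the target elements in the destination register.
    regs[ins.dest_id] = target


regs = [[1, 2, 3, 4], [0, 0, 0, 0]]  # register 0 acts as the source
execute_shuffle(regs, ShuffleInstruction("shuf", dest_id=1, src_id=0,
                                         index_values=(2, 2, 0, 3)))
print(regs[1])  # prints [3, 3, 1, 4]
```

The whole permutation is driven by a single instruction record, which is the point the paragraph makes; a real processor would decode these fields from an instruction word rather than receive them as a Python object.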
[0006] To address the above issue, the present application discloses a processor, including: multiple vector registers, where the multiple vector registers include a source register and a destination register, and the source register is configured to store a data element; a decoding unit, configured to decode a vector shuffling instruction, where the vector shuffling instruction includes a register identifier and a shuffling parameter, and the register identifier includes a source register identifier and a destination register identifier; and an executing unit, configured to perform a vector shuffling operation on a source element obtained from the source register according to the shuffling parameter in response to the vector shuffling instruction, obtain a target element after performing the vector shuffling operation, and write the target element into the destination register.
[0007] To address the above issue, the present application discloses an electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory, and configured to enable one or more processors to execute the vector shuffling method as described above.
[0008] Compared with existing technology, the present application includes the following advantages: the vector shuffling method, the processor, and the electronic device provided by embodiments of the present application can perform the vector shuffling operation on elements obtained from the source register in combination with the shuffling parameter by adding the register identifier and the shuffling parameter in the instruction. Therefore, the vector shuffling operation for specific functions can be implemented through one instruction, without requiring multiple instructions used for performing the shuffling operation to implement the specific function, thereby improving execution efficiency of the specific functi