
US-12621126-B2 - SM3 hash algorithm acceleration processors, methods, systems, and instructions


Abstract

A processor includes a decode unit to decode an SM3 two round state word update instruction. The instruction is to indicate one or more source packed data operands. The source packed data operand(s) are to have eight 32-bit state words $A_j$, $B_j$, $C_j$, $D_j$, $E_j$, $F_j$, $G_j$, and $H_j$ that are to correspond to a round (j) of an SM3 hash algorithm. The source packed data operand(s) are also to have a set of messages sufficient to evaluate two rounds of the SM3 hash algorithm. An execution unit coupled with the decode unit is operable, in response to the instruction, to store one or more result packed data operands in one or more destination storage locations. The result packed data operand(s) are to have at least four two-round updated 32-bit state words $A_{j+2}$, $B_{j+2}$, $E_{j+2}$, and $F_{j+2}$, which are to correspond to a round (j+2) of the SM3 hash algorithm.
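To illustrate what a two-round state word update computes, here is a minimal Python sketch written from the published SM3 specification (the IETF Internet-Draft by Shen and Lee cited in the Background). It is not Intel's instruction semantics or encoding; the function names (sm3_round, sm3_two_rounds, expand, sm3) are hypothetical. Each pair of rounds consumes only the message words $W_j$, $W_{j+1}$, $W_{j+4}$, and $W_{j+5}$, and after two rounds $C_{j+2} = A_j \lll 9$, $D_{j+2} = B_j \lll 9$, $G_{j+2} = E_j \lll 19$, and $H_{j+2} = F_j \lll 19$. That is consistent with the abstract requiring only $A_{j+2}$, $B_{j+2}$, $E_{j+2}$, and $F_{j+2}$ as new outputs.

```python
import struct

MASK = 0xFFFFFFFF
# SM3 initial value (IV) from the published SM3 specification.
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (n taken mod 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x: int) -> int:
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x: int) -> int:
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def ff(j: int, x: int, y: int, z: int) -> int:
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)


def gg(j: int, x: int, y: int, z: int) -> int:
    return x ^ y ^ z if j < 16 else ((x & y) | (~x & z)) & MASK


def sm3_round(s, j, w_j, w1_j):
    """One SM3 compression round; w1_j is W'_j = W_j XOR W_{j+4}."""
    a, b, c, d, e, f, g, h = s
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    a12 = rotl(a, 12)
    ss1 = rotl((a12 + e + rotl(t_j, j)) & MASK, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff(j, a, b, c) + d + ss2 + w1_j) & MASK
    tt2 = (gg(j, e, f, g) + h + ss1 + w_j) & MASK
    # New (A, B, C, D, E, F, G, H)
    return (tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g)


def sm3_two_rounds(s, j, w):
    """Hypothetical model of a two-round update (rounds j and j+1).

    Reads only W[j], W[j+1], W[j+4], and W[j+5] from the expanded list w.
    """
    s = sm3_round(s, j, w[j], w[j] ^ w[j + 4])
    return sm3_round(s, j + 1, w[j + 1], w[j + 1] ^ w[j + 5])


def expand(block: bytes) -> list:
    """Expand one 64-byte block into W[0..67] with the standard recurrence."""
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    return w


def sm3(msg: bytes) -> str:
    """Full SM3 digest, driving the compression function two rounds at a time."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(msg) * 8)
    v = IV
    for off in range(0, len(padded), 64):
        w = expand(padded[off:off + 64])
        s = v
        for j in range(0, 64, 2):
            s = sm3_two_rounds(s, j, w)
        v = tuple(x ^ y for x, y in zip(v, s))
    return "".join(f"{x:08x}" for x in v)
```

Calling sm3(b"abc") should reproduce the specification's published test vector for "abc", which is a quick sanity check that grouping rounds in pairs does not change the digest.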

Inventors

  • Shay Gueron
  • Vlad Krasnov

Assignees

  • INTEL CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2024-09-18

Claims (20)

  1. A processor comprising: a decode unit to decode a first SM3 message expansion instruction to perform a first part of SM3 message expansion, the first SM3 message expansion instruction to indicate a first operand having a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96], a second operand having a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96], and a third operand having a ninth 32-bit data element, a tenth 32-bit data element, and an eleventh 32-bit data element; a first execution unit coupled to the decode unit, the first execution unit to execute the first SM3 message expansion instruction to: generate and store a first result, the first result to have: a first 32-bit result data element in bits [31:0] equal to an evaluation of a permutation function with a first value, the first value equivalent to the first 32-bit data element exclusive OR'd (XOR'd) with the fifth 32-bit data element and XOR'd with the ninth 32-bit data element rotated left by 15 bits, the evaluation of the permutation function with the first value equivalent to the first value XOR'd with the first value rotated left by 15 bits and XOR'd with the first value rotated left by 23 bits; a second 32-bit result data element in bits [63:32] equivalent to an evaluation of a permutation function with a second value, the second value equivalent to the second 32-bit data element XOR'd with the sixth 32-bit data element and XOR'd with the tenth 32-bit data element rotated left by 15 bits, the evaluation of the permutation function with the second value equivalent to the second value XOR'd with the second value rotated left by 15 bits and XOR'd with the second value rotated left by 23 bits; a third 32-bit result data element in bits [95:64] equivalent to an evaluation of a permutation function with a third value, the third value equivalent to the third 32-bit data element XOR'd with the seventh 32-bit data element and XOR'd with the eleventh 32-bit data element rotated left by 15 bits, the evaluation of the permutation function with the third value equivalent to the third value XOR'd with the third value rotated left by 15 bits and XOR'd with the third value rotated left by 23 bits; and a fourth 32-bit result data element in bits [127:96] equivalent to an evaluation of a permutation function with a fourth value, the fourth value evaluated based on at least the fourth 32-bit data element XOR'd with the eighth 32-bit data element, the evaluation of the permutation function with the fourth value equivalent to the fourth value XOR'd with the fourth value rotated left by 15 bits and XOR'd with the fourth value rotated left by 23 bits, wherein the decode unit is also to decode a second SM3 message expansion instruction to perform a second part of the SM3 message expansion, the second SM3 message expansion instruction to indicate the first result, a fourth operand having a twelfth 32-bit data element in bits [31:0], a thirteenth 32-bit data element in bits [63:32], a fourteenth 32-bit data element in bits [95:64], and a fifteenth 32-bit data element in bits [127:96], and a fifth operand having a sixteenth 32-bit data element in bits [31:0], a seventeenth 32-bit data element in bits [63:32], an eighteenth 32-bit data element in bits [95:64], and a nineteenth 32-bit data element in bits [127:96]; and a second execution unit coupled with the decode unit, the second execution unit to execute the second SM3 message expansion instruction to: generate and store a second result, the second result to have: a fifth 32-bit result data element in bits [31:0] equivalent to the twelfth 32-bit data element rotated left by 7 bits and exclusive OR'd (XOR'd) with the sixteenth 32-bit data element and XOR'd with the first 32-bit result data element; a sixth 32-bit result data element in bits [63:32] equivalent to the thirteenth 32-bit data element rotated left by 7 bits and XOR'd with the seventeenth 32-bit data element and XOR'd with the second 32-bit result data element; a seventh 32-bit result data element in bits [95:64] equivalent to the fourteenth 32-bit data element rotated left by 7 bits and XOR'd with the eighteenth 32-bit data element and XOR'd with the third 32-bit result data element; and an eighth 32-bit result data element in bits [127:96] equivalent to the fifteenth 32-bit data element rotated left by 7 bits and XOR'd with the nineteenth 32-bit data element and XOR'd with the fourth 32-bit result data element and XOR'd with a fifth value rotated left by 6 bits and XOR'd with the fifth value rotated left by 15 bits and XOR'd with the fifth value rotated left by 30 bits, wherein the fifth value is evaluated based on at least the twelfth 32-bit data element rotated left by 7 bits and XOR'd with the sixteenth 32-bit data element.
  2. The processor of claim 1, wherein the first part of the SM3 message expansion is a first part of generating four SM3 messages for four consecutive rounds.
  3. The processor of claim 2, wherein the second part of the SM3 message expansion is a second part of generating four SM3 messages for four consecutive rounds.
  4. The processor of claim 1, wherein the decode unit is also to decode a plurality of instructions to accelerate SM3 hash rounds.
  5. The processor of claim 1, wherein the processor is a reduced instruction set computing (RISC) processor.
  6. The processor of claim 1, further comprising: register renaming logic; and an instruction translation lookaside buffer (TLB).
  7. The processor of claim 1, further comprising: a data cache; an instruction cache; and a level 2 (L2) cache coupled to the data cache and coupled to the instruction cache.
  8. An apparatus comprising: a decode unit to decode an SM3 message expansion instruction, the SM3 message expansion instruction having a first field to specify a first 128-bit SIMD source register, a second field to specify a second 128-bit SIMD source register, and a third field to specify a third 128-bit SIMD source register, the first 128-bit SIMD source register to store a first operand having a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96], the second 128-bit SIMD source register to store a second operand having a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96], and the third 128-bit SIMD source register to store a third operand having a ninth 32-bit data element, a tenth 32-bit data element, and an eleventh 32-bit data element; and execution circuitry coupled to the decode unit, the execution circuitry, based on the decode of the SM3 message expansion instruction, to: generate a result including: a first 32-bit result data element in bits [31:0] equal to an evaluation of a function with a first value, the first value equivalent to the first 32-bit data element exclusive OR'd (XOR'd) with the fifth 32-bit data element and XOR'd with the ninth 32-bit data element rotated left by 15 bits, the evaluation of the function with the first value equivalent to the first value XOR'd with the first value rotated left by 15 bits and XOR'd with the first value rotated left by 23 bits; a second 32-bit result data element in bits [63:32] equivalent to an evaluation of a function with a second value, the second value equivalent to the second 32-bit data element XOR'd with the sixth 32-bit data element and XOR'd with the tenth 32-bit data element rotated left by 15 bits, the evaluation of the function with the second value equivalent to the second value XOR'd with the second value rotated left by 15 bits and XOR'd with the second value rotated left by 23 bits; a third 32-bit result data element in bits [95:64] equivalent to an evaluation of a function with a third value, the third value equivalent to the third 32-bit data element XOR'd with the seventh 32-bit data element and XOR'd with the eleventh 32-bit data element rotated left by 15 bits, the evaluation of the function with the third value equivalent to the third value XOR'd with the third value rotated left by 15 bits and XOR'd with the third value rotated left by 23 bits; and a fourth 32-bit result data element in bits [127:96] equivalent to an evaluation of a function with a fourth value, the fourth value evaluated based on at least the fourth 32-bit data element XOR'd with the eighth 32-bit data element, the evaluation of the function with the fourth value equivalent to the fourth value XOR'd with the fourth value rotated left by 15 bits and XOR'd with the fourth value rotated left by 23 bits; and store the result in a destination.
  9. The apparatus of claim 8, wherein the SM3 message expansion instruction, when executed, is to cause the execution circuitry to perform a first part of an SM3 message expansion.
  10. The apparatus of claim 9, wherein the first part of the SM3 message expansion is a first part of generating four SM3 messages for four consecutive rounds.
  11. The apparatus of claim 8, wherein the decode unit is also to decode a plurality of instructions to accelerate SM3 hash rounds.
  12. The apparatus of claim 8, wherein the destination is one of the first, second, and third 128-bit SIMD source registers.
  13. The apparatus of claim 8, wherein the apparatus is a processor, and wherein the processor is a reduced instruction set computing (RISC) processor.
  14. The apparatus of claim 8, further comprising: register renaming logic; and an instruction translation lookaside buffer (TLB).
  15. An apparatus comprising: a decode unit to decode an SM3 message expansion instruction, the SM3 message expansion instruction having a first field to specify a first 128-bit SIMD source register, a second field to specify a second 128-bit SIMD source register, and a third field to specify a third 128-bit SIMD source register, the first 128-bit SIMD source register to store a first operand having a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96], the second 128-bit SIMD source register to store a second operand having a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96], and the third 128-bit SIMD source register to store a third operand having a ninth 32-bit data element in bits [31:0], a tenth 32-bit data element in bits [63:32], an eleventh 32-bit data element in bits [95:64], and a twelfth 32-bit data element in bits [127:96]; and execution circuitry coupled to the decode unit, the execution circuitry, based on the decode of the SM3 message expansion instruction, to: generate a result including: a first 32-bit result data element in bits [31:0] equivalent to the fifth 32-bit data element rotated left by 7 bits and exclusive OR'd (XOR'd) with the ninth 32-bit data element and XOR'd with the first 32-bit data element; a second 32-bit result data element in bits [63:32] equivalent to the sixth 32-bit data element rotated left by 7 bits and XOR'd with the tenth 32-bit data element and XOR'd with the second 32-bit data element; a third 32-bit result data element in bits [95:64] equivalent to the seventh 32-bit data element rotated left by 7 bits and XOR'd with the eleventh 32-bit data element and XOR'd with the third 32-bit data element; and a fourth 32-bit result data element in bits [127:96] equivalent to the eighth 32-bit data element rotated left by 7 bits and XOR'd with the twelfth 32-bit data element and XOR'd with the fourth 32-bit data element and XOR'd with a value rotated left by 6 bits and XOR'd with the value rotated left by 15 bits and XOR'd with the value rotated left by 30 bits, wherein the value is evaluated based on at least the fifth 32-bit data element rotated left by 7 bits and XOR'd with the ninth 32-bit data element; and store the result in a destination.
  16. The apparatus of claim 15, wherein the SM3 message expansion instruction, when executed, is to cause the execution circuitry to perform a second part of an SM3 message expansion, wherein the second part is subsequent to a first part of the SM3 message expansion.
  17. The apparatus of claim 16, wherein the second part of the SM3 message expansion is a second part of generating four SM3 messages for four consecutive rounds.
  18. The apparatus of claim 15, wherein the decode unit is also to decode a plurality of instructions to accelerate SM3 hash rounds.
  19. The apparatus of claim 15, wherein the destination is the first 128-bit SIMD source register.
  20. The apparatus of claim 15, wherein the apparatus is a processor, and wherein the processor is a reduced instruction set computing (RISC) processor.
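Claims 1, 8, and 15 split the generation of four SM3 message words across two instructions. The following is a minimal Python sketch written from the claim language and the standard SM3 expansion recurrence $W_j = P_1(W_{j-16} \oplus W_{j-9} \oplus (W_{j-3} \lll 15)) \oplus (W_{j-13} \lll 7) \oplus W_{j-6}$, with $P_1(x) = x \oplus (x \lll 15) \oplus (x \lll 23)$. It is not Intel's implementation. The function names and the assignment of operands to message words (first operand $W_{j-16..j-13}$, second $W_{j-9..j-6}$, third $W_{j-3..j-1}$, fourth $W_{j-13..j-10}$, fifth $W_{j-6..j-3}$) are assumptions chosen so that each lane matches the claimed formulas. The fourth lane of the first part cannot include $W_j$, because $W_j$ is produced in the same batch of four. The second part adds the missing term, relying on $P_1$ being linear over XOR, so that $P_1(W_j \lll 15) = (W_j \lll 15) \oplus (W_j \lll 30) \oplus (W_j \lll 6)$: the rotations by 6, 15, and 30 bits in the claims.

```python
MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p1(x: int) -> int:
    """SM3 permutation P1(x) = x ^ (x <<< 15) ^ (x <<< 23)."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def msg_expand_part1(x, y, z):
    """Hypothetical model of the first message-expansion instruction.

    x, y: four 32-bit lanes each; z: three lanes.
    Lanes 0-2 are complete P1 evaluations; lane 3 omits the (W[j] <<< 15)
    term because W[j] is not known yet.
    """
    r = [p1(x[i] ^ y[i] ^ rotl(z[i], 15)) for i in range(3)]
    r.append(p1(x[3] ^ y[3]))
    return r


def msg_expand_part2(r, u, v):
    """Hypothetical model of the second message-expansion instruction."""
    out = [rotl(u[i], 7) ^ v[i] ^ r[i] for i in range(4)]
    w_j = out[0]  # lane 0 now holds the finished word W[j]
    # P1 is XOR-linear, so P1(W[j] <<< 15) can be folded into lane 3 afterwards.
    out[3] ^= rotl(w_j, 6) ^ rotl(w_j, 15) ^ rotl(w_j, 30)
    return out


def expand_reference(block16):
    """Direct SM3 recurrence producing W[0..67]."""
    w = list(block16)
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    return w


def expand_two_part(block16):
    """Same W[0..67], produced four words at a time via the two-part model."""
    w = list(block16)
    for j in range(16, 68, 4):
        r = msg_expand_part1(w[j - 16:j - 12], w[j - 9:j - 5], w[j - 3:j])
        w.extend(msg_expand_part2(r, w[j - 13:j - 9], w[j - 6:j - 2]))
    return w
```

Comparing expand_two_part against expand_reference on any 16-word block is a quick way to confirm that the assumed lane mapping reproduces the direct recurrence.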

Description

RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 17/480,117, filed on Sep. 20, 2021, entitled "SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS", which is a continuation of U.S. patent application Ser. No. 17/092,133, filed on Nov. 6, 2020, entitled "SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS", now patented as U.S. Pat. No. 11,128,443, issued on Sep. 21, 2021, which is a continuation of U.S. patent application Ser. No. 16/847,626, filed on Apr. 13, 2020, entitled "SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS", now patented as U.S. Pat. No. 11,075,746, issued on Jul. 27, 2021, which is a continuation of U.S. patent application Ser. No. 15/973,015, filed on May 7, 2018, entitled "SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS", now patented as U.S. Pat. No. 10,623,175, issued on Apr. 14, 2020, which is a continuation of U.S. patent application Ser. No. 15/132,208, filed on Apr. 18, 2016, entitled "SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS", now patented as U.S. Pat. No. 9,979,538, issued on May 22, 2018, which is a continuation of U.S. patent application Ser. No. 14/477,552, filed on Sep. 4, 2014, entitled "SM3 HASH ALGORITHM ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS", now patented as U.S. Pat. No. 9,317,719, issued on Apr. 19, 2016, which is hereby incorporated herein by reference in its entirety and for all purposes.

BACKGROUND

Technical Field

Embodiments described herein relate to processors. In particular, embodiments described herein relate to the evaluation of hash algorithms with processors.

Background Information

Hash functions or algorithms are a type of cryptographic algorithm widely used in computer systems and other electronic devices.
The hash algorithms generally take a message as an input, generate a corresponding hash value or digest by applying the hash function to the message, and output the hash value or digest. The same hash value is generated whenever the same hash function is evaluated with the same message. Such hash algorithms are used for various purposes, such as verification (e.g., verifying the integrity of files, data, or messages), identification (e.g., identifying files, data, or messages), authentication (e.g., generating message authentication codes), generating digital signatures, generating pseudorandom numbers, and the like. As one illustrative example, a hash function may be used to generate a hash value for a given message. At a later time, a hash value may be recomputed for the given message using the same hash function. If the hash values are identical, it can be assumed that the message has not been changed. In contrast, if the hash values are different, it can be assumed that the message has been changed.

One known type of hashing algorithm is the SM3 hash function. The SM3 hash algorithm has been published by the Chinese Commercial Cryptography Association Office and approved by the Chinese government. The SM3 hash algorithm has been specified as the hashing algorithm for the TCM (Trusted Computing Module) by the China Information Security Standardization Technical Committee (TC260) initiative. An English language description of the SM3 hash function has been published as the Internet Engineering Task Force (IETF) Internet-Draft entitled "SM3 Hash Function," by S. Shen and X. Lee, on Oct. 24, 2011.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments. In the drawings:

FIG. 1 is a block diagram of an instruction set of a processor that includes one or more SM3 hash algorithm acceleration instructions.
FIG. 2 illustrates the compression function of the SM3 hash algorithm.
FIG. 3 is a block diagram of an embodiment of a processor that is operable to perform an embodiment of an SM3 two round at least four (or in some embodiments eight) state word update instruction.
FIG. 4 is a block flow diagram of an embodiment of a method of performing an embodiment of an SM3 two round at least four (or in some embodiments eight) state word update instruction.
FIG. 5 is a block diagram illustrating an embodiment of an SM3 two round eight state word update operation.
FIG. 6 is a block diagram illustrating an embodiment of an SM3 two round four remaining state word update operation.
FIG. 7 is a block diagram illustrating an embodiment of an SM3 four message expansion initiation operation.
FIG. 8 is a block diagram illustrating an embodiment of an SM3 four message expansion completion operation.
FIG. 9A is a block diagram illustrating an embodiment of an in-order pipeline and an embodiment of a register renaming out-of-order issue/execution pipeline.
FIG. 9B is a block diagram of an embodiment of pro