CN-113227969-B - Speculative instruction wakeup for tolerating drain latency of memory ordering violation check buffer

Abstract

A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing the data loaded by the completed load instruction into a storage location that holds such data while the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that depend on the completed load instruction. In response to a slot becoming available in the memory ordering consistency queue, the load instruction is replayed. In response to receiving the loaded data for the replayed load instruction, the technique tests for a data mis-speculation by comparing that data with the data that was loaded by the completed load instruction and is held in the storage location.
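The sequence of steps in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class `ToyLoadStoreUnit`, its fields, the dictionary-based memory, and the instruction labels are hypothetical names chosen for illustration, and the model ignores timing and all other pipeline structures.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyLoadStoreUnit:
    """Toy model only; every name here is illustrative, not from the patent."""
    memory: dict                                       # address -> value
    queue_capacity: int = 2                            # slots in the ordering queue
    ordering_queue: deque = field(default_factory=deque)
    saved_data: dict = field(default_factory=dict)     # load id -> stashed value
    woken: list = field(default_factory=list)          # speculatively woken dependents

    def complete_load(self, load_id, addr, dependents):
        value = self.memory[addr]
        if len(self.ordering_queue) < self.queue_capacity:
            # A slot is free: the load enters the queue and is checked there.
            self.ordering_queue.append(load_id)
            return value
        # Queue full: stash the loaded value and wake dependents speculatively.
        self.saved_data[load_id] = value
        self.woken.extend(dependents)
        return value

    def free_slot(self):
        """Retire the oldest queue entry, making one slot available."""
        self.ordering_queue.popleft()

    def replay_load(self, load_id, addr):
        """Replay once a slot is free; return True if the speculation held."""
        assert len(self.ordering_queue) < self.queue_capacity
        replayed = self.memory[addr]
        return replayed == self.saved_data.pop(load_id)


# Scenario: the queue has one slot, and memory changes before the replay.
lsu = ToyLoadStoreUnit(memory={0x10: 7}, queue_capacity=1)
lsu.complete_load("ld0", 0x10, [])          # takes the only slot
lsu.complete_load("ld1", 0x10, ["add1"])    # queue full: stash 7, wake add1
lsu.memory[0x10] = 9                        # a store from elsewhere lands
lsu.free_slot()
ok = lsu.replay_load("ld1", 0x10)           # compares 9 against the stashed 7
```

In this scenario `ok` is `False`, which corresponds to a detected data mis-speculation; had memory stayed unchanged, the comparison would have succeeded.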

Inventors

  • John Kalamatianos
  • Zhi Xiajin
  • Krisnan V. Lamani
  • Scott Thomas Bingham

Assignees

  • Advanced Micro Devices, Inc. (超威半导体公司)

Dates

Publication Date
20260421
Application Date
20200327
Priority Date
20191031

Claims (20)

  1. A method for speculatively executing load-dependent instructions, the method comprising: detecting that a memory ordering consistency queue is full for a completed load instruction; storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full; speculatively executing instructions that are dependent on the completed load instruction; in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction; and in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
  2. The method of claim 1, further comprising: in response to determining that the loaded data for the replayed load instruction is the same as the data loaded by the completed load instruction, determining that no violation of a memory ordering rule occurred.
  3. The method of claim 2, further comprising: removing the load instruction from a load queue; storing the load instruction in the memory ordering consistency queue; and testing, for the load instruction in the memory ordering consistency queue, whether a memory ordering consistency rule is violated.
  4. The method of claim 1, further comprising: in response to determining that the loaded data for the replayed load instruction is not the same as the data loaded by the completed load instruction, determining that a violation of memory ordering consistency semantics occurred.
  5. The method of claim 4, further comprising: in response to the violation occurring, flushing and replaying the load instruction.
  6. The method of claim 1, wherein the instructions dependent on the completed load instruction comprise instructions that consume the data loaded by the completed load instruction.
  7. The method of claim 1, wherein the memory ordering rules comprise load-to-load ordering rules.
  8. The method of claim 1, wherein the memory ordering rules comprise store-to-load ordering rules.
  9. The method of claim 1, wherein the storage location comprises one of: a register in a register file that is the destination register of the completed load instruction; and a memory dedicated to storing loaded data for testing whether memory ordering consistency semantics are violated for load-dependent instructions.
  10. A processor subsystem for speculatively executing load-dependent instructions, the processor subsystem comprising: a memory ordering consistency queue; and a load/store unit configured to: detect that the memory ordering consistency queue is full for a completed load instruction; store data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full; permit instructions that are dependent on the completed load instruction to execute speculatively; in response to a slot becoming available in the memory ordering consistency queue, replay the load instruction; and in response to receiving loaded data for the replayed load instruction, test for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
  11. The processor subsystem of claim 10, wherein the load/store unit is further configured to: in response to determining that the loaded data for the replayed load instruction is the same as the data loaded by the completed load instruction, determine that no violation of a memory ordering rule occurred.
  12. The processor subsystem of claim 11, wherein the load/store unit is further configured to: remove the load instruction from a load queue; store the load instruction in the memory ordering consistency queue; and test, for the load instruction in the memory ordering consistency queue, whether a memory ordering consistency rule is violated.
  13. The processor subsystem of claim 10, wherein the load/store unit is further configured to: in response to determining that the loaded data for the replayed load instruction is not the same as the data loaded by the completed load instruction, determine that a violation of memory ordering consistency semantics occurred.
  14. The processor subsystem of claim 13, wherein the load/store unit is further configured to: in response to the violation occurring, flush and replay the load instruction.
  15. The processor subsystem of claim 10, wherein the instructions dependent on the completed load instruction comprise instructions that consume the data loaded by the completed load instruction.
  16. The processor subsystem of claim 10, wherein the memory ordering rules comprise load-to-load ordering rules.
  17. The processor subsystem of claim 10, wherein the memory ordering rules comprise store-to-load ordering rules.
  18. The processor subsystem of claim 10, wherein the storage location comprises one of: a register in a register file that is the destination register of the completed load instruction; and a memory dedicated to storing loaded data for testing whether memory ordering consistency semantics are violated for load-dependent instructions.
  19. A processor for speculatively executing load-dependent instructions, the processor comprising: a memory ordering consistency queue; a load/store unit configured to: detect that the memory ordering consistency queue is full for a completed load instruction; store data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full; permit instructions that are dependent on the completed load instruction to execute speculatively; in response to a slot becoming available in the memory ordering consistency queue, replay the load instruction; and in response to receiving loaded data for the replayed load instruction, test for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location; and one or more functional units configured to speculatively execute the instructions that are dependent on the completed load instruction.
  20. The processor of claim 19, wherein the memory ordering rules include one of a load-to-load ordering rule and a store-to-load ordering rule.
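
The match and mismatch branches recited in claims 2 to 5 can be sketched as a single decision function. This is an illustrative sketch only: `Outcome`, `resolve_replay`, and the list-based queues are hypothetical names that do not appear in the claims, and the flush itself is left to the caller.

```python
from enum import Enum


class Outcome(Enum):
    NO_VIOLATION = "no violation"
    VIOLATION = "violation"


def resolve_replay(load_id, saved, replayed, load_queue, ordering_queue):
    """Handle the result of comparing replayed data with the stashed data."""
    if replayed == saved:
        # Data matches (claims 2 and 3): move the load from the load queue
        # into the ordering queue, where ordering checks continue as usual.
        load_queue.remove(load_id)
        ordering_queue.append(load_id)
        return Outcome.NO_VIOLATION
    # Data differs (claims 4 and 5): report a violation so the caller can
    # flush and replay the load; the queues are left untouched here.
    return Outcome.VIOLATION


# Hypothetical load identifiers, used only to exercise both branches.
load_queue, ordering_queue = ["ld1", "ld2"], []
first = resolve_replay("ld1", 7, 7, load_queue, ordering_queue)
second = resolve_replay("ld2", 7, 9, load_queue, ordering_queue)
```

Here `first` is `Outcome.NO_VIOLATION` and `ld1` moves into the ordering queue, while `second` is `Outcome.VIOLATION` and `ld2` stays in the load queue awaiting the flush and replay.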

Description

Speculative instruction wakeup for tolerating drain latency of memory ordering violation check buffer

Cross Reference to Related Applications

The present application claims the benefit of U.S. non-provisional application Ser. No. 16/671,097, filed on October 31, 2019, and U.S. provisional patent application Ser. No. 62/828,861, filed in 2019, the contents of which are incorporated herein by reference.

Statement Regarding Government Interest

This invention was made with government support under the PathForward Project with Lawrence Livermore National Security (Prime Contract No. DE-AC52-07NA27344, Subcontract No. B620717) awarded by DOE. The government has certain rights in this invention.

Background

Out-of-order processors execute instructions out of order, but adhere to certain constraints to ensure that execution occurs in the manner specified by the program. One class of constraints involves ensuring that certain memory ordering semantics are followed. Constraints related to memory ordering semantics may be relaxed to improve performance, but additional steps may then be needed to ensure correctness of execution.

Drawings

A more detailed understanding can be obtained from the following description, given by way of example in connection with the accompanying drawings, in which:

FIG. 1 is a block diagram of an exemplary apparatus in which one or more disclosed embodiments may be implemented;
FIG. 2 is a block diagram of an instruction execution pipeline within the processor of FIG. 1, according to one example;
FIG. 3A illustrates exemplary operations for triggering speculative execution of instructions that depend on a load instruction that qualifies for placement in a memory ordering consistency queue that is full;
FIG. 3B illustrates exemplary operations, performed in response to a slot becoming free in the memory ordering consistency queue, for a load instruction for which the queue was previously full; and
FIG. 4 is a flow diagram of a method for speculatively executing instructions dependent on a load instruction for which the memory ordering consistency queue is full, according to one example.

Detailed Description

Techniques for speculatively executing load-dependent instructions are provided. The techniques include detecting that a memory ordering consistency queue is full for a completed load instruction. The techniques also include storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The techniques also include speculatively executing instructions that depend on the completed load instruction. The techniques also include replaying the load instruction in response to a slot becoming available in the memory ordering consistency queue. The techniques also include, in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.

A processor subsystem for speculatively executing load-dependent instructions is provided. The processor subsystem includes a memory ordering consistency queue and a load/store unit.
The load/store unit detects that the memory ordering consistency queue is full for a completed load instruction, stores the data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full, permits instructions that depend on the completed load instruction to execute speculatively, replays the load instruction in response to a slot becoming available in the memory ordering consistency queue, and, in response to receiving loaded data for the replayed load instruction, tests for a data mis-speculation by comparing that loaded data with the data loaded by the completed load instruction that is stored in the storage location.

A processor for speculatively executing load-dependent instructions is provided. The processor includes a memory ordering consistency queue, a load/store unit, and one or more functional units. The load/store unit detects that the memory ordering consistency queue is full for a completed load instruction, stores the data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full, permits instructions that depend on the completed load instruction to execute speculatively, replays the load instruction in response to a slot becoming available in the memory ordering consistency queue, and, in response to receiving loaded data for the replayed load instruction, tests for a data mis-speculation by comparing that loaded data with the data loaded by the completed load instruction that is stored in the storage location. The one or more functional units speculatively execute the instructions that depend on the completed load instruction.

FIG. 1 is a block dia