US-12619435-B2 - Virtual idle loops
Abstract
Techniques relating to virtual idle loops are described. In an embodiment, decoder circuitry decodes a single instruction. The single instruction includes a field for an identifier of a first source operand, a field for an identifier of a second source operand, a field for an identifier of a destination operand, and a field for an opcode. Execution circuitry executes the decoded instruction according to the opcode to: write the first source operand to a memory location identified by the second source operand; compute an index into a control array based at least in part on the destination operand; and determine whether to exit to a hypervisor of a Virtual Machine (VM) based at least in part on data stored at a location in the control array, wherein the location is to be identified by the computed index. Other embodiments are also disclosed and claimed.
Inventors
- Andreas Kleen
- Jason W. Brandt
- Gilbert Neiger
- Ittai Anati
Assignees
- INTEL CORPORATION
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-09-27
Claims (20)
- 1. An apparatus comprising: decoder circuitry to decode a single instruction, the single instruction to include a field for an identifier of a first source operand, a field for an identifier of a second source operand, a field for an identifier of a destination operand, and a field for an opcode, and execution circuitry to execute the decoded instruction according to the opcode to: write the first source operand to a memory location identified by the second source operand; compute an index into a control array based at least in part on the destination operand and an element size of the control array; and determine whether to exit to a hypervisor of a Virtual Machine (VM) based at least in part on data stored at a location in the control array, wherein the location is to be identified by the computed index.
- 2. The apparatus of claim 1, wherein the field for the identifier of the first source operand is to identify a value.
- 3. The apparatus of claim 1, wherein the field for the identifier of the second source operand is to identify a memory address.
- 4. The apparatus of claim 1, wherein a first Virtual Central Processing Unit (VCPU) is to execute a first instruction to enable execution of the single instruction and cause the first VCPU to enter a low power consumption state.
- 5. The apparatus of claim 4, wherein the first instruction comprises a Monitor Wait (MWAIT) instruction.
- 6. The apparatus of claim 4, wherein a second VCPU is to execute the single instruction in response to a determination that execution of the single instruction is enabled.
- 7. The apparatus of claim 6, wherein the second VCPU is to cause the first VCPU to exit the low power consumption state in response to detection of an exit event.
- 8. The apparatus of claim 1, wherein the field for the identifier of the destination operand is to identify a target processor core.
- 9. The apparatus of claim 1, wherein the field for the identifier of the destination operand is to identify an Advanced Programmable Interrupt Controller (APIC) identifier of a target processor core.
- 10. The apparatus of claim 1, wherein a VM Control Structure (VMCS) is to store one or more of: a pointer to the control array, a control array limit, a control bit to indicate whether execution of the single instruction is enabled, and a timeout configuration setting.
- 11. The apparatus of claim 10, wherein a first VCPU is to execute a first instruction to enable execution of the single instruction and cause the first VCPU to enter a low power consumption state, wherein the timeout configuration setting is to indicate a timeout period for the first VCPU to exit the low power consumption state.
- 12. A processor comprising: decoder circuitry to decode a single instruction, the single instruction to include a first field for a value, a second field for a memory address, a third field for an identifier of a target processor core, and a fourth field for an opcode, and execution circuitry to execute the decoded instruction according to the opcode to: write the value to a memory location identified by the memory address; compute an index into a control array based at least in part on the identifier and an element size of the control array; and determine whether to exit to a hypervisor of a Virtual Machine (VM) based at least in part on data stored at a location in the control array, wherein the location is to be identified by the computed index.
- 13. The processor of claim 12, wherein a first Virtual Central Processing Unit (VCPU) is to execute a first instruction to enable execution of the single instruction and cause the first VCPU to enter a low power consumption state.
- 14. The processor of claim 13, wherein the first instruction comprises a Monitor Wait (MWAIT) instruction.
- 15. The processor of claim 13, wherein a second VCPU is to execute the single instruction in response to a determination that execution of the single instruction is enabled.
- 16. The processor of claim 15, wherein the second VCPU is to cause the first VCPU to exit the low power consumption state in response to detection of an exit event.
- 17. The processor of claim 13, wherein the identifier is to identify an Advanced Programmable Interrupt Controller (APIC) identifier of the target processor core.
- 18. The processor of claim 13, wherein a VM Control Structure (VMCS) is to store one or more of: a pointer to the control array, a control array limit, a control bit to indicate whether execution of the single instruction is enabled, and a timeout configuration setting.
- 19. The processor of claim 18, wherein a first VCPU is to execute a first instruction to enable execution of the single instruction and cause the first VCPU to enter a low power consumption state, wherein the timeout configuration setting is to indicate a timeout period for the first VCPU to exit the low power consumption state.
- 20. One or more non-transitory computer-readable media comprising one or more instructions that when executed on a processor configure the processor to perform one or more operations to: decode a single instruction, the single instruction to include a field for an identifier of a first source operand, a field for an identifier of a second source operand, a field for an identifier of a destination operand, and a field for an opcode, and execute the decoded instruction according to the opcode to: write the first source operand to a memory location identified by the second source operand; compute an index into a control array based at least in part on the destination operand and an element size of the control array; and determine whether to exit to a hypervisor of a Virtual Machine (VM) based at least in part on data stored at a location in the control array, wherein the location is to be identified by the computed index.
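The three operations recited in claims 1, 12, and 20 (write a value, compute an index from the target identifier and the element size, then consult the control array) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuitry: the names `Vmcs` and `mtrigger`, the one-byte default element size, the reading of the control array limit as a size in bytes, the rule that a nonzero entry requests an exit, and the behavior when the instruction is disabled or the index is out of range are all hypothetical choices that the claims do not specify.

```python
from dataclasses import dataclass


@dataclass
class Vmcs:
    """Hypothetical model of the VMCS fields listed in claims 10 and 18."""
    control_array: bytearray   # stands in for the pointer to the control array
    control_array_limit: int   # assumed here to be the array size in bytes
    mtrigger_enabled: bool     # control bit: is the instruction enabled?
    timeout: int = 0           # timeout configuration setting (unused in this sketch)


def mtrigger(vmcs, memory, value, address, target_id, element_size=1):
    """Model the claimed instruction; returns "continue" or "vmexit".

    memory is a dict standing in for guest memory (an assumption).
    """
    # Assumption: a disabled instruction exits to the hypervisor without writing.
    if not vmcs.mtrigger_enabled:
        return "vmexit"

    # Step 1: write the first source operand to the location named by the second.
    memory[address] = value

    # Step 2: compute the index from the target identifier and the element size.
    index = target_id * element_size

    # Assumption: an index outside the configured limit exits to the hypervisor.
    if index < 0 or index + element_size > vmcs.control_array_limit:
        return "vmexit"

    # Step 3: decide based on the data stored at the computed index.
    # Assumption: a nonzero entry means the hypervisor must help wake the target.
    entry = int.from_bytes(vmcs.control_array[index:index + element_size], "little")
    return "vmexit" if entry != 0 else "continue"
```

For example, with `Vmcs(bytearray([0, 1, 0, 0]), 4, True)`, a call targeting identifier 0 performs the write and continues in the guest, while a call targeting identifier 1 performs the write and then reports an exit because that entry is nonzero.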
Description
FIELD
The present disclosure generally relates to the field of computing. More particularly, some embodiments relate to techniques to implement virtual idle loops.
BACKGROUND
Modern computing workloads often have short idle periods, for example, to wait for a reply from a network or a storage disk. In a virtualized environment example, a hypervisor (monitoring a Virtual Machine (VM)) may intercept a Halt (HLT) idle instruction to put the VM to sleep and allow other VMs to run to increase the utilization of a server. This can result in a round trip through the hypervisor and host idle loop or host scheduler. This may, however, take longer than an expected timeout for a target short idle period. And even if it does not take longer, it may still increase the critical time from a wakeup event occurrence to code execution in a guest VM, which increases latency.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 illustrates examples of computing hardware to process an instruction.
FIG. 2 illustrates an example method performed by a processor to process an instruction.
FIG. 3 illustrates a block diagram of a virtual machine control structure, according to an embodiment.
FIG. 4 illustrates a flow diagram of a method to improve the performance of virtual idle loops, according to an embodiment.
FIG. 5 illustrates an example computing system.
FIG. 6 illustrates a block diagram of an example processor and/or System on a Chip (SoC) that may have one or more cores and an integrated memory controller.
FIG. 7(A) is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to examples.
FIG. 7(B) is a block diagram illustrating both an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples.
FIG. 8 illustrates examples of execution unit(s) circuitry.
FIG. 9 illustrates examples of an instruction format.
FIG. 10 illustrates examples of an addressing information field.
DETAILED DESCRIPTION
In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, various embodiments may be practiced without the specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments. Further, various aspects of embodiments may be performed using various means, such as integrated semiconductor circuits (“hardware”), computer-readable instructions organized into one or more programs (“software”), or some combination of hardware and software. For the purposes of this disclosure reference to “logic” shall mean either hardware (such as logic circuitry or more generally circuitry or circuit), software, firmware, or some combination thereof.
As mentioned above, in a virtualized environment, a hypervisor (monitoring a VM) may intercept an HLT idle instruction to put the VM to sleep, resulting in a round trip through the hypervisor and host idle loop. This may, however, take up a significant portion of a short idle period and in some cases take longer than an expected timeout, and may be avoided by polling for the idle wake up event for some limited time. Such an approach may require special paravirtualization to achieve in some current implementations. Also, the HLT instruction requires an interrupt if one Central Processing Unit (CPU) or processor in a VM wants to wake up another CPU to execute work. The interrupt typically also needs a round trip through the host.
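The idea of polling for the wake-up event for a limited time before halting can be sketched in user-space terms. This is a minimal illustration, not how a guest kernel or hypervisor implements it: `idle_wait` and `poll_budget_s` are hypothetical names, `threading.Event` stands in for the wake-up condition, and the blocking `wait()` call stands in for the expensive HLT path through the host.

```python
import threading
import time


def idle_wait(wake_flag: threading.Event, poll_budget_s: float) -> str:
    """Poll for a wake-up for a bounded time, then fall back to a blocking wait.

    Returns "polled" if the wake-up was seen during the polling window,
    or "halted" if the caller had to take the blocking (HLT-like) path.
    """
    deadline = time.monotonic() + poll_budget_s

    # Cheap path: spin on the flag until the polling budget is used up.
    while time.monotonic() < deadline:
        if wake_flag.is_set():
            return "polled"

    # Expensive path: block until woken (stands in for HLT and the host round trip).
    wake_flag.wait()
    return "halted"
```

A waker thread would call `wake_flag.set()`. If the wake-up arrives inside the polling window, the idle side returns without ever taking the blocking path, which is the latency saving the passage describes. The budget trades wasted spin cycles against the cost of the round trip.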
On non-virtualized systems it is possible to use a MONITOR WAIT (MWAIT) instruction, which uses special hardware to monitor a memory location for a change, in addition to waiting for any other events that may interrupt the idle loop. Another CPU may wake up the idle CPU by writing to the memory location without the overhead of an interrupt. Today this cannot be efficiently implemented virtually, in part because tracking the memory write operation from the hypervisor while a target CPU is sleeping is too expensive (in terms of delay, bandwidth use, and/or resource utilization) and may also be difficult. Instead of this tracking, virtualized guest VMs may use the older HLT instruction to put themselves to sleep. To this end, some embodiments provide techniques to implement virtual idle loops. In an embodiment, a new Monitor Trigger (MTRIGGER) instruction is provided which allows for integration of idle polling into the virtualization architecture, e.g., making it available to all guest VMs. In one embodiment, th