CN-122029515-A - Atomic update instruction with bitmask

CN122029515A

Abstract

A masked atomic update instruction is described that atomically performs a compare-and-swap operation on selected bits of a data structure. Executing the masked atomic update instruction compares each source value with the corresponding bit value stored at the destination data storage location. If the bit values match, one or more of the bit values at the destination are replaced with one or more defined replacement values. If the bit values do not match, the destination is not modified. The masked atomic update instruction enables the processing unit to mask out bits of the destination data storage location that are not involved in the comparison or update. The masked atomic update instruction thus provides bit-level granularity over which bits of the destination data storage location other threads are prevented from accessing. This bit-level granularity advantageously allows multiple threads to access a common data storage location simultaneously.
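The semantics described in the abstract can be illustrated with a minimal software model. The sketch below is not the patented instruction itself; it emulates the described behavior on a 64-bit word using a standard C11 compare-and-swap loop, and the function and parameter names are illustrative assumptions, not terms from the patent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical software model of the masked atomic update: only the bits
 * selected by `mask` participate in the compare and in the swap; all other
 * bits of *dest are left untouched and remain available to other threads.
 * Returns 1 if the masked bits matched and were replaced, 0 otherwise. */
static int masked_atomic_update(_Atomic uint64_t *dest, uint64_t mask,
                                uint64_t expected, uint64_t replacement) {
    uint64_t old = atomic_load(dest);
    for (;;) {
        /* Compare only the masked bits against the expected source value. */
        if ((old & mask) != (expected & mask))
            return 0;               /* mismatch: destination unmodified */
        /* Replace the masked bits, preserving the unmasked bits of `old`. */
        uint64_t updated = (old & ~mask) | (replacement & mask);
        if (atomic_compare_exchange_weak(dest, &old, updated))
            return 1;               /* masked bits swapped atomically */
        /* Another thread changed *dest; `old` was refreshed, so retry. */
    }
}
```

For example, with a mask of `0xF`, the low nibble can be swapped from `0x1` to `0x2` while the upper bits of the word, which may hold unrelated fields, are carried over unchanged.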

Inventors

  • Reshma Lal
  • David A. Kaplan
  • Yelena Ilik
  • Jeremy Wayne Powell

Assignees

  • Advanced Micro Devices, Inc.

Dates

Publication Date
2026-05-12
Application Date
2024-10-24
Priority Date
2023-10-24

Claims (15)

  1. A system, the system comprising: a circuit board having a memory mounted to the circuit board; and a processing unit configured to: obtain a data structure comprising a plurality of fields from the memory; compare a subset of the plurality of fields to at least one source value specified by a masked atomic update instruction; and, in response to the subset of the plurality of fields matching the at least one source value, modify the data structure using at least one replacement value specified by the masked atomic update instruction.
  2. The system of claim 1, wherein the processing unit is configured to obtain the data structure by: transmitting, to the circuit board, a memory request for data stored in the memory; and, in response to sending the memory request, the data structure being returned to the processing unit.
  3. The system of claim 2, wherein, in response to sending the memory request, the circuit board is configured to return the subset of the plurality of fields to the processing unit.
  4. The system of any of claims 1-3, wherein the plurality of fields are configured to store a first number of bits, and wherein the subset of the plurality of fields are configured to store a second number of bits that is less than the first number of bits.
  5. The system of any of claims 1-4, wherein the subset of the plurality of fields is defined by a mask describing a location of each field of the subset of the plurality of fields relative to the plurality of fields of the data structure.
  6. The system of any of claims 1-5, wherein the processing unit is further configured to allow a different processing unit to access fields of the data structure that are excluded from the subset of the plurality of fields during the comparing of the subset of the plurality of fields with the at least one source value and during the modifying of the data structure using the at least one replacement value.
  7. The system of any of claims 1-6, wherein the processing unit is further configured to allow a different processing unit to access the plurality of fields of the data structure in response to modifying the data structure with the at least one replacement value.
  8. The system of any of claims 1-7, wherein modifying the data structure using the at least one replacement value comprises modifying at least one field of the subset of the plurality of fields.
  9. The system of any of claims 1-8, wherein modifying the data structure using the at least one replacement value is performed independent of modifying a field of the data structure that is not included in the subset of the plurality of fields.
  10. The system of any of claims 1-9, wherein the processing unit is further configured to not modify the data structure in response to the subset of the plurality of fields not matching the at least one source value.
  11. The system of any of claims 1-10, wherein the at least one source value comprises a plurality of source values including a corresponding source value for each data field in the subset of the plurality of fields.
  12. The system of any of claims 1-11, wherein the processing unit is configured to compare the subset of the plurality of fields and modify the data structure independent of a lock on the data structure.
  13. The system of any of claims 1-12, wherein the processing unit is configured to obtain the data structure, compare the subset of the plurality of fields to the at least one source value, and modify the data structure using the at least one replacement value as part of executing the masked atomic update instruction, wherein the masked atomic update instruction specifies: the at least one source value; the at least one replacement value; a destination address of the data structure; and a mask defining the subset of the plurality of fields.
  14. A system, the system comprising: a circuit board having a memory mounted to the circuit board; a first processing unit; and a second processing unit configured to execute a masked atomic update instruction by: comparing a subset of a plurality of fields of a data structure with at least one source value specified by the masked atomic update instruction; and modifying the data structure using at least one replacement value specified by the masked atomic update instruction in response to the subset of the plurality of fields matching the at least one source value, or not modifying the data structure in response to the subset of the plurality of fields not matching the at least one source value, wherein the second processing unit is further configured to prevent the first processing unit from modifying the subset of the plurality of fields of the data structure when executing the masked atomic update instruction.
  15. An apparatus, the apparatus comprising: a processing unit configured to: receive a masked atomic update instruction, the masked atomic update instruction including information describing: an address of a data storage location; at least one source value; at least one replacement value; and a mask; and execute the masked atomic update instruction by: identifying a subset of a plurality of data fields from a data structure associated with the data storage location using the mask; comparing the subset of the plurality of data fields to the at least one source value; and modifying the data structure using the at least one replacement value in response to the subset of the plurality of data fields matching the at least one source value, or not modifying the data structure in response to the subset of the plurality of data fields not matching the at least one source value.
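The lock-free, field-granular behavior recited in the claims (notably the independence of a lock and the ability of different processing units to access disjoint fields concurrently) can be sketched as follows. This is a software emulation under assumed names, not the claimed hardware: two threads each update their own field of one shared 64-bit word using a mask-restricted compare-and-swap loop, with no lock on the word.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two threads update disjoint fields of one shared word without a lock.
 * The operands named in the claims — destination, source value, replacement
 * value, mask — appear below; `struct field` and the helper name are
 * illustrative assumptions, not terms from the patent. */

static _Atomic uint64_t shared_word = 0;

struct field { uint64_t mask; unsigned shift; };

static void *increment_field(void *arg) {
    struct field *f = arg;
    for (int i = 0; i < 1000; ++i) {
        uint64_t old = atomic_load(&shared_word);
        uint64_t updated;
        do {
            /* Read only this thread's masked field, compute the replacement
             * value, and carry bits outside f->mask over unchanged. */
            uint64_t count = (old & f->mask) >> f->shift;
            updated = (old & ~f->mask) | (((count + 1) << f->shift) & f->mask);
        } while (!atomic_compare_exchange_weak(&shared_word, &old, updated));
    }
    return NULL;
}
```

Because each retry loop preserves the bits outside its own mask, both threads converge to correct per-field counts even though they race on the same destination word, which is the concurrency benefit the abstract attributes to bit-level granularity.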

Description

Atomic update instruction with bitmask

Priority

The present application claims priority from U.S. provisional patent application No. 63/592,916, filed on October 24, 2023, the disclosure of which is hereby incorporated by reference in its entirety.

Background

Computing systems often share data resources, such as in system architectures where different virtual machines share a common memory location, in multithreaded programming approaches where multiple threads utilize shared data addresses, and so forth. However, sharing data resources presents challenges, such as synchronizing access, ensuring data integrity, and avoiding race conditions, to ensure that the computing system operates as intended. Appropriate synchronization techniques, such as atomic operations, ensure that only a single entity (e.g., a single virtual machine or a single thread) accesses critical data at a time, thereby maintaining data integrity and preventing race conditions.

Drawings

FIG. 1 is a block diagram of a non-limiting example system having a hardware platform operable to execute masked atomic update instructions using the techniques described herein.

FIG. 2 is a block diagram of a non-limiting example system showing in more detail the hardware platform of FIG. 1 executing masked atomic update instructions.

FIG. 3 is a block diagram depicting a non-limiting example procedure of executing a masked atomic update instruction.

FIG. 4 is a block diagram depicting a non-limiting example procedure of allowing different processing units to access data stored at a common destination address while one processing unit executes a masked atomic update instruction.

Detailed Description

Computing systems often share data resources to enhance performance and efficiency. One common example of sharing data resources is when computing systems implement different threads.
A thread is a smaller unit of a process that is designed to run concurrently with other threads (e.g., other threads within the same process) while sharing the same memory space (e.g., sharing data stored at a common memory address). Data resource sharing is particularly useful in virtual machine implementations, which inherently face virtualized resource constraints, making sharing data resources between threads critical to maximizing utilization of system resources and to increasing the overall speed and responsiveness of computing tasks running in a virtualized environment.

In virtual machine implementations, running multiple virtual machines on a single hardware device improves resource efficiency and reduces computing costs. By sharing data resources among different threads within each virtual machine, the overall performance and responsiveness of the computing task is improved. Sharing data resources allows a single physical hardware platform to host multiple virtual environments (e.g., each virtual environment having its own isolated operating system and computing tasks) while efficiently sharing underlying hardware resources such as processing units, memory, and data storage. Running multiple virtual machines on a single hardware platform optimizes resource utilization and reduces the need for additional hardware platforms, thus reducing overall computing costs (e.g., reducing power consumption). Such techniques are particularly advantageous in data centers and cloud computing environments, where scalability and efficient use of resources are critical. In this context, efficient data sharing between threads ensures that each virtual machine operates smoothly, providing reliable and high-performance services, without the overhead of managing multiple physical hardware platforms. However, sharing data resources presents technical challenges, such as when different threads compete for access to a common data storage location.
As one example challenge, synchronization problems arise because concurrent access to shared data resources (e.g., by different virtual machines, by different processing units, combinations thereof, etc.) may result in race conditions, in which the outcome of a computing task depends on the order in which threads execute, causing data corruption or inconsistency when one thread is interrupted by another thread. To address these technical challenges, some conventional approaches implement locks on data resources such that one entity (e.g., a virtual machine, a processing unit, etc.) accessing a data storage location prevents a different entity from accessing the data storage location. However, implementing locks to manage access to shared data storage locations introduces computational overhead and creates deadlock conditions in which threads wait indefinitely for data resource access. Ensuring atomicity of computing operations is critical to preventing partial data modifications (e.g., partial updates) that place data in an inconsistent state. Additionally, balancing performance and computing resource contention becomes challenging because excessive locks can