US-20260126482-A1 - Dynamic Performance Rate Limiter for Integrated Circuit Device
Abstract
Integrated circuit devices, methods, and circuitry for dynamically limiting a rate of performance of an integrated circuit device are provided. This may allow an integrated circuit to remain within performance limits, such as those found in export controls. An integrated circuit device may include data utilization circuitry to perform arithmetic operations and a performance monitor circuit. The performance monitor circuit may selectively throttle the data utilization circuitry to maintain a performance rate of the data utilization circuitry to within a maximum average limit over an accumulation window of a leaky accumulator circuit.
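As an illustration of the throttling behavior summarized above, the following Python sketch models a leaky accumulator in software. It is not taken from the disclosure: the function name `simulate`, the fixed per-tick leak, and all numeric values are assumptions chosen only to show how draining the accumulator at a fixed rate can bound the average rate of work.

```python
def simulate(demand, leak_per_tick, limit):
    """Model a leaky-accumulator rate limiter (illustrative assumption only).

    demand        -- cost units the compute circuitry wants to spend on each tick
    leak_per_tick -- amount drained from the accumulator on each tick
    limit         -- accumulator level above which work is stalled
    Returns (work done per tick, throttle flag per tick).
    """
    level = 0
    done, throttled = [], []
    for want in demand:
        stall = level > limit              # comparator: is the accumulator over the limit?
        work = 0 if stall else want        # throttled ticks perform no work
        level = max(0, level + work - leak_per_tick)  # accumulate, then leak
        done.append(work)
        throttled.append(stall)
    return done, throttled


# Hypothetical workload: the circuitry requests 10 cost units on every tick.
demand = [10] * 20
done, throttled = simulate(demand, leak_per_tick=4, limit=12)
print(sum(done), "of", sum(demand), "requested cost units were performed")
```

In this run, 90 of the 200 requested cost units are performed over 20 ticks. Because the accumulator level stays bounded, the average work per tick approaches the leak rate of 4 units as the run gets longer, which is the sense in which the leak rate sets the permitted average.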
Inventors
- Gregg W. Baeckler
- Martin Langhammer
Assignees
- Gregg W. Baeckler
- Martin Langhammer
Dates
- Publication Date: 20260507
- Application Date: 20251231
Claims (20)
- 1. An integrated circuit device comprising: data utilization circuitry to perform arithmetic operations; and a performance monitor circuit to selectively throttle the data utilization circuitry to maintain a performance rate of the data utilization circuitry to within a maximum average limit over an accumulation window of a leaky accumulator circuit.
- 2. The integrated circuit device of claim 1, wherein the data utilization circuitry comprises a central processing unit (CPU) processor core, a graphics processing unit (GPU) processor core, a digital signal processing (DSP) block, programmable logic circuitry programmed with a system design, or any combination thereof.
- 3. The integrated circuit device of claim 1, wherein the data utilization circuitry comprises a first data utilization circuit and a second data utilization circuit, wherein the performance monitor is to selectively throttle both the first data utilization circuit and the second data utilization circuit.
- 4. The integrated circuit device of claim 3, wherein the leaky accumulator circuit is to accumulate a first performance rate of the first data utilization circuit and a second performance rate of the second data utilization circuit over the accumulation window.
- 5. The integrated circuit device of claim 3, wherein the performance monitor comprises: the leaky accumulator circuit, wherein the leaky accumulator circuit is to accumulate a first performance rate of the first data utilization circuit over the accumulation window; an additional leaky accumulator circuit, wherein the additional accumulator circuit is to accumulate a second performance rate of the first data utilization circuit over the accumulation window; and a summation circuit to sum the accumulated values from the leaky accumulator circuit and the additional leaky accumulator circuit.
- 6. The integrated circuit device of claim 1, comprising an additional performance monitor circuit, wherein the data utilization circuitry comprises a first data utilization circuit and a second data utilization circuit, wherein the performance monitor circuit is to selectively throttle the first data utilization circuit and wherein the additional performance monitor circuit is to selectively throttle the second data utilization circuit.
- 7. The integrated circuit device of claim 1, wherein the performance monitor circuit is to selectively throttle the data utilization circuitry based on a check clock that is slower than a compute clock used by the data utilization circuitry.
- 8. The integrated circuit device of claim 7, wherein the accumulation window of the leaky accumulator circuit is based on the check clock.
- 9. The integrated circuit device of claim 8, wherein the accumulation window of the leaky accumulator circuit comprises a plurality of check clock cycles corresponding to one second.
- 10. The integrated circuit device of claim 8, wherein the accumulation window of the leaky accumulator circuit comprises a plurality of check clock cycles corresponding to multiple seconds.
- 11. The integrated circuit device of claim 8, wherein the accumulation window of the leaky accumulator circuit comprises a single check clock cycle.
- 12. The integrated circuit device of claim 1, wherein the performance monitor circuit is to selectively throttle the data utilization circuitry based on temporarily freezing a compute clock of the data utilization circuitry or temporarily freezing an instruction pipeline of the data utilization circuitry, or some combination thereof.
- 13. A method for dynamic performance rate limiting of an integrated circuit device, the method comprising: determining a cost per operation per compute clock cycle of the integrated circuit device; maintaining a count of the total cost; synchronizing the total cost to a trusted clock signal that is slower than, and not dependent on, the compute clock; accumulating a value corresponding to the total cost in a leaky accumulator that gradually decreases according to the trusted clock signal; and throttling a rate of operation of data utilization circuitry of the integrated circuit device based on the accumulated value of the leaky accumulator.
- 14. The method of claim 13, wherein the cost per operation per compute clock cycle is determined based on a lookup table storing a relationship between performance of arithmetic operations and an indication of the operation.
- 15. The method of claim 13, wherein the rate of operation is throttled based at least in part by slowing or freezing the compute clock.
- 16. The method of claim 13, wherein throttling the rate of operation is based on hysteresis applied to a throttle signal that is output based on the accumulated value of the leaky accumulator.
- 17. A performance monitor circuit comprising: an operation cost counter circuit to determine and accumulate a performance cost of operations performed by data utilization circuitry of an integrated circuit device based on a compute clock and an indication of the operations to be performed by the data utilization circuitry; a synchronization and edge detection circuit to detect a threshold value of the accumulated performance cost based on a check clock that is slower than, and not dependent on, the compute clock; a leaky accumulator circuit to accumulate the threshold values of the accumulated performance cost based on the check clock and gradually reduce the accumulated threshold values over time based on the check clock signal; and a comparator circuit to compare the accumulated threshold values from the leaky accumulator circuit to a stored limit to selectively produce a throttle signal to selectively throttle the data utilization circuitry.
- 18. The performance monitor circuit of claim 17, wherein the operation cost counter circuit comprises a lookup table to output the performance cost based on indications of the operations performed by the data utilization circuitry.
- 19. The performance monitor circuit of claim 17, wherein the synchronization and edge detection circuit comprises: a plurality of registers and combinatorial logic to detect a change in an edge of a most significant bit of the accumulated performance cost of the operation cost counter; and shifting circuitry to shift the output of the plurality of registers and combinatorial logic to output a result as the threshold value of the accumulated performance cost.
- 20. The performance monitor circuit of claim 19, wherein the stored limit corresponds to a selectable product performance level.
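The elements recited in claims 13 through 19 can be modeled in software to show how they might fit together. The Python sketch below is an illustrative assumption and not the claimed circuit: the class `PerformanceMonitor`, the table `COST_LUT`, the helper `run`, the 4-bit counter width, the shift amount, and the leak and hysteresis thresholds are all hypothetical. It models a lookup-table cost counter in the compute clock domain, detection of a most-significant-bit edge on a slower check clock, a leaky accumulator, and a comparator with hysteresis.

```python
# Hypothetical per-operation costs; the values are illustrative only.
COST_LUT = {"nop": 0, "add": 1, "fma": 2}


class PerformanceMonitor:
    def __init__(self, counter_bits=4, shift=2, leak=1, high=6, low=2):
        self.mask = (1 << counter_bits) - 1   # counter wraps at 2**counter_bits
        self.msb = 1 << (counter_bits - 1)    # bit watched by the edge detector
        self.shift = shift                    # weight applied to each detected edge
        self.leak = leak                      # drained on every check clock cycle
        self.high, self.low = high, low       # hysteresis thresholds
        self.counter = 0                      # operation cost counter
        self.prev_msb = 0                     # MSB as last sampled on the check clock
        self.level = 0                        # leaky accumulator value
        self.throttle = False

    def compute_tick(self, op):
        """One compute clock cycle; returns True if the operation executed."""
        if self.throttle:
            return False                      # frozen: no work, no cost counted
        self.counter = (self.counter + COST_LUT[op]) & self.mask
        return True

    def check_tick(self):
        """One check clock cycle: edge-detect, accumulate, leak, compare."""
        msb = 1 if self.counter & self.msb else 0
        edge = msb ^ self.prev_msb            # 1 when the MSB changed since last check
        self.prev_msb = msb
        self.level = max(0, self.level + (edge << self.shift) - self.leak)
        if self.level >= self.high:           # assert the throttle at the upper threshold
            self.throttle = True
        elif self.level <= self.low:          # release it only at the lower threshold
            self.throttle = False
        return self.throttle


def run(monitor, ops_per_check=4, check_ticks=12, op="fma"):
    """Drive several compute cycles per check cycle and record the throttle signal."""
    executed, trace = 0, []
    for _ in range(check_ticks):
        for _ in range(ops_per_check):
            executed += monitor.compute_tick(op)
        trace.append(monitor.check_tick())
    return executed, trace


executed, trace = run(PerformanceMonitor())
print(executed, "of", 4 * 12, "requested operations executed")
```

With these assumed values, 16 of the 48 requested operations execute. The sketch relies on the counter advancing by no more than half its range between check cycles, so that every toggle of the most significant bit is observed. The two thresholds keep the throttle signal from toggling on every check cycle when the accumulator hovers near a single limit.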
Description
BACKGROUND
This disclosure relates to systems and methods to dynamically limit a performance of a component of an integrated circuit device, such as the rate of floating-point operations performed by the integrated circuit device.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it may be understood that these statements are to be read in this light, and not as admissions of prior art.
Integrated circuits are found in numerous electronic devices and provide a variety of functionality. Many high-performance integrated circuits have capabilities that exceed export limitations. There are increasing limitations on device performance, often expressed as a limit on the normalized trillion floating point operations per second (TFLOPs), for exporting certain types of computing devices. This includes central processing units (CPUs), graphics processing units (GPUs), and even programmable logic devices such as field programmable gate arrays (FPGAs). These devices may be excluded from being exported to certain countries because the devices are capable of a higher number of TFLOPs than permitted by export controls.
BRIEF DESCRIPTION OF THE DRAWINGS
Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
FIG. 1 is a block diagram of a system used to limit a rate of performance of data utilization circuitry of an integrated circuit device to within a specified target;
FIG. 2 is a block diagram of a system used to limit a rate of performance of multiple instances of data utilization circuitry of an integrated circuit device to within a specified target;
FIG. 3 is a block diagram of a performance monitor used to limit performance of data utilization circuitry of an integrated circuit device;
FIG. 4 is a flowchart of a method for operating a performance monitor to limit performance of data utilization circuitry of an integrated circuit device;
FIG. 5 is a block diagram of a performance monitor used to limit performance of multiple instances of data utilization circuitry of an integrated circuit device;
FIG. 6 is a circuit diagram illustrating example circuitry for a performance monitor;
FIG. 7 is a block diagram of another example of a performance monitor used to limit performance of multiple instances of data utilization circuitry of an integrated circuit device; and
FIG. 8 is a block diagram of a data processing system that may incorporate the systems and methods of this disclosure.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
When introducing elements of various embodiments of the present disclosure, the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Furthermore, the phrase A “based on” B is intended to mean that A is at least partially based on B. Moreover, the term “or” is intended to be inclusive (e.g., logical OR) and not exclusive (e.g., logical XOR). In other words, the phrase A “or” B is intended to mean A, B, or both A and B.
This disclosure provides systems and methods to automatically throttle the performance of an integrated circuit device to prevent the integrated circuit from exceeding a maximum allowed performance rate.
This may enable a manufacturer to ship an integrated circuit device to any customer around the world without exceeding export limits. Indeed, rather than permanently disabling or destroying certain subcomponents of the integrated circuit device, a performance monitor circuit may be programmed to adhere to a specified average maximum performance limit over a suitably defined window of time. The performance monitor circuit may auto-throttle the integrated circuit device so it will not exceed that limit (e.g., an export limit). The customer may then use the integrated circuit device in any way they desire without exceeding the specified performance limit. For example, a customer may use the same software or the same field programmable gate array (FPGA) system design for all geographic regions, but the rate of performance may be limited based on geography. For example, if the performance monitor circuit of the integrated circuit device has fuses blown that specify a performance limit for a particular geographic region, the integrate