EP-4156518-B1 - CLOCK SYNCHRONIZATION PROCESS AND CIRCUITRY

Inventors

Kumashikar, Mahesh
Nalamalpu, Ankireddy
Subbareddy, Dheeraj
Thakur, Anshuman
Hossain, Md
Maheshwari, Atul

Dates

Publication Date: 2026-05-13
Application Date: 2022-08-03

Claims (15)

1. A processor (550, 590) comprising: a central processing unit, CPU, (555) comprising one or more processor cores operating based on a CPU clock frequency; a programmable logic device, PLD, (560) communicatively coupled to the CPU (555) and operating based on a first phase-locked-loop, PLL, clock frequency provided by a first PLL circuit (565), wherein the first PLL circuit (565) is configurable to provide the first PLL clock frequency up to a maximum PLL clock frequency of the first PLL circuit (565); and a first fractional PLL circuit (570) communicatively coupled to the CPU (555) and the PLD (560) wherein the first fractional PLL circuit (570) is configured to perform clock synchronization operations comprising: receive an indication of the CPU clock frequency; receive an indication of the maximum PLL clock frequency; determine the first PLL clock frequency to be below the maximum PLL clock frequency, wherein the CPU clock frequency is an integer multiple value of the first PLL clock frequency; provide an indication of the first PLL clock frequency to the first PLL circuit (565) to set the first PLL clock frequency; and provide a clock ratio between the CPU clock frequency and the first PLL clock frequency to the CPU (555) to enable the CPU (555) to communicate with the PLD (560), wherein the clock ratio is based on the integer multiple value.
2. The processor (550, 590) of claim 1, wherein the CPU (555) and the PLD (560) are configured to communicate data using the first PLL clock frequency; optionally wherein the CPU (555) is configured to transmit and receive data using a number of clock cycles of the CPU clock frequency based on the clock ratio.
3. The processor (550, 590) of any one of claims 1 or 2, wherein the CPU clock frequency is higher than the maximum PLL clock frequency.
4. The processor (550, 590) of any one of claims 1 to 3, wherein the PLD (560) comprises one or more computation grids.
5. The processor (550, 590) of any one of claims 1 to 4, wherein the PLD (560) comprises a state machine configured to: provide an indication of the maximum PLL clock frequency to the first fractional PLL circuit (570); and receive an indication of the first PLL clock frequency from the first fractional PLL circuit (570).
6. The processor (550, 590) of any one of claims 1 to 5, wherein the PLD (560) comprises the first fractional PLL circuit (570) implemented thereto.
7. The processor (550, 590) of any one of claims 1 to 5, wherein the first fractional PLL circuit (570) is positioned between the CPU (555) and the PLD (560).
8. The processor (550, 590) of any one of claims 1 to 7, wherein the first fractional PLL circuit (570) is implemented using programmable circuitry, hardened circuitry, processing circuitry, any combination thereof, or any other viable circuitry, to perform the clock synchronization operations.
9. The processor (550, 590) of any one of claims 1 to 8, further comprising: a plurality of PLL circuits comprising the first PLL circuit (565), each of the plurality of PLL circuits providing a respective PLL clock frequency to a portion of the PLD (560); and a plurality of fractional PLL circuits including the first fractional PLL circuit (570), each of the plurality of fractional PLL circuits is associated with a respective PLL circuit (565) of the plurality of PLL circuits to perform clock synchronization operations.
10. A method comprising: receiving (605), by a fractional phase-locked-loop, PLL, circuit (570) of a processor (550, 590), an indication of a central processing unit, CPU, clock frequency of a CPU (555) of the processor (550, 590); receiving (605), by the fractional PLL circuit (570), an indication of maximum clock frequency of a programmable logic device, PLD, (560) of the processor (550, 590); determining (610), by the fractional PLL circuit (570), a PLL clock frequency for the PLD (560) below the maximum clock frequency of the PLD (560), wherein the CPU clock frequency is an integer multiple value of the PLL clock frequency; providing (615), by the fractional PLL circuit (570), an indication of the PLL clock frequency to the PLD (560) to set a clock frequency of a PLL circuit (565) of the PLD (560); and providing (620), by the fractional PLL circuit (570), a clock ratio to the CPU (555) to enable the CPU (555) to communicate with the PLD (560), wherein the clock ratio is based on the integer multiple value.
11. The method of claim 10, wherein the CPU clock frequency is higher than the maximum clock frequency of the PLD (560), and/or wherein the maximum clock frequency of the PLD (560) corresponds to a maximum clock frequency of the PLL circuit (565) of the PLD (560).
12. The method of any one of claims 10 or 11, wherein the clock ratio indicates a ratio between the CPU clock frequency and the PLL clock frequency and the CPU (555) communicates with the PLD (560) using the PLL clock frequency based on receiving the clock ratio.
13. The method of any one of claims 10 to 12, wherein the PLD (560) performs operations using the PLL clock frequency.
14. The method of any one of claims 10 to 13, wherein the fractional PLL circuit (570) is implemented on the PLD (560), wherein the fractional PLL circuit (570) facilitates asynchronous communication between the CPU (555) and the PLD (560).
15. The method of any one of claims 10 to 13, wherein the fractional PLL circuit (570) is implemented using dedicated circuitry on the processor (550, 590), wherein the fractional PLL circuit (570) facilitates synchronous communication between the CPU (555) and the PLD (560).
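
Illustrative sketch (not part of the claims): the frequency selection recited in the processor claim and the method claim can be modeled in a few lines of Python. The claims do not prescribe a selection rule, so this sketch assumes the smallest integer divider whose resulting PLD clock does not exceed the maximum, and it treats "below" as "not exceeding"; both choices are assumptions. The function names `choose_pld_clock` and `cpu_cycles_for_transfer` are hypothetical and do not come from the patent.

```python
from fractions import Fraction


def choose_pld_clock(cpu_hz: int, pll_max_hz: int) -> tuple[int, Fraction]:
    """Return (clock_ratio, pll_hz) such that cpu_hz == clock_ratio * pll_hz.

    Assumption: the smallest integer ratio that keeps the PLD clock at or
    under pll_max_hz is chosen, which gives the fastest permitted PLD clock.
    """
    if cpu_hz <= 0 or pll_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    # Ceiling division: smallest integer n with cpu_hz / n <= pll_max_hz.
    clock_ratio = -(-cpu_hz // pll_max_hz)
    # Exact rational arithmetic avoids floating-point rounding of the result.
    pll_hz = Fraction(cpu_hz, clock_ratio)
    return clock_ratio, pll_hz


def cpu_cycles_for_transfer(pld_cycles: int, clock_ratio: int) -> int:
    """CPU clock cycles spanned by a transfer lasting pld_cycles PLD cycles."""
    return pld_cycles * clock_ratio


if __name__ == "__main__":
    # Hypothetical figures: a 3 GHz CPU and a PLL limited to 800 MHz.
    ratio, pll_hz = choose_pld_clock(3_000_000_000, 800_000_000)
    print(ratio, pll_hz, cpu_cycles_for_transfer(2, ratio))
```

With these hypothetical inputs the divider is 4, so the PLD clock is 750 MHz and the CPU would hold each transferred value for four of its own cycles per PLD cycle.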

Description

Background

This disclosure relates to incorporating a programmable fabric into the architecture of a processor to provide a more flexible instruction set architecture.

This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present techniques, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be noted that these statements are to be read in this light, and not as admissions of any kind.

Integrated circuits are found in numerous electronic devices, including handheld devices, computers, gaming systems, robotic devices, automobiles, and more. Some integrated circuits, such as central processing units (CPUs) and/or microprocessors (µP), may use offload computing and/or acceleration, in which other devices (e.g., programmable logic devices) assist the CPU/µP in performing certain operations. However, certain compute models for implementing offloading may be limited due to latency, memory coherency, or flexibility issues in the implementations used to provide the acceleration. For instance, the implementations may include an Ethernet-based accelerator, a peripheral component interconnect express (PCIE)-based accelerator, an Ultra Path Interconnect (UPI)-based accelerator, an Intel Accelerator Link (IAL), or a cache coherent interconnect for accelerators (CCIX)-based accelerator. However, at least some of these interconnects may have a high latency relative to latency in the CPU/µP, inflexibility of usage, and/or a lack of memory coherency. For instance, PCIE/Ethernet-based implementations may have a relatively long latency (e.g., 100 µs) relative to the latency in the CPU/µP. Furthermore, the PCIE/Ethernet-based implementations may lack memory coherency. UPI/IAL/CCIX-based accelerators may have a lower latency (e.g., 1 µs) than the PCIE/Ethernet implementations while having coherency, but the UPI/IAL/CCIX-based accelerators may offer limited flexibility via fine-grained memory sharing. For instance, UPI/IAL/CCIX-based accelerators are first integrated into core software before being utilized.

US 5 909 563 A discloses an interface between two clock domains, which transfers data between the two clock domains. Each clock domain has a respective clock. The two clocks have a fixed relationship. A data signal is registered in a plurality of registers in the first clock domain, the number of which is related to the fixed relationship of the clocks. Each register outputs a specific output signal, one of which is selected by a multiplexer in the second clock domain, to be output to an output register. The output register then outputs the data in synchronism with the second clock.

US 2009/158078 A1 discloses a clock ratio controller for dynamic voltage and frequency scaled digital systems, and applications thereof.

US 6 424 688 B1 discloses a system and method for transferring data from a first clock domain to a second clock domain, wherein a clock skipping technique is employed to maintain the same level of data throughput in the transmitting and receiving domains.

Summary of the Invention

In a first aspect of the present invention a processor according to claim 1 is provided. In a second aspect of the present invention a method according to claim 10 is provided. Additional features for advantageous embodiments of the present invention are provided in the dependent claims thereto.

Brief Description of the Drawings

Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
FIG. 1 is a block diagram of a register architecture, in accordance with an embodiment;
FIG. 2A is a block diagram illustrating an in-order pipeline and a register renaming, out-of-order issue/execution pipeline, in accordance with an embodiment;
FIG. 2B is a block diagram illustrating an in-order architecture core and a register renaming, out-of-order issue/execution architecture core to be included in a processor, in accordance with an embodiment;
FIGS. 3A and 3B illustrate a block diagram of a more specific example in-order core architecture, which core would be one of several logic blocks (including other cores of the same type and/or different types) in a chip, in accordance with an embodiment;
FIG. 4 is a block diagram of a processor that may have more than one core, may have an integrated memory controller, and may have integrated graphics, in accordance with an embodiment;
FIG. 5 is a block diagram of a system, in accordance with an embodiment;
FIG. 6 is a block diagram of a first more specific example system, in accordance with an embodiment;
FIG. 7 is a block diagram of a system on a chip (SoC), in accordance with an embodiment;
FIG. 8 is a block