CN-122019063-A - Transaction processing method, circuit and multi-core processor architecture

Abstract

The present application relates to the field of computing hardware architecture, and in particular to a transaction processing method, a circuit, and a multi-core processor architecture. The method comprises: obtaining a transaction request from a data sending end through a transaction diversion module; if the transaction attribute information of the transaction request meets an early response condition, diverting the transaction attribute information to a response processing module, extracting a transaction identifier from the transaction attribute information through the response processing module, and generating an early completion response according to the transaction identifier to return to the data sending end; and if the transaction attribute information does not meet the early response condition, diverting the transaction request to a downstream processing module for transaction processing. By supporting two execution modes for synchronization barrier transactions, the application improves transaction processing efficiency and response speed, and solves the prior-art technical problems of limited bandwidth efficiency improvement and support for only a single synchronization barrier semantic, both of which are caused by having a single synchronization barrier execution mode.

Inventors

  • WANG GUANHUA
  • CHEN ZHUO
  • YU DONGHAO
  • LI YUNZHAO

Assignees

  • 飞腾信息技术有限公司

Dates

Publication Date
20260512
Application Date
20260107

Claims (11)

  1. A transaction processing method, characterized in that the method is applied to a transaction processing circuit comprising a transaction diversion module and a response processing module, the method comprising: acquiring a transaction request from a data sending end through the transaction diversion module; in response to transaction attribute information of the transaction request meeting an early response condition, diverting the transaction attribute information to the response processing module, extracting a transaction identifier from the transaction attribute information through the response processing module, and generating an early completion response according to the transaction identifier to return to the data sending end; and in response to the transaction attribute information not meeting the early response condition, diverting the transaction request to a downstream processing module for transaction processing.
  2. The method of claim 1, wherein the transaction processing circuit further comprises a barrier processing module, the method further comprising: transmitting, by the transaction diversion module, the transaction request to the barrier processing module; performing, by the barrier processing module, barrier synchronization instruction identification on the received transaction request; in response to identifying a barrier synchronization instruction, determining, by the barrier processing module, a transmission timing of the barrier synchronization instruction based on a preset configuration mode, and monitoring a completion status of all memory access transactions issued before the barrier synchronization instruction; and controlling, by the barrier processing module, the sending of the barrier synchronization instruction once the hardware transaction completion confirmation signals of all memory access transactions issued before the barrier synchronization instruction are observed to have returned.
  3. The method of claim 2, wherein determining, by the barrier processing module, the transmission timing of the barrier synchronization instruction based on the preset configuration mode in response to identifying the barrier synchronization instruction comprises: if the configuration mode is a conservative mode, upon identifying a barrier synchronization instruction, blocking, by the barrier processing module, the sending of subsequent transactions on the processing channel where the barrier synchronization instruction is located, waiting for the hardware transaction completion confirmation signals of all memory access transactions issued before the barrier synchronization instruction to return, and then sending the barrier synchronization instruction to the downstream processing module, so as to ensure the correctness of barrier synchronization semantics; if the configuration mode is a performance mode, upon identifying a barrier synchronization instruction, not blocking the sending of subsequent transactions, and caching the barrier synchronization instruction in a first-in first-out queue in the barrier processing module; and, after the hardware transaction completion confirmation signals of all memory access transactions issued before the barrier synchronization instruction have returned, inserting, by the barrier processing module, the barrier synchronization instruction into the transaction stream of the current transaction pipeline for sending.
  4. The method of claim 2, wherein the transaction processing circuit further comprises a transaction arbitration module, the method further comprising: receiving, through the transaction arbitration module, transaction requests diverted to the downstream processing module, arbitrating the transaction requests, and transmitting the arbitrated transaction request to the downstream processing module; wherein the transaction requests comprise requests that meet the early response condition and have been processed by the barrier processing module, and requests from the transaction diversion module that do not meet the early response condition and have not been processed by the response processing module.
  5. The method of claim 4, wherein the transaction processing circuit further comprises an identifier replacement module, the method further comprising: receiving, through the identifier replacement module, the transaction request output by the barrier processing module, and remapping the transaction identifier in the transaction request, so as to transmit the remapped transaction request to the transaction arbitration module.
  6. The method of claim 5, wherein the transaction processing circuit further comprises a response diversion module, the method further comprising: receiving, through the response diversion module, a hardware transaction completion confirmation signal from the downstream processing module, and parsing the hardware transaction completion confirmation signal to obtain a transaction identifier; and determining, according to the transaction identifier, whether the corresponding transaction has already received an early response: if it has, transmitting the hardware transaction completion confirmation signal to the barrier processing module, the barrier processing module being configured to update the completion status of the barrier synchronization instruction based on the received hardware transaction completion confirmation signal.
  7. The method of claim 6, wherein the transaction processing circuit further comprises a response arbitration module, the method further comprising: when the corresponding transaction has not received an early response, receiving, by the response arbitration module, the hardware transaction completion confirmation signal sent by the response diversion module and the early completion response sent by the response processing module; and arbitrating and merging the early completion response and the hardware transaction completion confirmation signal, and returning the merged completion response to the data sending end.
  8. The method of claim 7, wherein the transaction processing circuit further comprises a cache module, the method further comprising: caching, through the cache module, the transaction request from the transaction diversion module, and sequentially transmitting the cached transaction request to the response processing module and the barrier processing module.
  9. The method of claim 2, wherein the number of response processing modules is the same as the number of barrier processing modules, and the response processing modules and the barrier processing modules form one or more independent processing channels.
  10. A transaction processing circuit, comprising a transaction diversion module and a response processing module, the transaction processing circuit being configured to: acquire a transaction request from a data sending end through the transaction diversion module; in response to transaction attribute information of the transaction request meeting an early response condition, divert the transaction attribute information to the response processing module, extract a transaction identifier from the transaction attribute information through the response processing module, and generate an early completion response according to the transaction identifier to return to the data sending end; and in response to the transaction attribute information not meeting the early response condition, divert the transaction request to a downstream processing module for transaction processing.
  11. A multi-core processor architecture, comprising a data sending end and the transaction processing circuit of claim 10, wherein the data sending end comprises any one of a plurality of processor cores, and the transaction processing circuit is configured to implement the method of any one of claims 1-9.
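
The two barrier configuration modes recited in claims 2 and 3 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the names `Txn`, `BarrierChannel`, `submit`, and `acknowledge` are hypothetical, a single processing channel is modeled, and `acknowledge` stands in for the return of a hardware transaction completion confirmation signal.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Set, Tuple


@dataclass(frozen=True)
class Txn:
    txn_id: int               # hypothetical transaction identifier
    is_barrier: bool = False  # True for a barrier synchronization instruction


class BarrierChannel:
    """Software model of one processing channel (names are assumptions)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("conservative", "performance"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        # Memory access transactions sent downstream and not yet acknowledged.
        self.outstanding: Set[int] = set()
        # FIFO of barriers, each paired with the ids it still waits for.
        self.pending: Deque[Tuple[Txn, Set[int]]] = deque()
        # Conservative mode only: transactions held back behind a barrier.
        self.stalled: Deque[Txn] = deque()
        # Order in which transactions leave toward the downstream module.
        self.sent: List[Txn] = []

    def submit(self, txn: Txn) -> None:
        # Conservative mode blocks everything that arrives behind a barrier.
        if self.mode == "conservative" and self.pending:
            self.stalled.append(txn)
            return
        if txn.is_barrier:
            # Snapshot the transactions issued before this barrier.
            self.pending.append((txn, set(self.outstanding)))
            self._release()
        else:
            self.outstanding.add(txn.txn_id)
            self.sent.append(txn)

    def acknowledge(self, txn_id: int) -> None:
        # Models a completion confirmation returning from downstream.
        self.outstanding.discard(txn_id)
        for _, waiting in self.pending:
            waiting.discard(txn_id)
        self._release()

    def _release(self) -> None:
        # Send barriers in FIFO order once everything before them completed.
        while self.pending and not self.pending[0][1]:
            barrier, _ = self.pending.popleft()
            self.sent.append(barrier)
            # Conservative mode: resume the transactions that were blocked.
            while self.stalled and not self.pending:
                self.submit(self.stalled.popleft())
```

With the same input sequence (transaction 1, a barrier with id 2, transaction 3, then the acknowledgement of transaction 1), the conservative channel sends in the order 1, 2, 3, while the performance channel lets transaction 3 go first and inserts the barrier afterwards, giving 1, 3, 2.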

Description

Transaction processing method, circuit and multi-core processor architecture

Technical Field

The present application relates to the technical field of computing hardware architecture, and in particular to a transaction processing method, a circuit, and a multi-core processor architecture.

Background

With the rapid development of artificial intelligence technology, GPUs (Graphics Processing Units) are increasingly widely used in AI inference scenarios, the demand for inference acceleration is increasingly urgent, and link transmission efficiency has become a key factor affecting GPU inference performance. In a distributed computing environment, collaborative inference across multiple GPUs requires frequent exchange of intermediate data, and this data often travels over long multi-hop paths, which introduces significant processing delays and bandwidth bottlenecks. The early response mechanism is an important means of optimizing link transmission: by returning the transaction completion response in advance, it aims to hide the processing delay of slave devices and improve link utilization. However, when it comes to supporting efficient data synchronization, the existing early response mechanism offers only a single execution mode for the synchronization barrier. This limits the bandwidth efficiency improvement and supports only a single synchronization barrier semantic, making it difficult to meet the stringent low-latency and high-throughput requirements of AI inference scenarios.

Disclosure of Invention

In view of the above defects or shortcomings, the application provides a transaction processing method, a circuit, and a multi-core processor architecture, which can solve the technical problems of limited bandwidth efficiency improvement and support for only a single synchronization barrier semantic in the existing early response mechanism.
The main technical scheme adopted by the application is as follows: The application provides a transaction processing method applied to a transaction processing circuit that comprises a transaction diversion module and a response processing module. The method comprises: acquiring a transaction request from a data sending end through the transaction diversion module; in response to the transaction attribute information of the transaction request meeting an early response condition, diverting the transaction attribute information to the response processing module, extracting a transaction identifier from the transaction attribute information through the response processing module, and generating an early completion response according to the transaction identifier to return to the data sending end; and in response to the transaction attribute information not meeting the early response condition, diverting the transaction request to a downstream processing module for transaction processing. The embodiment of the application integrates the transaction diversion module and the response processing module in the transaction processing circuit. The transaction diversion module is directly connected to the data sending end and to the response processing module, and is connected through a third end to the downstream processing module, namely the transaction processing pipeline. Here, the transaction processing pipeline refers to an execution pipeline formed by a plurality of hardware processing modules in sequence, which processes the transaction request in stages (for example, barrier identification, identifier replacement, and arbitration output) so as to improve throughput and processing efficiency.
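
The diversion step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit's implementation: the names `TransactionRequest`, `CompletionResponse`, `TransactionDiverter`, and the `posted_write` attribute are hypothetical, and the early response condition is supplied as a predicate because the text does not define it. Only the branch decision is modeled; any later handling of an early-responded request by other modules is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TransactionRequest:
    txn_id: int               # transaction identifier in the attribute information
    posted_write: bool        # hypothetical attribute used as the example condition
    payload: bytes = b""


@dataclass
class CompletionResponse:
    txn_id: int
    early: bool               # True when generated before downstream processing


class TransactionDiverter:
    """Models the diversion module plus the response processing module."""

    def __init__(self, meets_early_condition: Callable[[TransactionRequest], bool]) -> None:
        # The condition itself is an assumption supplied by the caller.
        self.meets_early_condition = meets_early_condition
        self.to_sender: List[CompletionResponse] = []   # responses returned to the sending end
        self.downstream: List[TransactionRequest] = []  # requests sent to the pipeline

    def respond_early(self, req: TransactionRequest) -> CompletionResponse:
        # Response processing: extract the identifier and build the response.
        return CompletionResponse(txn_id=req.txn_id, early=True)

    def accept(self, req: TransactionRequest) -> None:
        if self.meets_early_condition(req):
            self.to_sender.append(self.respond_early(req))
        else:
            self.downstream.append(req)


diverter = TransactionDiverter(lambda req: req.posted_write)
diverter.accept(TransactionRequest(txn_id=1, posted_write=True))
diverter.accept(TransactionRequest(txn_id=2, posted_write=False))
```

In this example, request 1 receives an early completion response carrying its identifier, while request 2 is passed to the downstream list unchanged.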
The transaction diversion module is configured to, in response to the transaction attribute information (Transaction Attributes) of the transaction request meeting the early response condition, divert the transaction attribute information to the response processing module, which extracts the transaction identifier (Transaction Identifier) from the transaction attribute information and generates an early completion response according to the transaction identifier to return to the data sending end; if the transaction attribute information of the transaction request does not meet the early response condition, the transaction diversion module diverts the transaction request to the transaction processing pipeline for transaction processing. In this way, the transaction processing circuit evaluates the transaction attribute information against the condition and diverts transactions accordingly, immediately generating an early response for transactions that meet the condition. This effectively hides processing delay and reduces blocking at the sending end, thereby improving link bandwidth utilization and the throughput of the syste