CN-122018848-A - High-speed data path optimization method, device, computer equipment and storage medium


Abstract

Embodiments of the present application provide a high-speed data path optimization method and apparatus, a computer device, and a storage medium. The method comprises: setting a ping-pong register and an output register in a buffer to form buffer spaces, and, if a buffer space is empty, writing input data into it; redefining the write interface and full-flag interface of a first-in first-out (FIFO) queue as an input handshake interface, and its read interface and empty-flag interface as an output handshake interface; taking the internal valid-entry count of the FIFO, whose maximum equals the FIFO storage capacity, as an additional output interface; actively outputting the stored data when the FIFO storage space is not empty; and setting a stack buffer space in an arbiter, recording requests input simultaneously as a request group, placing request groups into the stack buffer space in input order, and performing arbitration sequencing on each request group. By combining the optimized buffer, first-in first-out queue, and arbiter in this way, problems such as interface conversion and data routing can be solved.

Inventors

  • DAI CHENG
  • LONG BIN
  • TAN YUXI

Assignees

  • 长沙景嘉微电子股份有限公司
  • 长沙景美集成电路设计有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-31

Claims (10)

  1. A method of optimizing a high-speed data path, comprising: setting a ping-pong register and an output register in a buffer, and constructing three buffer spaces corresponding to the ping-pong register and the output register; judging, when input data exists, whether the buffer space is empty; and, when the buffer space is empty, receiving the input data and writing the input data into the buffer space according to a preset rule.
  2. The method of claim 1, wherein the buffer space comprises a P buffer space, a B buffer space, and the output register, the P buffer space and the B buffer space are connected in parallel and are both connected to the output register, and writing the input data into the buffer space according to the preset rule comprises: determining, according to the data storage state of the output register, whether to write the input data into the output register; when the input data cannot be written into the output register, writing the input data alternately into the P buffer space and the B buffer space, starting from the P buffer space; and, when the output register is empty, transferring data from the P buffer space and the B buffer space alternately into the output register, starting from the P buffer space.
  3. The method according to claim 1, wherein the method further comprises: through interface redefinition, modifying the write interface and full-flag interface of a first-in first-out (FIFO) queue into an input handshake interface, and modifying its read interface and empty-flag interface into an output handshake interface; taking the internal valid-entry count of the FIFO, whose maximum equals the FIFO storage capacity, as an additional output interface; judging, when input data exists, whether the storage space of the first-in first-out queue is full; receiving the input data when the storage space of the first-in first-out queue is not full; and actively outputting the stored data when the storage space of the first-in first-out queue is not empty.
  4. A method according to claim 3, characterized in that the method further comprises: setting a stack buffer space in an arbiter, and recording the requests input simultaneously in each clock cycle as one request group; placing request groups into the stack buffer space in their input order; and, when the stack buffer space is not empty, popping one request group, transmitting it to an arbitration core for arbitration sequencing, and outputting the arbitration result, wherein during arbitration, when a request in the group has been processed, the corresponding request bit is cleared, and only after the bit of the last request in the current group is cleared does the stack buffer space pop the next request group.
  5. A method according to claim 3, characterized in that the method further comprises: in the interface conversion process, performing interface simplification and timing processing on the input data through a buffer; buffering part of the data output by the buffer through the first-in first-out queue; and, after the data output by the first-in first-out queue has undergone interface conversion, selectively outputting it through a further first-in first-out queue and buffer.
  6. The method according to claim 4, wherein the method further comprises: in the data processing process, splitting multiple channels of out-of-order input data through a buffer to complete interface simplification and timing processing; storing the multiple channels of out-of-order data output by the buffer in corresponding first-in first-out queues, the first-in first-out queues serving as operation spaces for out-of-order recovery; and outputting the data from the first-in first-out queues in a preset order or through an arbiter.
  7. The method according to claim 4, wherein the method further comprises: in the data routing process, first passing input data through a buffer to complete interface processing, and storing the data output by the buffer in a first-in first-out queue; when the first-in first-out queue meets a preset data accumulation requirement, completing data routing through multi-stage cascaded arbiters, wherein a buffer or a first-in first-out queue is selectively inserted between different cascaded arbiters for data processing and timing optimization; and outputting the routed data after it passes through a buffer or a first-in first-out queue.
  8. An apparatus for optimizing a high-speed data path, comprising: a construction module, configured to set a ping-pong register and an output register in a buffer and to construct three buffer spaces corresponding to the ping-pong register and the output register; a judging module, configured to judge, when input data exists, whether the buffer space is empty; and a writing module, configured to receive the input data when the buffer space is empty and to write the input data into the buffer space according to a preset rule.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, carries out the steps of the method for optimizing a high-speed data path according to any one of claims 1 to 7.
  10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, realizes the steps of the high-speed data path optimization method according to any one of claims 1 to 7.
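As an illustrative aside (not part of the patent text), the interface redefinition of claim 3 can be sketched in software: the write/full-flag pair becomes an input ready signal, the read/empty-flag pair becomes an output valid signal, and the internal valid-entry count is exposed as an extra port. The class and method names below are hypothetical, and a Python deque stands in for the hardware FIFO storage.

```python
from collections import deque

class HandshakeFifo:
    """Sketch of the redefined FIFO interface: handshake signals
    replace the explicit write/full and read/empty interfaces, and
    the fill count is exposed as an additional output port."""

    def __init__(self, depth):
        self.depth = depth   # maximum storage capacity
        self._q = deque()

    # Input side: a valid/ready handshake replaces write + full flag.
    @property
    def in_ready(self):
        return len(self._q) < self.depth

    def push(self, data):
        # A transfer happens only when the caller's valid meets in_ready.
        if not self.in_ready:
            return False
        self._q.append(data)
        return True

    # Output side: a valid/ready handshake replaces read + empty flag.
    @property
    def out_valid(self):
        # A non-empty FIFO actively presents its head for output.
        return len(self._q) > 0

    def pop(self):
        if not self.out_valid:
            return None
        return self._q.popleft()

    # Extra output interface: the internal valid-entry count.
    @property
    def count(self):
        return len(self._q)
```

For example, a depth-2 instance accepts two pushes, deasserts `in_ready` (so a third push is refused rather than overflowing), and reports `count == 2` until the consumer pops.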

Description

High-Speed Data Path Optimization Method, Device, Computer Equipment and Storage Medium

Technical Field

The present application relates to the field of computer technologies, and in particular to a method and apparatus for optimizing a high-speed data path, a computer device, and a storage medium.

Background

Current hot fields such as AI, 6G, and autonomous driving all impose computation-intensive demands, and the mainstream market solution is multi-core design, represented by GPUs and GPGPUs. In multi-core designs, data supply to the multiple computing units within a core, data interaction between threads, and data interaction between cores all involve complex and variable data transmission and processing requirements, so meeting these requirements efficiently is both important and urgent. Related designs also use buffers (buf), first-in first-out queues (fifo), arbiters (arbiter), and the like, but they suffer from several defects: transmission efficiency is lost because detailed control timing cannot be fully pipelined; conventional interface forms lack flexibility in transmission control; ordering is lost during arbitration queuing; non-uniform interface forms raise development and verification costs; and implementations respond too slowly when requirements change. Comprehensively considering and solving these problems is therefore important to both the performance and the development cost of interconnect implementation in high-speed data path processing.

Disclosure of Invention

The embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for optimizing a high-speed data path.
In a first aspect of an embodiment of the present application, there is provided a method for optimizing a high-speed data path, including: setting a ping-pong register and an output register in a buffer, and constructing three buffer spaces corresponding to the ping-pong register and the output register; judging, when input data exists, whether the buffer space is empty; and, when the buffer space is empty, receiving the input data and writing the input data into the buffer space according to a preset rule.

In an optional embodiment of the present application, the buffer space includes a P buffer space, a B buffer space, and the output register, where the P buffer space and the B buffer space are connected in parallel and are both connected to the output register, and writing the input data into the buffer space according to the preset rule includes: determining, according to the data storage state of the output register, whether to write the input data into the output register; when the input data cannot be written into the output register, writing the input data alternately into the P buffer space and the B buffer space, starting from the P buffer space; and, when the output register is empty, transferring data from the P buffer space and the B buffer space alternately into the output register, starting from the P buffer space.
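As a behavioural sketch only (the patent describes hardware registers, not software), the P/B/output-register write and transfer rules above can be modelled as follows. The class name and method names are assumptions for illustration, as is the bypass that writes directly into the output register only when all three spaces are empty, which is one reading of "determining whether to write the input data into the output register according to its data storage state" that preserves data ordering.

```python
class PingPongBuffer:
    """Behavioural model of the three buffer spaces: ping-pong
    spaces P and B in parallel, both feeding one output register."""

    def __init__(self):
        self.p = None          # P buffer space
        self.b = None          # B buffer space
        self.out = None        # output register
        self._write_p = True   # write alternation starts from P
        self._read_p = True    # transfer alternation starts from P

    def can_accept(self):
        # Input is accepted when the bypass applies or the current
        # ping-pong write target is free.
        if self.out is None and self.p is None and self.b is None:
            return True
        return (self.p if self._write_p else self.b) is None

    def write(self, data):
        if not self.can_accept():
            raise RuntimeError("buffer full: apply backpressure")
        # Bypass (assumed): a fully empty buffer forwards input
        # straight to the output register.
        if self.out is None and self.p is None and self.b is None:
            self.out = data
            return
        # Otherwise alternate P -> B -> P -> ...
        if self._write_p:
            self.p = data
        else:
            self.b = data
        self._write_p = not self._write_p

    def step(self):
        """Per-cycle transfer: when the output register is empty,
        drain P and B alternately, starting from P."""
        if self.out is None:
            if self._read_p and self.p is not None:
                self.out, self.p = self.p, None
                self._read_p = False
            elif not self._read_p and self.b is not None:
                self.out, self.b = self.b, None
                self._read_p = True

    def read(self):
        data, self.out = self.out, None
        return data
```

Writing 1, 2, 3 into an empty buffer lands 1 in the output register, 2 in P, and 3 in B; alternate reads and `step()` calls then drain the data in input order.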
In an alternative embodiment of the application, the method further comprises: through interface redefinition, modifying the write interface and full-flag interface of a first-in first-out (FIFO) queue into an input handshake interface, and modifying its read interface and empty-flag interface into an output handshake interface; taking the internal valid-entry count of the FIFO, whose maximum equals the FIFO storage capacity, as an additional output interface; judging, when input data exists, whether the storage space of the first-in first-out queue is full; receiving the input data when the storage space of the first-in first-out queue is not full; and actively outputting the stored data when the storage space of the first-in first-out queue is not empty.

In an alternative embodiment of the application, the method further comprises: setting a stack buffer space in an arbiter, and recording the requests input simultaneously in each clock cycle as one request group; placing request groups into the stack buffer space in their input order; and, when the stack buffer space is not empty, popping one request group, transmitting it to an arbitration core for arbitration sequencing, and outputting the arbitration result, wherein during arbitration, when a request in the group has been processed, the corresponding request bit is cleared, and only after the bit of the last request in the current group is cleared does the stack buffer space pop the next request group.

In an alternative embodiment of the applicat
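Continuing the illustrative software sketch, the request-group arbitration described above can be modelled as follows. The patent leaves the arbitration core unspecified, so a fixed-priority pick (lowest-numbered request first) stands in for it here, and the model assumes request groups are serviced in arrival order, consistent with their being placed into the stack buffer space in input sequence; all names are hypothetical.

```python
from collections import deque

class GroupArbiter:
    """Model of the group-wise arbiter: requests raised in the same
    clock cycle form one request group (a bit mask); a new group is
    popped only after every bit of the current group is cleared, so
    later arrivals can never overtake an earlier group."""

    def __init__(self):
        self._groups = deque()   # buffered request groups, oldest first
        self._current = 0        # bit mask of the group being serviced

    def capture(self, request_bits):
        # Record all requests of one cycle as a single group.
        if request_bits:
            self._groups.append(request_bits)

    def grant(self):
        # Pop the next group once the current one is fully serviced.
        if self._current == 0 and self._groups:
            self._current = self._groups.popleft()
        if self._current == 0:
            return None   # no pending requests
        # Stand-in arbitration core: grant the lowest set bit.
        winner = (self._current & -self._current).bit_length() - 1
        self._current &= ~(1 << winner)   # clear the serviced request bit
        return winner
```

For example, if requesters 0 and 2 arrive in one cycle and requester 1 in the next, the grant order is 0, 2, 1: requester 1 waits until the earlier group's last bit is cleared, which is the ordering guarantee the claim describes.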