CN-121979807-A - Method, system, equipment and medium for coordinating sequential memory access and calculation
Abstract
The invention provides a method, a system, equipment and a medium for coordinating serialized memory access and computation. Two command decoders of identical structure parse the same external command in parallel and jointly generate address information for memory access and an operation instruction for computation control; the two outputs are functionally complementary and synchronized in timing. Compared with conventional schemes that rely on an OST (outstanding transaction) buffer and its associated control logic, this architecture does not need to buffer additional control information or intermediate states, so the OST buffer and its read/write logic can be omitted. This significantly reduces the chip area of the AI accelerator, eliminates the dynamic read/write power and static leakage power that the OST buffer would otherwise consume, and markedly improves the energy efficiency of the system. In addition, because triggering a memory access no longer depends on buffer capacity, the number of accesses a single task can issue is no longer limited by the OST size, which greatly improves the flexibility and parallelism of task processing.
Inventors
- Request for anonymity
- Request for anonymity
Assignees
- 上海光羽芯辰科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260402
Claims (10)
- 1. A serialized memory access and computation collaboration method, applied to an AI accelerator communicatively coupled to a main memory, the AI accelerator comprising a request decoder, a response decoder, and an arithmetic logic unit, the method comprising: receiving an external command issued by a main controller, parsing the external command through the request decoder to generate an address request signal for accessing the main memory, and parsing the external command through the response decoder to generate an operation control signal for controlling the arithmetic logic unit to perform a specific operation; based on a preset serialized access rule, sequentially reading target data from the main memory according to the address request signal to complete a serialized data read operation; based on cooperative control logic between memory access and computation, once the target data is ready, driving the arithmetic logic unit to perform the corresponding logic operation on the target data according to the operation control signal to generate a calculation result; and writing the calculation result back to the main memory sequentially according to the preset serialized access rule to complete a serialized data write-back operation.
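The four steps of claim 1 can be illustrated with a minimal Python sketch. It assumes a hypothetical command format (`Command` with `src_base`, `dst_base`, `length`, `op`), a hypothetical two-entry operation set, and a main memory modeled as a Python list; none of these names come from the patent. What it tries to show is the point the abstract makes: both decoders derive their outputs from the same command, and because reads and write-backs are strictly ordered, the i-th returned value belongs to the i-th step without any per-request OST table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical command format; the patent does not specify an encoding.
@dataclass(frozen=True)
class Command:
    src_base: int  # first source address in main memory
    dst_base: int  # first destination address in main memory
    length: int    # number of elements to process
    op: str        # operation name, e.g. "add1" or "neg"

# Hypothetical operation set standing in for the ALU.
ALU_OPS: Dict[str, Callable[[int], int]] = {
    "add1": lambda x: x + 1,
    "neg": lambda x: -x,
}

def request_decoder(cmd: Command) -> List[int]:
    """Address side: the ordered list of read addresses for this command."""
    return [cmd.src_base + i for i in range(cmd.length)]

def response_decoder(cmd: Command) -> Callable[[int], int]:
    """Operation side: the ALU function selected by the same command."""
    return ALU_OPS[cmd.op]

def run(cmd: Command, memory: List[int]) -> None:
    read_addrs = request_decoder(cmd)  # both decoders parse the same command
    alu = response_decoder(cmd)
    # Serialized read + compute: values come back in request order, so no
    # table is needed to remember where each returned value belongs.
    results = [alu(memory[addr]) for addr in read_addrs]
    # Serialized write-back in the same order.
    for i, value in enumerate(results):
        memory[cmd.dst_base + i] = value
```

For example, with `mem = [1, 2, 3, 0, 0, 0]`, running `run(Command(0, 3, 3, "add1"), mem)` leaves `mem` as `[1, 2, 3, 2, 3, 4]`.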
- 2. The method of claim 1, wherein parsing the external command through the request decoder to generate the address request signal for accessing the main memory comprises: parsing, by the request decoder, the external command and extracting the base register code, immediate offset, and read/write attribute contained in the external command; sign-extending or zero-extending the immediate offset to the address bit width specified for the main memory, and adding the extended immediate to the value of the register indicated by the base register code to generate a target physical address; and packaging the target physical address and the read/write attribute into an address request signal and sending the address request signal to the main memory to trigger the corresponding data read operation.
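The address computation in claim 2 (extend the immediate, then add the base register value) can be sketched as follows. The 12-bit immediate width, the 32-bit address width, and the dictionary form of the address request are assumptions for illustration; the patent only says the extension follows the address bit width specified for the main memory.

```python
from typing import Dict, List

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits of `value` as an unsigned number."""
    return value & ((1 << bits) - 1)

def make_address_request(regs: List[int], base_reg: int, imm: int,
                         is_write: bool, signed: bool = True,
                         imm_bits: int = 12, addr_bits: int = 32) -> Dict[str, int]:
    """Build a hypothetical address request: target address plus read/write attribute."""
    offset = sign_extend(imm, imm_bits) if signed else zero_extend(imm, imm_bits)
    # Wrap the sum to the (assumed) main-memory address width.
    target = (regs[base_reg] + offset) & ((1 << addr_bits) - 1)
    return {"addr": target, "write": int(is_write), "valid": 1}
```

With `regs = [0, 0x1000]`, the immediate `0xFF0` sign-extends to -16 and yields address `0xFF0`, while zero extension of the same immediate yields `0x1FF0`.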
- 3. The method of claim 1, wherein parsing the external command through the response decoder to generate the operation control signal for controlling the arithmetic logic unit to perform a specific operation comprises: parsing, by the response decoder, the external command and extracting the combination of the operation code and the function field contained in the external command to determine a target operation type; decoding the operation code, according to the target operation type, into a corresponding ALU operation signal, source operand selection signal, and destination write enable signal; and combining, formatting, and packaging the ALU operation signal, the source operand selection signal, and the destination write enable signal into the operation control signal, and sending the operation control signal to the data path to drive the arithmetic logic unit to perform the specified operation.
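Claim 3 describes a decode from the (operation code, function field) pair to three control signals. The sketch below shows that shape as a lookup table; the opcode and function values, the operation names, and the `OpControl` type are hypothetical, since the patent does not publish an instruction encoding.

```python
from typing import Dict, NamedTuple, Tuple

class OpControl(NamedTuple):
    alu_op: str   # ALU operation signal
    src_sel: str  # source operand selection signal
    dst_we: bool  # destination write enable signal

# Hypothetical encodings: (opcode, funct) -> packaged control signals.
DECODE_TABLE: Dict[Tuple[int, int], OpControl] = {
    (0x1, 0x0): OpControl("add", "reg_reg", True),
    (0x1, 0x1): OpControl("sub", "reg_reg", True),
    (0x2, 0x0): OpControl("add", "reg_imm", True),
    (0x3, 0x0): OpControl("cmp_eq", "reg_reg", False),  # compare only, no write-back
}

def response_decode(opcode: int, funct: int) -> OpControl:
    """Map the opcode/funct combination to the operation control signals."""
    try:
        return DECODE_TABLE[(opcode, funct)]
    except KeyError:
        raise ValueError(f"unsupported opcode/funct {opcode:#x}/{funct:#x}") from None
```

A table keeps the decode purely combinational in spirit: each command maps to exactly one control tuple, which is what allows the response decoder to run in lockstep with the request decoder.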
- 4. The method according to claim 1, wherein sequentially reading the target data from the main memory according to the address request signal based on the preset serialized access rule to complete the serialized data read operation comprises: parsing the target physical addresses, read/write attributes, and valid enable bits contained in the address request signal, and, in combination with the preset serialized access rule, performing data dependency analysis and access priority ordering on the multiple target physical addresses involved in the current task to generate an ordered address access sequence with a strictly sequential execution order; sequentially issuing data read requests to the main memory according to the ordered address access sequence, wherein the read request for the next address is issued only when the read request for the previous address has received a valid response from the main memory or the target data buffered in the current data path has been fully consumed by the arithmetic logic unit, so as to avoid congestion in the data path; receiving all target data returned by the main memory, and buffering and aligning it in the execution order of the ordered address access sequence, so that the order in which data is received is exactly the same as the order in which the requests were issued; and outputting a data-read-ready signal once the data at all target physical addresses has been read and verified to be error-free, marking completion of the serialized data read operation.
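The read side of claim 4 can be approximated with a toy cycle model: order the addresses, then issue one read at a time so that the next read starts only after the previous one has been answered. This sketch models only the "valid response received" trigger (not the alternative "buffered data consumed" trigger), uses a fixed hypothetical `latency`, and replaces the patent's data dependency analysis with a stable sort on a caller-supplied priority; all of these are simplifying assumptions.

```python
from typing import Dict, List, Optional, Tuple

def order_requests(requests: List[Tuple[int, int]]) -> List[int]:
    """Order (address, priority) pairs: higher priority first, ties keep arrival order.
    Dependency analysis is not modeled here (simplifying assumption)."""
    return [addr for addr, _prio in sorted(requests, key=lambda r: -r[1])]

def serialized_read(ordered_addrs: List[int], memory: Dict[int, int],
                    latency: int = 3) -> Tuple[List[Optional[int]], List[int], bool]:
    """Issue reads strictly one at a time; return (data, issue_cycles, read_ready)."""
    data: List[Optional[int]] = []
    issue_cycles: List[int] = []
    cycle = 0
    for addr in ordered_addrs:
        issue_cycles.append(cycle)     # read request for `addr` issued at this cycle
        cycle += latency               # next request waits until this response returns
        data.append(memory.get(addr))  # None models an invalid response
    # Data-read-ready only if every response was valid (stand-in for the error check).
    read_ready = all(value is not None for value in data)
    return data, issue_cycles, read_ready
```

Because at most one read is in flight in this toy model, each returned value's destination is implied by its position in the sequence; the trade-off is that throughput here is bounded by one access per `latency` cycles.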
- 5. The method according to claim 1, wherein, based on the cooperative control logic between memory access and computation, driving the arithmetic logic unit, once the target data is ready, to perform the corresponding logic operation on the target data according to the operation control signal to generate the calculation result comprises: triggering the cooperative memory-access-and-computation flow and starting execution in the arithmetic logic unit upon detecting both that loading of the target data is complete and that the operation control signal has been successfully decoded; selecting, according to the source operand selection signal obtained by decoding the operation control signal, the target data for the current operation from the input data path of the arithmetic logic unit as the source operand of the arithmetic logic unit; dynamically configuring the operating mode of the arithmetic logic unit according to the ALU operation mode signal obtained by decoding the operation control signal, and driving the arithmetic logic unit to perform the specified operation on the loaded source operand to generate the calculation result; and, according to the destination write enable signal obtained by decoding the operation control signal, writing the calculation result output by the arithmetic logic unit into a temporary result caching unit configured in the arithmetic logic unit, with hardware logic automatically associating and recording, at write time, a result sequence identifier for the current operation step, so as to guarantee ordering consistency of the data flow in the subsequent serialized write-back.
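A sketch of claim 5's execution step: the ALU fires only when both the data-ready and decode-success conditions hold, picks its second operand according to the source selection signal, and, when write enable is set, stores the result in a temporary buffer tagged with a monotonically increasing sequence identifier. The `TempResultBuffer` class, the operation names, and the `reg_reg`/`reg_imm` selector values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

@dataclass
class TempResultBuffer:
    """Hypothetical temporary result buffer; each entry is (sequence_id, value)."""
    entries: List[Tuple[int, int]] = field(default_factory=list)
    next_seq: int = 0

    def write(self, value: int) -> int:
        seq = self.next_seq  # sequence ID recorded at write time
        self.entries.append((seq, value))
        self.next_seq += 1
        return seq

def execute(ctrl: Tuple[str, str, bool], operands: List[int], imm: int,
            buf: TempResultBuffer, data_ready: bool,
            decode_ok: bool) -> Optional[Tuple[int, Optional[int]]]:
    """Return (result, seq_id), or None if the start condition is not met."""
    if not (data_ready and decode_ok):
        return None  # cooperative trigger not satisfied: ALU stays idle
    alu_op, src_sel, dst_we = ctrl
    a = operands[0]
    b = operands[1] if src_sel == "reg_reg" else imm  # source operand selection
    result = ALU_OPS[alu_op](a, b)
    seq = buf.write(result) if dst_we else None        # destination write enable
    return result, seq
```

Results that are not written (write enable cleared) consume no sequence ID, so the IDs in the buffer stay dense and can drive the write-back order directly.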
- 6. The method according to claim 1, wherein writing the calculation result back to the main memory sequentially according to the preset serialized access rule to complete the serialized data write-back operation comprises: triggering a serialized write-back flow upon detecting that the calculation result stored in the temporary result caching unit of the arithmetic logic unit has passed operation exception verification and that the main memory interface is idle and writable; after verification passes, performing format conversion and alignment on the calculation result stored in the temporary result caching unit according to the data storage format required by the main memory, to generate a processed calculation result conforming to the main memory interface protocol; and writing the processed calculation result back, in order, to the corresponding target physical addresses in the main memory according to a write address sequence generated by the preset serialized access rule, to complete the serialized data write-back operation.
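Claim 6's write-back can be sketched as: wait for the verification and interface-idle conditions, convert each buffered result to the memory's storage format, and write the results in sequence-ID order to the ordered write addresses. The storage format here (saturating to signed 16-bit and packing little-endian with Python's `struct` module) and the byte-addressed `bytearray` memory are assumptions; the patent does not state the format.

```python
import struct
from typing import List, Tuple

INT16_MIN, INT16_MAX = -32768, 32767

def to_mem_format(value: int) -> bytes:
    """Assumed storage format: saturate to int16, then pack little-endian."""
    clamped = max(INT16_MIN, min(INT16_MAX, value))
    return struct.pack("<h", clamped)

def serialized_write_back(entries: List[Tuple[int, int]], write_addrs: List[int],
                          memory: bytearray, verified: bool,
                          iface_idle: bool) -> bool:
    """Write (seq_id, value) entries in sequence-ID order; False if not triggered."""
    if not (verified and iface_idle):
        return False  # write-back flow not triggered yet
    if len(entries) != len(write_addrs):
        raise ValueError("one write address is needed per result")
    # sorted() orders the tuples by sequence ID first, restoring compute order.
    for (_seq, value), addr in zip(sorted(entries), write_addrs):
        memory[addr:addr + 2] = to_mem_format(value)
    return True
```

The explicit length check matters because `zip` would otherwise silently drop results when fewer addresses than results are supplied.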
- 7. The method according to claim 1 or 6, further comprising, after sequentially writing the calculation result back to the main memory according to the preset serialized access rule and completing the serialized data write-back operation: after confirming that all calculation results to be written back have been written, in order, to the corresponding target physical addresses in the main memory according to the write-back addresses, generating a task completion signal, feeding it back to the main controller, and performing an integrity check on the data written back to the main memory; and, if the integrity check passes, releasing the temporary resources occupied by the current task to restore the context state of the AI accelerator, and, according to a preset scheduling policy, preloading the external command or data of the next task into a designated cache unit and entering a standby state, so as to support efficient pipelined processing of consecutive tasks.
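The post-write-back steps of claim 7 might look like the following. The patent does not say which integrity check is used; this sketch assumes a CRC-32 (Python's `zlib.crc32`) computed before write-back and compared against a re-read of the written region. The resource list, the FIFO queue of pending commands, and the returned status dictionary are hypothetical stand-ins for the accelerator's context state and scheduling policy.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List

def complete_task(memory: bytearray, start: int, length: int, expected_crc: int,
                  temp_resources: List[object],
                  pending: Deque[str]) -> Dict[str, object]:
    """Signal completion, check integrity, then release resources and preload."""
    status: Dict[str, object] = {"done": True}        # task completion signal
    written = bytes(memory[start:start + length])     # re-read the written region
    status["check_ok"] = zlib.crc32(written) == expected_crc
    status["standby"] = None
    if status["check_ok"]:
        temp_resources.clear()                        # release this task's resources
        if pending:
            status["standby"] = pending.popleft()     # preload next command (FIFO assumed)
    return status
```

On a failed check this sketch keeps the temporary resources and does not preload, leaving recovery to the caller; the patent leaves the failure path unspecified.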
- 8. A data processing system of a near-memory computing architecture, applied to an AI accelerator communicatively coupled to a main memory, the AI accelerator comprising a request decoder, a response decoder, and an arithmetic logic unit, the system comprising: a parsing module, configured to receive an external command issued by a main controller, parse the external command through the request decoder to generate an address request signal for accessing the main memory, and parse the external command through the response decoder to generate an operation control signal for controlling the arithmetic logic unit to perform a specific operation; an access module, configured to sequentially read target data from the main memory according to the address request signal based on a preset serialized access rule, to complete a serialized data read operation; an operation module, configured to drive, based on cooperative control logic between memory access and computation, the arithmetic logic unit to perform the corresponding logic operation on the target data according to the operation control signal once the target data is ready, to generate a calculation result; and a write-back module, configured to sequentially write the calculation result back to the main memory according to the preset serialized access rule, to complete a serialized data write-back operation.
- 9. An electronic device comprising a memory for storing a computer program and a processor for executing the computer program stored in the memory to cause the processor to perform the steps of the method according to any one of claims 1 to 7.
- 10. A computer readable storage medium, characterized in that it has stored thereon a program which, when run, is adapted to carry out the steps of the method according to any of claims 1 to 7.
Description
Technical Field
The invention relates to the technical field of AI accelerators, and in particular to a method, a system, equipment and a medium for coordinating serialized memory access and computation.
Background
When a current AI accelerator accesses memory, it must record, at the time each request is sent, the information needed to handle the returned data, such as the usage mode, data type, access state, and output address. This information is stored in on-chip SRAM, and the size of that SRAM limits the number of requests that can be outstanding at once, which in turn limits memory access capacity. Because more information must be recorded and more outstanding transactions (OST) must be supported, the SRAM depth grows, increasing both area and power consumption and ultimately reducing the energy efficiency and area efficiency of the AI accelerator. Therefore, an effective method for coordinating serialized memory access and computation is needed to solve the above problems.
Disclosure of Invention
In view of the foregoing problems, the present invention provides a serialized memory access and computation collaboration method, system, equipment, and medium that overcome, or at least partially solve, them.
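To make the Background's capacity argument concrete: the storage a conventional OST table needs grows linearly with the number of outstanding requests, as entries × bits per entry. The field widths below are purely hypothetical (the patent names the fields but not their sizes); the arithmetic only illustrates the scaling that the serialized scheme is meant to avoid.

```python
from typing import Dict

# Hypothetical per-request record, using the fields named in the Background.
FIELD_BITS: Dict[str, int] = {
    "usage_mode": 2,
    "data_type": 3,
    "access_state": 2,
    "output_address": 32,
}

def ost_sram_bits(outstanding: int, field_bits: Dict[str, int] = FIELD_BITS) -> int:
    """SRAM bits needed to track `outstanding` in-flight requests."""
    return outstanding * sum(field_bits.values())

for depth in (64, 256, 1024):
    print(depth, ost_sram_bits(depth), "bits")
```

With these assumed widths each entry is 39 bits, so supporting 256 outstanding requests needs 9,984 bits of SRAM, and quadrupling the depth quadruples that figure.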
To achieve the above and other related objects, the present invention provides a serialized memory access and computation collaboration method, applied to an AI accelerator communicatively coupled to a main memory, the AI accelerator including a request decoder, a response decoder, and an arithmetic logic unit, the method comprising: receiving an external command issued by a main controller, parsing the external command through the request decoder to generate an address request signal for accessing the main memory, and parsing the external command through the response decoder to generate an operation control signal for controlling the arithmetic logic unit to perform a specific operation; based on a preset serialized access rule, sequentially reading target data from the main memory according to the address request signal to complete a serialized data read operation; based on cooperative control logic between memory access and computation, once the target data is ready, driving the arithmetic logic unit to perform the corresponding logic operation on the target data according to the operation control signal to generate a calculation result; and writing the calculation result back to the main memory sequentially according to the preset serialized access rule to complete a serialized data write-back operation. Optionally, parsing the external command through the request decoder to generate the address request signal for accessing the main memory includes: parsing, by the request decoder, the external command and extracting the base register code, immediate offset, and read/write attribute contained in it; sign-extending or zero-extending the immediate offset to the address bit width specified for the main memory, and adding the extended immediate to the value of the register indicated by the base register code to generate a target physical address; and packaging the target physical address and the read/write attribute into an address request signal and sending it to the main memory to trigger the corresponding data read operation.
Optionally, parsing the external command through the response decoder to generate the operation control signal for controlling the arithmetic logic unit to perform a specific operation includes: parsing, by the response decoder, the external command and extracting the combination of the operation code and the function field contained in it to determine a target operation type; decoding the operation code, according to the target operation type, into a corresponding ALU operation signal, source operand selection signal, and destination write enable signal; and combining, formatting, and packaging the ALU operation signal, the source operand selection signal, and the destination write enable signal into the operation control signal, and sending it to the data path to drive the arithmetic logic unit to perform the specified operation. Optionally, sequentially reading the target data from the main memory according to the address request signal based on the preset serialized access rule to complete the serialized data read operation includes: parsing the target physical addresses, read/write attributes, and valid enable bits contained in the address request signal, and, in combination with the preset serialized access rule, performing data dependency analysis and access priority ordering on the multiple target physical addresses involved in the current task to generate an ordered address access sequence with a strictly sequential execution order; sequentially issuing data read requests to the main