CN-121984979-A - Synchronous request processing device, method and data processing device
Abstract
The embodiments of the invention disclose a synchronization request processing apparatus, a synchronization request processing method, and a data processing apparatus. The synchronization request processing apparatus comprises a merging unit and a control unit. The merging unit comprises at least one cache line, and each cache line comprises a tag area and a plurality of cache sectors: the tag area stores the corresponding thread identifier and the managed request address information, and the cache sectors store the data information of synchronization requests. The control unit is configured to receive an initial synchronization request, perform change counting on the data to be synchronized carried by an initial synchronization request that meets the condition, generate a target synchronization request, and cache the target synchronization request in the corresponding cache line based on the address information of the target synchronization request. Because synchronization requests are stored as accumulated change counts, multiple synchronization requests can be merged without introducing extra data processing delay, which reduces the memory bandwidth consumed by synchronization requests and improves bandwidth utilization.
Inventors
- ZHANG MIAO
- YANG SHUYUAN
- OU PENG
- ZHOU JINYUAN
Assignees
- 平头哥(上海)半导体技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20250715
Claims (12)
- 1. A synchronization request processing apparatus, characterized in that the synchronization request processing apparatus comprises: a merging unit comprising at least one cache line, wherein the cache line comprises a tag area and a plurality of cache sectors, the tag area is used for storing a corresponding thread identifier and managed request address information, and the cache sectors are used for storing data information of synchronization requests; and a control unit configured to receive an initial synchronization request, perform change counting on the data to be synchronized carried by an initial synchronization request meeting the condition, generate a target synchronization request, and cache the target synchronization request to a corresponding cache line based on address information of the target synchronization request.
- 2. The apparatus according to claim 1, wherein the control unit comprises: a calculation module configured to determine the type of the received initial synchronization request and, in response to the initial synchronization request being of a type that requires calculation, perform change counting on the received synchronization request based on a preset calculation mode and generate a corresponding target synchronization request based on the change counting result; a hit judgment module configured to query whether a cache line matching the address information of the target synchronization request exists in the merging unit, so as to determine a hit result; and a control module configured to, in response to the target synchronization request hitting a corresponding cache line, cache the target synchronization request to a cache sector of the hit cache line, and, in response to the target synchronization request missing a corresponding cache line and an idle cache line existing, cache the target synchronization request to a newly allocated cache line, wherein the newly allocated cache line is determined from the idle cache lines, and the thread identifier and the managed request address information stored in the tag area of the newly allocated cache line match the target synchronization request.
- 3. The apparatus of claim 2, wherein the calculation module is further configured to calculate a byte-level change accumulation or a cache-line-level change accumulation corresponding to the data to be synchronized carried by the initial synchronization request, so as to generate the corresponding target synchronization request.
- 4. The apparatus of claim 2, wherein the control module is further configured to, in response to the target synchronization request missing a corresponding cache line and no idle cache line existing, back-pressure the preceding stage so that it waits for the next cycle, and issue a cache line processing request.
- 5. The apparatus of claim 1, wherein the control unit further comprises: an arbitration module configured to determine a cache line to be processed through arbitration; and an access module configured to merge the target synchronization requests in the cache line to be processed, determine a merged request, and perform a main memory access based on the merged request.
- 6. The apparatus of claim 5, wherein the arbitration module is further configured to determine a cache line as the cache line to be processed in response to every cache sector in that cache line being non-empty.
- 7. The apparatus of claim 5, wherein the arbitration module is further configured to select the cache line to be processed from among the cache lines based on a predetermined arbitration scheme, in response to a cache line remaining unprocessed for longer than a predetermined time or in response to a cache line processing request being received.
- 8. A method of processing a synchronization request, the method comprising: receiving an initial synchronization request; performing change counting on the data to be synchronized carried by an initial synchronization request meeting the condition, to generate a target synchronization request; and caching the target synchronization request to a corresponding cache line in a merging unit based on address information of the target synchronization request; wherein the merging unit comprises at least one cache line, the cache line comprises a tag area and a plurality of cache sectors, the tag area is used for storing a corresponding thread identifier and managed request address information, and the cache sectors are used for storing data information of synchronization requests.
- 9. The method of claim 8, wherein the method further comprises: determining a cache line to be processed through arbitration; and merging the target synchronization requests in the cache line to be processed, determining a merged request, and performing a main memory access based on the merged request.
- 10. A synchronization request processing apparatus, characterized in that the apparatus comprises: a request receiving unit configured to receive an initial synchronization request; a format conversion unit configured to perform change counting on the data to be synchronized carried by an initial synchronization request meeting the condition, and to generate a target synchronization request; and a caching unit configured to cache the target synchronization request to a corresponding cache line in a merging unit based on address information of the target synchronization request; wherein the merging unit comprises at least one cache line, the cache line comprises a tag area and a plurality of cache sectors, the tag area is used for storing a corresponding thread identifier and managed request address information, and the cache sectors are used for storing data information of synchronization requests.
- 11. A data processing apparatus, the apparatus comprising: an operation request receiving module configured to receive an operation request and output a corresponding synchronization request and a data processing request; a main control module configured to receive and parse the data processing request and execute the corresponding data operation; and the synchronization request processing apparatus according to any one of claims 1 to 7.
- 12. A computer device, the computer device comprising: a plurality of processing cores; a memory; and a synchronization request processing apparatus according to any one of claims 1 to 7, wherein each synchronization request processing apparatus corresponds to a respective one of the processing cores.
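The control flow recited in claims 2 to 7 (hit, allocation on miss, back-pressure when no idle line exists, and arbitration followed by a merged main memory access) can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed hardware: the sector size, the number of sectors per line, the names `MergingUnit`, `submit`, `arbitrate` and `process`, and the "most sectors filled" arbitration rule are all hypothetical, and the timeout trigger of claim 7 is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SECTOR_BYTES = 8        # assumed sector size (not given in the source)
SECTORS_PER_LINE = 4    # assumed number of sectors per cache line
LINE_BYTES = SECTOR_BYTES * SECTORS_PER_LINE


@dataclass
class CacheLine:
    thread_id: Optional[int] = None   # tag area: thread identifier
    line_addr: Optional[int] = None   # tag area: managed request address
    sectors: List[Optional[int]] = field(
        default_factory=lambda: [None] * SECTORS_PER_LINE)

    def is_idle(self) -> bool:
        return self.line_addr is None

    def is_full(self) -> bool:
        return all(s is not None for s in self.sectors)


class MergingUnit:
    def __init__(self, num_lines: int) -> None:
        self.lines = [CacheLine() for _ in range(num_lines)]
        # Each entry models one merged main memory access.
        self.memory_accesses: List[Tuple[int, int, List[int]]] = []

    def submit(self, thread_id: int, addr: int, delta: int) -> bool:
        """Cache one target request; return False when back-pressured."""
        offset = addr % LINE_BYTES
        line_addr, sector = addr - offset, offset // SECTOR_BYTES
        # Hit judgment: match on thread identifier and managed address.
        line = next((l for l in self.lines
                     if l.thread_id == thread_id and l.line_addr == line_addr),
                    None)
        if line is None:  # miss
            line = next((l for l in self.lines if l.is_idle()), None)
            if line is None:
                # No idle line: issue a processing request and make the
                # caller retry in the next cycle (back-pressure).
                self.process(self.arbitrate())
                return False
            line.thread_id, line.line_addr = thread_id, line_addr
        # Hit or fresh allocation: accumulate the change count.
        line.sectors[sector] = (line.sectors[sector] or 0) + delta
        if line.is_full():  # every sector non-empty: process the line
            self.process(line)
        return True

    def arbitrate(self) -> CacheLine:
        # Placeholder scheme: pick the occupied line with most sectors filled.
        occupied = [l for l in self.lines if not l.is_idle()]
        return max(occupied,
                   key=lambda l: sum(s is not None for s in l.sectors))

    def process(self, line: CacheLine) -> None:
        # Merge all cached requests of the line into one memory access.
        merged = [s or 0 for s in line.sectors]
        self.memory_accesses.append((line.thread_id, line.line_addr, merged))
        line.thread_id = line.line_addr = None
        line.sectors = [None] * SECTORS_PER_LINE
```

In this model a caller that receives `False` from `submit` simply resubmits the same request on the next cycle, by which time the arbitrated line has been written out and freed.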
Description
Synchronous request processing device, method and data processing device
Technical Field
The present invention relates to the field of semiconductor technologies, and in particular to a synchronization request processing apparatus, a synchronization request processing method, and a data processing apparatus.
Background
With the wide application of large predictive models, memory access bandwidth is increasingly becoming a common performance bottleneck for processing cores, and many processor architectures choose to save main-memory or cache-level bandwidth by broadcasting data between lower memory levels. To solve the synchronization problem among multiple broadcasts, a memory barrier (mbar) completion mechanism is introduced: each broadcast of data also writes an mbar to the destination, and the destination queries the mbar to learn that all data writes have completed. Because the number of mbar writes equals the number of write requests, writing each mbar independently greatly reduces the write bandwidth available at the destination, so how to reduce the memory bandwidth consumed by mbar writes is a problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a synchronization request processing apparatus, a synchronization request processing method, and a data processing apparatus that store synchronization requests as accumulated change counts and merge multiple synchronization requests without introducing additional data processing delay, thereby reducing the memory bandwidth consumed by synchronization requests and improving bandwidth utilization. In addition, because the embodiments store synchronization requests as accumulated change counts rather than computing the complete synchronization request data, no complex prior interaction with the shared memory is needed.
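The bandwidth argument can be illustrated with a small counting model. It assumes, purely for illustration, that an mbar behaves like an arrival counter at the destination and that each synchronization request contributes a change of 1; the class `Destination`, the functions `send_independently` and `send_merged`, and the `batch` parameter are hypothetical names, not taken from the source.

```python
class Destination:
    """Destination-side mbar, modeled as a simple arrival counter."""

    def __init__(self, expected: int) -> None:
        self.expected = expected
        self.counter = 0
        self.writes = 0  # number of mbar writes that consumed bandwidth

    def add(self, delta: int) -> None:
        self.counter += delta
        self.writes += 1

    def all_data_arrived(self) -> bool:
        return self.counter >= self.expected


def send_independently(dest: Destination, num_broadcasts: int) -> None:
    # One mbar write per broadcast: write count equals request count.
    for _ in range(num_broadcasts):
        dest.add(1)


def send_merged(dest: Destination, num_broadcasts: int, batch: int) -> None:
    # Accumulate the change count locally and write it once per batch.
    pending = 0
    for _ in range(num_broadcasts):
        pending += 1
        if pending == batch:
            dest.add(pending)
            pending = 0
    if pending:
        dest.add(pending)


independent = Destination(expected=16)
send_independently(independent, 16)

merged = Destination(expected=16)
send_merged(merged, 16, batch=4)

print(independent.writes, merged.writes)  # 16 4
```

Both destinations observe the same final count, so the completion check is unaffected, while the merged path issues a quarter of the mbar writes in this example.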
In a first aspect, an embodiment of the present invention provides a synchronization request processing apparatus, including: a merging unit comprising at least one cache line, wherein the cache line comprises a tag area and a plurality of cache sectors, the tag area is used for storing a corresponding thread identifier and managed request address information, and the cache sectors are used for storing data information of synchronization requests; and a control unit configured to receive an initial synchronization request, perform change counting on the data to be synchronized carried by an initial synchronization request meeting the condition, generate a target synchronization request, and cache the target synchronization request to a corresponding cache line based on address information of the target synchronization request.
In a second aspect, an embodiment of the present invention provides a method for processing a synchronization request, where the method includes: receiving an initial synchronization request; performing change counting on the data to be synchronized carried by an initial synchronization request meeting the condition, to generate a target synchronization request; and caching the target synchronization request to a corresponding cache line in a merging unit based on address information of the target synchronization request. The merging unit comprises at least one cache line, the cache line comprises a tag area and a plurality of cache sectors, the tag area is used for storing a corresponding thread identifier and managed request address information, and the cache sectors are used for storing data information of synchronization requests.
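The cache-line layout described in these aspects, a tag area holding a thread identifier and a managed address plus several data sectors, might be modeled as below. This is a minimal sketch under assumptions: the 64-byte line, the 8-byte sector, and the names `TagArea`, `split_address` and `store` are hypothetical, and the source does not state how an address is mapped to a sector.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SECTOR_BYTES = 8                            # assumed sector size
SECTORS_PER_LINE = 8                        # assumed sectors per line
LINE_BYTES = SECTOR_BYTES * SECTORS_PER_LINE  # 64 bytes under these assumptions


@dataclass(frozen=True)
class TagArea:
    thread_id: int   # thread identifier that owns the line
    line_addr: int   # line-aligned address managed by the line


@dataclass
class CacheLine:
    tag: TagArea
    # Each sector holds the accumulated change count for its address range.
    sectors: List[int] = field(
        default_factory=lambda: [0] * SECTORS_PER_LINE)


def split_address(addr: int) -> Tuple[int, int]:
    """Split a byte address into (line-aligned address, sector index)."""
    offset = addr % LINE_BYTES
    return addr - offset, offset // SECTOR_BYTES


def store(line: CacheLine, thread_id: int, addr: int, delta: int) -> bool:
    """Accumulate delta into the matching sector; False if the tag differs."""
    line_addr, sector = split_address(addr)
    if line.tag != TagArea(thread_id, line_addr):
        return False
    line.sectors[sector] += delta
    return True


line = CacheLine(TagArea(thread_id=3, line_addr=0x1000))
store(line, 3, 0x1010, 1)
store(line, 3, 0x1010, 1)
print(line.sectors[2])  # 2
```

Matching on both the thread identifier and the line address keeps requests from different threads in separate lines even when they target the same address range, which is one plausible reading of why the tag area stores both fields.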
In a third aspect, an embodiment of the present invention provides a synchronization request processing apparatus, including: a request receiving unit configured to receive an initial synchronization request; a format conversion unit configured to perform change counting on the data to be synchronized carried by an initial synchronization request meeting the condition, and to generate a target synchronization request; and a caching unit configured to cache the target synchronization request to a corresponding cache line in a merging unit based on address information of the target synchronization request. The merging unit comprises at least one cache line, the cache line comprises a tag area and a plurality of cache sectors, the tag area is used for storing a corresponding thread identifier and managed request address information, and the cache sectors are used for storing data information of synchronization requests.
In a fourth aspect, an embodiment of the present invention provides a data processing apparatus, the apparatus including: an operation request receiving module configured to receive an operation request and output a corresponding synchronization request and a data processing request; a main control module configured to receive and parse the data processing request and execute the corresponding data operation; and the synchronization request processing apparatus as described above.
In a fifth aspect, an embodiment of the present invention provides a c