
CN-121996614-A - Processor and memory access method


Abstract

The application relates to a processor and a memory access method. The processor comprises a UAV instruction sending unit, a UAV register management unit, a general-purpose register unit, a UAV data cache unit, and a load-store unit. The load-store unit is configured to: receive a SIMD instruction; read resource information of a target UAV register from the UAV register management unit according to address information of the target UAV register; when the opcode indicates that the SIMD instruction is a load instruction, read the channel logical address information of each channel from the general-purpose register unit; when the memory addresses to be accessed by the plurality of channels of the SIMD instruction are determined to be contiguous, calculate the memory access address of a first channel and calculate the memory access addresses of the other channels based on the memory access address of the first channel; and generate a memory access request based on the memory access addresses of the channels. The application improves the throughput of load and store instructions with contiguous addresses, reduces the number of ALUs needed for memory address calculation, and lowers power consumption.

Inventors

  • LI LEI
  • WU FENGXIA
  • ZHANG HUAISHENG

Assignees

  • 格兰菲智能科技股份有限公司

Dates

Publication Date
20260508
Application Date
20260119
Priority Date
20260113

Claims (13)

  1. A processor, comprising a UAV instruction sending unit, a UAV register management unit, a general-purpose register unit, a UAV data cache unit, and a load-store unit, wherein the UAV instruction sending unit, the UAV register management unit, the general-purpose register unit, and the UAV data cache unit are all connected to the load-store unit; the load-store unit is configured to: receive a SIMD instruction sent by the UAV instruction sending unit; parse the SIMD instruction to obtain its opcode, address information of a target general-purpose register, and address information of a target UAV register; read resource information of the target UAV register from the UAV register management unit according to the address information of the target UAV register; when the SIMD instruction is determined to be a load instruction according to the opcode, read channel logical address information of each channel of the SIMD instruction from the general-purpose register unit according to the address information of the target general-purpose register; determine, based on the resource information of the target UAV register and the channel logical addresses, whether the memory addresses to be accessed by a plurality of channels of the SIMD instruction are contiguous; when those memory addresses are determined to be contiguous, calculate the memory access address of a first channel of the plurality of channels, and calculate the memory access addresses of the remaining channels of the plurality of channels based on the memory access address of the first channel; and generate a memory access request based on the memory access address of each channel of the SIMD instruction, so as to complete the access to memory through the UAV data cache unit.
  2. The processor of claim 1, wherein the load-store unit is further configured to, when the SIMD instruction is determined to be a store instruction according to the opcode, read the channel logical address information of each channel of the SIMD instruction and the data to be stored from the general-purpose register unit according to the address information of the target general-purpose register.
  3. The processor of claim 1, wherein the resource description information of the target UAV register includes a base address, a resource type, a resource size, a data layout, a data format, and a resource dimension.
  4. The processor of claim 3, wherein the load-store unit comprises an address continuation detection unit configured to determine whether the memory addresses to be accessed by the plurality of channels of the SIMD instruction are contiguous, based on the resource information of the target UAV register and the channel logical addresses, by: determining whether the resource information of the target UAV register is valid according to the valid bit in the target UAV register; when the resource information of the target UAV register is determined to be valid, determining whether the instruction access addresses of the plurality of channels of the SIMD instruction are contiguous based on the channel logical addresses, wherein the instruction access addresses of the plurality of channels are the channel logical addresses of the plurality of channels; and when the instruction access addresses of the different channels of the SIMD instruction are determined to be contiguous, determining whether the memory addresses to be accessed by the plurality of channels of the SIMD instruction are contiguous based on the resource information of the target UAV register and the data format of the SIMD instruction.
  5. The processor of claim 4, wherein the channel logical addresses comprise a first-dimension logical address, a second-dimension logical address, and a third-dimension logical address, and the address continuation detection unit is configured to determine whether the instruction access addresses of the plurality of channels of the SIMD instruction are contiguous based on the channel logical addresses by: when the data layout in the resource information of the target UAV register is a linear layout, determining that the instruction access addresses of the plurality of channels are contiguous if the second-dimension and third-dimension logical addresses are equal across the plurality of channels and the first-dimension logical addresses of the plurality of channels form a consecutive sequence; and when the data layout in the resource information of the target UAV register is a Zig-zag layout, determining that the instruction access addresses of the plurality of channels are contiguous if the third-dimension logical addresses are equal across the plurality of channels and the arrangement of the first-dimension and second-dimension logical addresses of the plurality of channels matches a predefined Zig-zag pattern.
  6. The processor of claim 4, wherein the address continuation detection unit is configured to determine whether the memory addresses to be accessed by the plurality of channels of the SIMD instruction are contiguous, based on the resource information of the target UAV register and the data format of the SIMD instruction, by: when the resource type is a buffer resource, determining that the memory addresses to be accessed by the plurality of channels are contiguous if the instruction access addresses of the plurality of channels of the SIMD instruction are contiguous; and when the resource type is a texture resource, determining that the memory addresses to be accessed by the plurality of channels are contiguous if the bitwise AND of the data channel mask of the SIMD instruction and the channel mask of the data format of the target UAV register equals the channel mask of the data format of the target UAV register.
  7. The processor of claim 3, wherein the load-store unit comprises a memory address calculation unit configured to calculate the memory access address of the first channel of the plurality of channels by: calculating the memory access address of the first channel according to the base address, the dimension information, and the data layout in the target UAV register and the channel logical address information of the first channel read from the general-purpose register unit.
  8. The processor of claim 7, wherein the memory address calculation unit is configured to calculate the memory access addresses of the remaining channels of the plurality of channels based on the memory access address of the first channel by: determining the address offsets of the remaining channels according to the data format and the channel number of the SIMD instruction; determining the address access direction according to the result of the address continuity detection; and calculating the memory access addresses of the remaining channels based on the memory access address of the first channel, the address offsets of the remaining channels, and the address access direction.
  9. The processor of claim 3, wherein the load-store unit is further configured to, when the memory addresses to be accessed by the plurality of channels of the SIMD instruction are determined not to be contiguous, calculate the memory access address of each channel by: obtaining the channel logical address of each channel; and converting the channel logical address of each channel into a corresponding memory access address based on the base address, the resource dimension, the data layout, and the data format of the target UAV register.
  10. The processor of claim 1, wherein the load-store unit comprises an address out-of-range detection unit configured to: perform out-of-range detection on the memory access addresses corresponding to all channels of the SIMD instruction, and filter out the out-of-range channels.
  11. The processor of claim 10, wherein the load-store unit comprises a memory access request generation unit configured to: generate a memory access request based on the memory access addresses of the channels of the SIMD instruction that are not out of range; and send the memory access request to the UAV data cache unit to complete the access to memory.
  12. The processor of claim 1, wherein the load-store unit further comprises a return data unit configured to: write the data returned from the UAV data cache unit into the general-purpose register unit.
  13. A memory access method applied to a processor, wherein the processor comprises a UAV instruction sending unit, a UAV register management unit, a general-purpose register unit, a UAV data cache unit, and a load-store unit, and the UAV instruction sending unit, the UAV register management unit, the general-purpose register unit, and the UAV data cache unit are all connected to the load-store unit, the method comprising: sending a SIMD instruction via the UAV instruction sending unit; receiving the SIMD instruction via the load-store unit; parsing, via the load-store unit, the SIMD instruction to obtain its opcode, address information of a target general-purpose register, and address information of a target UAV register; reading, via the load-store unit, resource information of the target UAV register from the UAV register management unit according to the address information of the target UAV register; when the SIMD instruction is determined to be a load instruction according to the opcode, reading, via the load-store unit, channel logical address information of each channel of the SIMD instruction from the general-purpose register unit according to the address information of the target general-purpose register; determining, via the load-store unit, whether the memory addresses to be accessed by a plurality of channels of the SIMD instruction are contiguous based on the resource information of the target UAV register and the channel logical addresses; when those memory addresses are determined to be contiguous, calculating, via the load-store unit, the memory access address of a first channel of the plurality of channels, and calculating the memory access addresses of the remaining channels of the plurality of channels based on the memory access address of the first channel; and generating, via the load-store unit, a memory access request based on the memory access address of each channel of the SIMD instruction, so as to complete the access to memory through the UAV data cache unit.
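
As an illustration of the contiguity check described in claims 4 to 6, the following Python sketch models the linear-layout case only. It is a minimal sketch under stated assumptions, not the claimed hardware: the `UavDescriptor` fields, the function names, and the convention that a consecutive run may ascend or descend are hypothetical, and the Zig-zag case is omitted because the predefined pattern is not given in this text.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (first, second, third) dimension logical address of one SIMD channel.
Coord = Tuple[int, int, int]


@dataclass
class UavDescriptor:
    """Hypothetical subset of the resource information held in a UAV register."""
    valid: bool          # valid bit of the UAV register
    resource_type: str   # "buffer" or "texture"
    layout: str          # "linear" (Zig-zag is not modelled here)
    format_mask: int     # channel mask of the data format, e.g. 0b1111


def linear_direction(coords: List[Coord]) -> int:
    """Return +1 or -1 if the channels form an ascending or descending run
    in the first dimension with equal second and third dimensions, else 0.
    Assumes at least one channel."""
    _, y0, z0 = coords[0]
    if any((y, z) != (y0, z0) for _, y, z in coords):
        return 0
    x0 = coords[0][0]
    for step in (1, -1):
        if all(x == x0 + step * i for i, (x, _, _) in enumerate(coords)):
            return step
    return 0


def memory_addresses_contiguous(desc: UavDescriptor,
                                coords: List[Coord],
                                instr_mask: int) -> bool:
    """Staged check: valid bit, then instruction addresses, then resource type."""
    if not desc.valid:
        return False
    if desc.layout != "linear":
        return False  # assumption: only the linear layout is modelled
    if linear_direction(coords) == 0:
        return False
    if desc.resource_type == "buffer":
        return True
    if desc.resource_type == "texture":
        # The instruction must touch every data channel of the format.
        return (instr_mask & desc.format_mask) == desc.format_mask
    return False
```

For a buffer resource, four channels addressing elements 4 to 7 of the same row would pass the check; for a texture resource the same channels pass only if the data channel mask of the instruction covers every channel of the resource format.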

Description

Processor and memory access method

Technical Field

The present application relates to the field of processors, and in particular to a processor and a memory access method.

Background

In a graphics processing unit (GPU) or general-purpose graphics processing unit (GPGPU), the instructions executed by the programmable units are typically of the single instruction, multiple data (SIMD) type, and for load and store instructions the access addresses of different channels are independent of one another. An unordered access view (UAV) lets different threads randomly read and write data resources such as textures and buffers through load and store instructions, without strictly sequential access, so it is widely used in scenarios such as scientific computing, physical simulation, and artificial intelligence inference. For example, an artificial intelligence inference task requires a large number of matrix multiplications: the application loads an input feature matrix or a weight matrix through a UAV and writes the multiplication results to a cache or to memory, and the word output speed of the inference task is often directly affected by the performance of the load and store instructions. Improving the throughput of load and store instructions for unordered access views is therefore critical to hardware performance.

However, in existing graphics processors, the unit that processes load and store instructions uses several arithmetic logic units (ALUs) to independently calculate the memory address accessed by each channel of a SIMD instruction, so the throughput of load and store instructions is in most cases limited by the number of ALUs; that is, the ALUs become the performance bottleneck.
Therefore, increasing the throughput of load and store instructions without adding more ALUs for memory address calculation is important for optimizing hardware architecture design, reducing hardware cost, and achieving high-performance computing.

Disclosure of Invention

The invention provides a processor and a memory access method, which aim to solve the technical problem in the prior art that a graphics processor performs poorly because the throughput of load and store instructions is limited by the number of ALUs.

According to the technical scheme, when the load-store unit determines that the memory addresses to be accessed by the plurality of channels of a SIMD instruction are contiguous, it calculates the memory access address of a first channel of the plurality of channels and calculates the memory access addresses of the remaining channels based on the memory access address of the first channel. For load and store instructions with contiguous addresses, this improves throughput, reduces the number of ALUs needed for memory address calculation, and lowers power consumption.
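
The address derivation in this scheme can be sketched in Python as follows. This is a minimal sketch, assuming a linear row-major layout and a fixed element size in bytes; the address formula, the function names, and the `direction` argument (+1 or -1 for a contiguous run, 0 otherwise) are illustrative assumptions rather than details taken from the application. The function returns the per-channel addresses together with the number of full address calculations performed, which stands in for ALU work.

```python
from typing import List, Tuple

# (first, second, third) dimension logical address of one SIMD channel.
Coord = Tuple[int, int, int]


def full_address(base: int, width: int, height: int,
                 elem_size: int, coord: Coord) -> int:
    """Full logical-to-memory conversion (assumed row-major linear layout)."""
    x, y, z = coord
    return base + ((z * height + y) * width + x) * elem_size


def channel_addresses(base: int, width: int, height: int, elem_size: int,
                      coords: List[Coord],
                      direction: int) -> Tuple[List[int], int]:
    """Return (addresses, number of full address calculations).
    Assumes coords is non-empty."""
    if direction in (1, -1):
        # Contiguous case: one full calculation for the first channel,
        # then a fixed per-channel offset for the remaining channels.
        first = full_address(base, width, height, elem_size, coords[0])
        addrs = [first + direction * i * elem_size
                 for i in range(len(coords))]
        return addrs, 1
    # Non-contiguous case: every channel needs its own full calculation.
    addrs = [full_address(base, width, height, elem_size, c) for c in coords]
    return addrs, len(coords)


# Four channels reading consecutive 4-byte elements of row 2.
coords = [(8 + i, 2, 0) for i in range(4)]
fast, fast_calcs = channel_addresses(0x1000, 64, 16, 4, coords, direction=1)
slow, slow_calcs = channel_addresses(0x1000, 64, 16, 4, coords, direction=0)
```

In this example both paths yield the same four addresses, but the contiguous path performs one full address calculation instead of four; the remaining channels need only an add of a constant offset.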
In order to achieve the above object, a first aspect of the present invention provides a processor comprising a UAV instruction sending unit, a UAV register management unit, a general-purpose register unit, a UAV data cache unit, and a load-store unit, wherein the UAV instruction sending unit, the UAV register management unit, the general-purpose register unit, and the UAV data cache unit are all connected to the load-store unit; the load-store unit is configured to: receive a SIMD instruction sent by the UAV instruction sending unit; parse the SIMD instruction to obtain its opcode, address information of a target general-purpose register, and address information of a target UAV register; read resource information of the target UAV register from the UAV register management unit according to the address information of the target UAV register; when the SIMD instruction is determined to be a load instruction according to the opcode, read channel logical address information of each channel of the SIMD instruction from the general-purpose register unit according to the address information of the target general-purpose register; determine, based on the resource information of the target UAV register and the channel logical addresses, whether the memory addresses to be accessed by a plurality of channels of the SIMD instruction are contiguous; when those memory addresses are determined to be contiguous, calculate the memory access address of a first channel of the plurality of channels, and calculate the memory access addresses of the remaining channels of the plurality of channels based on the m