CN-121996170-A - Data processing method, device, electronic equipment and storage medium


Abstract

The embodiments of the present disclosure provide a data processing method, a data processing apparatus, an electronic device, and a storage medium. The data processing method is applied to an artificial intelligence processor and comprises: receiving a single target instruction, wherein the target instruction comprises a plurality of fields specifying target data, a target base address, and a target cursor address, respectively; decoding the target instruction; calculating a corresponding target storage address based on the target base address and the target cursor address; and writing the target data according to the target storage address. Because the hardware calculates the storage address automatically from a single instruction, the method improves data processing performance.
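The following is a minimal single-thread sketch of what the fused instruction does. It assumes a toy word-addressed memory in which the cursor word holds the number of elements already written. `Memory` and `append_store` are hypothetical names. The sketch leaves out the per-thread offsets and multi-thread merging described in the claims. In hardware these steps would form one instruction executed atomically. A plain Python function gives no such guarantee and only illustrates the data flow.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy word-addressed memory; unwritten addresses read as 0 (assumption)."""
    words: dict = field(default_factory=dict)

    def load(self, addr: int) -> int:
        return self.words.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        self.words[addr] = value


def append_store(mem: Memory, base_addr: int, cursor_addr: int, data: int) -> int:
    """Hypothetical semantics of the fused instruction for one thread."""
    cursor = mem.load(cursor_addr)        # read the current cursor value
    target_addr = base_addr + cursor      # target storage address = base + cursor
    mem.store(target_addr, data)          # write the target data
    mem.store(cursor_addr, cursor + 1)    # advance the cursor for the next writer
    return target_addr


mem = Memory()
BASE, CURSOR = 100, 0                     # buffer at address 100, cursor word at address 0
addrs = [append_store(mem, BASE, CURSOR, v) for v in (7, 8, 9)]
print(addrs, mem.load(CURSOR))            # → [100, 101, 102] 3
```

Software only supplies the base address, the cursor address, and the data. The address arithmetic and cursor update are folded into the instruction itself.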

Inventors

Request for anonymity
Request for anonymity
Request for anonymity
Request for anonymity
Request for anonymity
Request for anonymity

Assignees

Shanghai Biren Technology Co., Ltd. (上海壁仞科技股份有限公司)

Dates

Publication Date: 20260508
Application Date: 20260209

Claims (12)

1. A data processing method applied to an artificial intelligence processor, the data processing method comprising: receiving a single target instruction, wherein the target instruction comprises a plurality of fields specifying target data, a target base address, and a target cursor address, respectively; and decoding the target instruction, calculating a corresponding target storage address based on the target base address and the target cursor address, and writing the target data according to the target storage address.
2. The method of claim 1, wherein the artificial intelligence processor is configured to process a plurality of threads in parallel, and the decoding the target instruction and calculating a corresponding target storage address based on the target base address and the target cursor address comprises: parsing the target instruction to obtain target data, a target base address, and a target cursor address corresponding to each thread; merging access requests corresponding to threads that point to the same target cursor address and have no memory bank access conflict into a first request, wherein the same target cursor address is a first cursor address; and calculating a target storage address corresponding to each thread in the first request based on the target base address corresponding to that thread and the first cursor address.
3. The method of claim 2, wherein the calculating the target storage address corresponding to each thread in the first request based on the target base address corresponding to that thread and the first cursor address comprises: reading and updating a cursor value corresponding to the first cursor address, wherein the read cursor value is a first cursor value; assigning a different offset to each thread in the first request; and calculating the target storage address corresponding to each thread in the first request based on the target base address, the first cursor value, and the offset.
4. The method of claim 3, wherein the assigning a different offset to each thread in the first request comprises: acquiring the number N of threads in the first request, wherein N is a positive integer; and assigning a different offset to each thread in the first request, wherein each offset is an integer from 0 to N-1 inclusive.
5. The method of claim 4, wherein the reading and updating the cursor value corresponding to the first cursor address comprises: reading the cursor value corresponding to the first cursor address as the first cursor value; adding the first cursor value and the number N of threads to obtain a second cursor value; and writing the second cursor value into the first cursor address.
6. The method of claim 3, wherein the calculating the target storage address corresponding to each thread in the first request based on the target base address, the first cursor value, and the offset comprises: adding the target base address, the first cursor value, and the offset corresponding to each thread to obtain the target storage address corresponding to that thread.
7. The method of claim 2, wherein the writing the target data according to the target storage address comprises: in response to the target storage addresses corresponding to the threads being located in different memory banks, merging the writes of the target data corresponding to the threads.
8. The method of any one of claims 1-7, wherein the plurality of fields of the target instruction comprise: a first field declaring an address space to which the target storage address belongs; a second field representing the total number of registers made up of a target cursor address register and the temporary registers corresponding to the target cursor address; a third field representing the target cursor address register; and at least one fourth field representing the temporary registers, wherein the temporary registers comprise a target base address register corresponding to the target base address and a target data register corresponding to the target data.
9. The method of claim 8, wherein the at least one fourth field comprises: a fifth field representing the target base address register; and a sixth field representing the target data register.
10. A data processing apparatus, characterized in that the data processing apparatus comprises: an instruction fetch unit configured to receive a single target instruction, wherein the target instruction comprises a plurality of fields specifying target data, a target base address, and a target cursor address, respectively; an instruction decoding unit configured to decode the target instruction; and a load/store unit configured to calculate a corresponding target storage address based on the target base address and the target cursor address, and to write the target data according to the target storage address.
11. An electronic device, comprising: at least one processor; and at least one memory including one or more computer program modules, wherein the one or more computer program modules are stored in the at least one memory and configured to be executed by the at least one processor, the one or more computer program modules being for implementing the data processing method of any one of claims 1-9.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions which, when executed by at least one processor, perform the data processing method of any one of claims 1-9.
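Claims 8 and 9 name the instruction's fields but not their widths or order. The sketch below packs and unpacks those fields into an integer word, purely to make each field's role concrete. The bit widths, field order, and names (`TargetInstruction`, `encode`, `decode`) are hypothetical, and the opcode and any other fields are omitted. The example sets the second field to 3: one cursor register plus two temporary registers (base and data). This is one reading of claim 8.

```python
from dataclasses import dataclass

# Hypothetical field widths in bits; the claims do not specify an encoding.
WIDTHS = {"addr_space": 2, "reg_count": 3, "cursor_reg": 8, "base_reg": 8, "data_reg": 8}
ORDER = ["addr_space", "reg_count", "cursor_reg", "base_reg", "data_reg"]


@dataclass
class TargetInstruction:
    addr_space: int  # first field: address space of the target storage address
    reg_count: int   # second field: cursor register + temporary registers
    cursor_reg: int  # third field: target cursor address register
    base_reg: int    # fifth field (a fourth-field temporary): target base address register
    data_reg: int    # sixth field (a fourth-field temporary): target data register


def encode(ins: TargetInstruction) -> int:
    """Pack the fields from least- to most-significant bits in ORDER."""
    word, shift = 0, 0
    for name in ORDER:
        value = getattr(ins, name)
        if not 0 <= value < (1 << WIDTHS[name]):
            raise ValueError(f"{name}={value} does not fit in {WIDTHS[name]} bits")
        word |= value << shift
        shift += WIDTHS[name]
    return word


def decode(word: int) -> TargetInstruction:
    """Inverse of encode: extract each field with a shift and a mask."""
    fields, shift = {}, 0
    for name in ORDER:
        fields[name] = (word >> shift) & ((1 << WIDTHS[name]) - 1)
        shift += WIDTHS[name]
    return TargetInstruction(**fields)


ins = TargetInstruction(addr_space=1, reg_count=3, cursor_reg=4, base_reg=5, data_reg=6)
print(decode(encode(ins)) == ins)  # → True
```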

Description

Technical Field

Embodiments of the present disclosure relate to a data processing method, a data processing apparatus, an electronic device, and a storage medium.

Background

In parallel computing and high-performance data processing, a critical issue is how to aggregate the intermediate results generated by multiple threads or processors into a shared memory space efficiently and correctly. For example, in Top-k selection and similar algorithms, the efficiency and scalability of data aggregation directly determine overall performance. To coordinate parallel writes and avoid data contention, a two-step "dynamic offset allocation, then collision-free write" pattern is generally employed. However, on modern massively parallel hardware architectures this pattern exposes inherent performance bottlenecks, which have become an important factor limiting system throughput and energy efficiency.

Disclosure of Invention

At least one embodiment of the present disclosure provides a data processing method applied to an artificial intelligence processor. The data processing method includes: receiving a single target instruction, wherein the target instruction includes a plurality of fields specifying target data, a target base address, and a target cursor address, respectively; decoding the target instruction; calculating a corresponding target storage address based on the target base address and the target cursor address; and writing the target data according to the target storage address.
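For contrast with the single-instruction method, the two-step baseline from the Background can be sketched as a stream-compaction step of the kind used to gather Top-k candidates. Each thread first reserves a slot with an atomic fetch-and-add and then stores into that slot. This sketch rests on several assumptions:

* A Python lock stands in for a hardware atomic.
* `AtomicCounter`, `compact_two_step`, and the threshold filter are hypothetical names.

The counter also tallies atomic operations, one per qualifying element. That per-element serialization is one plausible source of the bottleneck the Background alludes to.

```python
import threading


class AtomicCounter:
    """Lock-based stand-in for a hardware atomic fetch-and-add (assumption)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()
        self.ops = 0  # atomic fetch_add operations issued so far

    def fetch_add(self, delta: int) -> int:
        with self._lock:
            old = self._value
            self._value += delta
            self.ops += 1
            return old

    def load(self) -> int:
        with self._lock:
            return self._value


def compact_two_step(values, threshold, num_threads=4):
    """Baseline 'dynamic offset allocation, then collision-free write'."""
    out = [None] * len(values)
    counter = AtomicCounter()

    def worker(tid: int) -> None:
        for i in range(tid, len(values), num_threads):
            if values[i] > threshold:          # e.g. keep Top-k candidates
                slot = counter.fetch_add(1)    # step 1: reserve a unique slot
                out[slot] = values[i]          # step 2: write to the reserved slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[: counter.load()], counter


data = [5, 12, 3, 20, 8, 15, 1, 30]
result, counter = compact_two_step(data, threshold=10)
print(sorted(result), counter.ops)  # → [12, 15, 20, 30] 4
```

The order of `result` depends on thread scheduling, so the example sorts it before printing. Only the set of values and the number of atomic operations are deterministic.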
In at least one embodiment of the present disclosure, the artificial intelligence processor is configured to process a plurality of threads in parallel. Decoding the target instruction and calculating the corresponding target storage address based on the target base address and the target cursor address include:

* parsing the target instruction to obtain the target data, target base address, and target cursor address corresponding to each thread;
* merging the access requests of threads that point to the same target cursor address and have no memory bank access conflict into a first request, where the shared target cursor address is a first cursor address;
* calculating the target storage address corresponding to each thread in the first request based on that thread's target base address and the first cursor address.

In at least one embodiment of the present disclosure, calculating the target storage address corresponding to each thread in the first request based on that thread's target base address and the first cursor address includes:

* reading and updating the cursor value corresponding to the first cursor address, where the read cursor value is a first cursor value;
* allocating a different offset to each thread in the first request;
* calculating the target storage address corresponding to each thread in the first request based on the target base address, the first cursor value, and the offset.

In at least one embodiment of the present disclosure, allocating a different offset to each thread in the first request includes obtaining the number N of threads in the first request, where N is a positive integer. A different offset is then allocated to each thread in the first request, where each offset is an integer from 0 to N-1 inclusive.
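The multi-thread steps above can be sketched as follows:

* group threads by target cursor address;
* perform one cursor read-and-update per request, advancing the cursor by N;
* assign offsets 0 to N-1;
* compute each address as base + first cursor value + offset.

The bank count, the bank mapping `address % NUM_BANKS`, and the greedy rule that closes a request when a bank conflict appears are all assumptions for illustration. So are the names `form_requests` and `execute`. The disclosure does not say how the hardware detects conflicts. The sketch relies on one property: adding the same cursor value to every thread in a request does not change which threads collide modulo the bank count, so conflicts can be checked before the cursor is read.

```python
from collections import defaultdict

NUM_BANKS = 4  # hypothetical bank count, kept small so a conflict is easy to show


def form_requests(lanes):
    """Group lanes (lane_id, base, cursor_addr, data) that share a cursor
    address into requests whose writes fall in distinct banks."""
    by_cursor = defaultdict(list)
    for lane in lanes:
        by_cursor[lane[2]].append(lane)  # same target cursor address -> same group

    requests = []
    for cursor_addr, members in by_cursor.items():
        current, banks = [], set()
        for lane in members:
            bank = (lane[1] + len(current)) % NUM_BANKS  # tentative offset = len(current)
            if bank in banks:  # conflict: close this request and start a new one
                requests.append((cursor_addr, current))
                current, banks = [], set()
                bank = lane[1] % NUM_BANKS  # offset 0 in the new request
            current.append(lane)
            banks.add(bank)
        requests.append((cursor_addr, current))
    return requests


def execute(lanes, cursors, memory):
    """One cursor read-and-update per request; returns the number of merged writes."""
    merged_writes = 0
    for cursor_addr, members in form_requests(lanes):
        n = len(members)
        first = cursors.get(cursor_addr, 0)  # first cursor value
        cursors[cursor_addr] = first + n     # second cursor value = first + N
        for offset, (_, base, _, data) in enumerate(members):  # offsets 0..N-1
            memory[base + first + offset] = data  # base + cursor value + offset
        merged_writes += 1  # lanes in a request hit distinct banks by construction
    return merged_writes


lanes = [(i, 100, 0, 10 * i) for i in range(6)]  # six lanes, same base, same cursor
cursors, memory = {}, {}
merged = execute(lanes, cursors, memory)
print(merged, cursors[0], [memory[100 + i] for i in range(6)])
# → 2 6 [0, 10, 20, 30, 40, 50]
```

With six lanes and only four banks, the lanes split into two requests of four and two. The cursor is therefore read and updated twice rather than six times, and the data still lands contiguously at addresses 100 to 105.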
In at least one embodiment of the present disclosure, reading and updating the cursor value corresponding to the first cursor address includes three steps. First, the cursor value corresponding to the first cursor address is read as the first cursor value. Next, the first cursor value and the number of threads are added to obtain a second cursor value. Finally, the second cursor value is written into the first cursor address.

In at least one embodiment of the present disclosure, calculating the target storage address corresponding to each thread in the first request based on the target base address, the first cursor value, and the offset includes adding the target base address, the first cursor value, and the offset corresponding to each thread to obtain the target storage address corresponding to that thread.

In at least one embodiment of the present disclosure, writing the target data according to the target storage address includes merging the writes of the target data corresponding to the threads in response to the target storage addresses corresponding to the threads being located in different memory banks.

In at least one embodiment of the present disclosure, the plurality of fields of the target instruction include a first field for declaring an address space to which the target storage address belongs, a second field for indicating a total number