CN-121858065-B - Data processing method and device based on multiply-accumulate operation module
Abstract
The invention relates to the technical field of multiply-accumulate operations and discloses a data processing method and device based on a multiply-accumulate operation module. The method obtains mapping data by applying a reversible mapping to weight vectors and activation vectors, processes the mapping data to obtain a packed result, performs a bit slicing operation on the packed result so that the data suit matrix multiplication and array computation, and finally performs residual compensation and sign recovery to obtain a lossless recovery result. Through reversible mapping, packing, compensation, and recovery, the original data are converted into transformed data, and the transformed data are then restored to segmented data under the original semantics, so that the restored segmented data are fully consistent with the result of the original operation performed without any preprocessing or optimization. The method also identifies the available computing resources: preprocessing is performed when computing resources are sufficient, and is prohibited, with the operation performed directly, when they are insufficient. Different models can thus be accommodated while model throughput is improved.
Inventors
- GONG LEI
- WU JUNDONG
- WANG ZHIGUANG
- WANG CHAO
- WANG TENG
- LOU WENQI
- ZHOU XUEHAI
Assignees
- University of Science and Technology of China (中国科学技术大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-18
Claims (10)
- 1. A data processing method based on a multiply-accumulate operation module, the method comprising: determining a bit width upper limit value corresponding to the multiply-accumulate operation module, and obtaining a parallelism feasible region corresponding to the target system according to the bit width upper limit value; grouping and aligning the weight data and the activation data by using the parallelism feasible region to obtain a weight vector and an activation vector; performing reversible mapping on the weight vector and the activation vector to obtain mapping data; when use of the preset preprocessing compensation is allowed, obtaining a first packed word according to the mapping data; when use of the preset preprocessing compensation is prohibited, obtaining a second packed word according to the mapping data; performing a multiply-add operation on the first packed word to obtain a first packed result, obtaining a compensation result according to the first packed result, and performing a multiply-add operation on the second packed word to obtain a second packed result; performing a bit slicing operation on the first packed result and the compensation result, or on the second packed result, to obtain a first sub-result set corresponding to the first packed result and the compensation result, or a second sub-result set corresponding to the second packed result; and performing residual compensation and sign recovery on the first sub-result set, or performing sign recovery on the second sub-result set, to obtain lossless recovery data.
- 2. The data processing method based on a multiply-accumulate operation module according to claim 1, wherein the grouping and aligning the weight data and the activation data by using the parallelism feasible region to obtain a weight vector and an activation vector comprises: determining the parallelism corresponding to the weight vector and the activation vector according to the parallelism feasible region, wherein the parallelism measures the number of weight-activation element pairs processed simultaneously; grouping the weight data and the activation data according to the parallelism to obtain grouped data, and combining the weight elements corresponding to the weight data in the grouped data to obtain the weight vector; wherein the weight vectors and the activation vectors correspond one to one in time sequence and position.
- 3. The data processing method based on a multiply-accumulate operation module according to claim 2, wherein the performing reversible mapping on the weight vector and the activation vector to obtain mapping data comprises: determining data characteristics of the weight vector and the activation vector, and obtaining a first control mark and a data valid mark according to the data characteristics; performing data optimization on the weight vector and the activation vector to obtain a standard optimization vector; obtaining parameter data produced when generating the standard optimization vector, and loading the parameter data into the first control mark to obtain a second control mark, wherein the second control mark is used for inverse calculation to restore the original data; and obtaining the mapping data according to the standard optimization vector and the second control mark.
- 4. The data processing method based on a multiply-accumulate operation module according to claim 3, wherein the obtaining a first packed word according to the mapping data comprises: performing a first bit-segment allocation operation on all elements in the mapping data to obtain the bit segments corresponding to all the elements; and the obtaining a compensation result according to the first packed result comprises: determining, for the compensation term generated by the reversible mapping, the part consistent with the bit-segment layout of the first packed result, to obtain a compensation sub-term; wherein the first bit-segment allocation operation and the second bit-segment allocation operation use the same operation parameters, the first zero-insertion isolation-bit operation and the second zero-insertion isolation-bit operation use the same operation parameters, and the operation parameters control the corresponding operation process.
- 5. The data processing method based on a multiply-accumulate operation module according to claim 4, wherein the performing a bit slicing operation on the first packed result and the compensation result, or on the second packed result, to obtain a first sub-result set corresponding to the first packed result and the compensation result, or a second sub-result set corresponding to the second packed result, comprises: splitting, according to a preset bit-segment boundary, the first packed result and the compensation result into a plurality of first sub-results, or the second packed result into a plurality of second sub-results; and obtaining the first sub-result set from all the first sub-results, or the second sub-result set from all the second sub-results.
- 6. The data processing method based on a multiply-accumulate operation module according to claim 5, wherein the performing residual compensation and sign recovery on the first sub-result set to obtain lossless recovery data comprises: performing residual compensation on the first sub-result set in combination with the second control mark to obtain a compensation set, and performing sign recovery on the compensation set to obtain the lossless recovery data.
- 7. The data processing method based on a multiply-accumulate operation module according to claim 6, further comprising: when an input value reaches a preset minimum data threshold, performing equivalent recovery, or overflow compensation calculated by constant-shift reconstruction, on the numerical deviation introduced in the reversible mapping to obtain first recovery data, wherein the first recovery data comprise the recovery data of the extreme value and the lossless recovery data; and when an input value reaches a preset maximum data threshold, triggering an unsigned extension or a special recovery algorithm to eliminate the numerical deviation introduced in the reversible mapping.
- 8. The data processing method based on a multiply-accumulate operation module according to claim 7, further comprising: determining, based on a preset timing alignment algorithm, the number of delay beats to insert according to the data valid mark and the second control mark; and aligning the data valid mark and the second control mark according to the number of delay beats by using a preset pipeline register algorithm, so that the data valid mark and the second control mark propagate in step.
- 9. A data processing apparatus based on a multiply-accumulate operation module, the apparatus comprising: a mapping module, configured to determine a bit width upper limit value corresponding to the multiply-accumulate operation module, obtain a parallelism feasible region corresponding to a target system according to the bit width upper limit value, obtain weight data and activation data to be processed, group and align the weight data and the activation data by using the parallelism feasible region to obtain a weight vector and an activation vector, and perform reversible mapping on the weight vector and the activation vector to obtain mapping data; a packing module, configured to obtain a first packed word according to the mapping data when use of the preset preprocessing compensation is allowed, and to obtain a second packed word according to the mapping data when use of the preset preprocessing compensation is prohibited; a multiply-add module, configured to perform a multiply-add operation on the first packed word to obtain a first packed result, obtain a compensation result according to the first packed result, and perform a multiply-add operation on the second packed word to obtain a second packed result; an unpacking module, configured to perform a bit slicing operation on the first packed result and the compensation result, or on the second packed result, to obtain a first sub-result set corresponding to the first packed result and the compensation result, or a second sub-result set corresponding to the second packed result; and a recovery module, configured to perform residual compensation and sign recovery on the first sub-result set, or sign recovery on the second sub-result set, to obtain lossless recovery data.
- 10. An apparatus comprising a memory and a processor, the apparatus comprising: a memory storing executable program code; a processor coupled to the memory; the processor invokes the executable program code stored in the memory to perform the multiply-accumulate operation module-based data processing method of any one of claims 1-8.
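The packing, multiply-add, bit slicing, and sign recovery steps recited in the claims can be illustrated with a minimal Python sketch. It shows one conventional way to compute two low-bit-width dot products with a single wide multiplication per activation; it is not taken from the patent and does not model the preprocessing-compensation path or the control marks. The function names, the 4-bit operand width, and the guard-bit rule (field width = weight bits + activation bits + ceil(log2(K)) for K accumulated pairs) are assumptions chosen for illustration.

```python
def field_width(w_bits, a_bits, num_pairs):
    """Bits reserved per packed field: product width plus accumulation guard bits.

    Assumed rule for illustration, not taken from the patent.
    """
    return w_bits + a_bits + (num_pairs - 1).bit_length()


def pack_weights(w_hi, w_lo, field_bits):
    """Place two signed weights into one word, field_bits apart."""
    return (w_hi << field_bits) + w_lo


def slice_and_recover(acc, field_bits):
    """Split an accumulated packed result into its two signed sub-results."""
    mask = (1 << field_bits) - 1
    low_raw = acc & mask  # bit slicing: low field as an unsigned value
    # Sign recovery: reinterpret the low field as a two's-complement number.
    low = low_raw - (1 << field_bits) if low_raw >> (field_bits - 1) else low_raw
    # Subtracting the signed low value undoes the borrow that a negative
    # low field takes from the high field (a form of residual compensation).
    high = (acc - low) >> field_bits
    return high, low


def packed_dot(weights_hi, weights_lo, activations, field_bits):
    """Two dot products sharing one multiply-accumulate per activation."""
    acc = 0
    for w_hi, w_lo, a in zip(weights_hi, weights_lo, activations):
        acc += pack_weights(w_hi, w_lo, field_bits) * a
    return slice_and_recover(acc, field_bits)


# Hypothetical 4-bit signed operands (range -8..7).
weights_hi = [3, -4, 7, -8]
weights_lo = [-1, 5, -8, 2]
activations = [7, -8, 3, -5]
bits = field_width(4, 4, len(activations))
result = packed_dot(weights_hi, weights_lo, activations, bits)
expected = (
    sum(w * a for w, a in zip(weights_hi, activations)),
    sum(w * a for w, a in zip(weights_lo, activations)),
)
print(result, expected)
```

Python integers are unbounded, so the sketch does not check that the packed word fits a particular port; a hardware mapping would also have to respect the bit width upper limit of the module.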
Description
Data processing method and device based on multiply-accumulate operation module
Technical Field
The present invention relates to the field of multiply-accumulate operations, and in particular to a data processing method and apparatus based on a multiply-accumulate operation module.
Background
A multiply-accumulate operation module is a hardware unit designed specifically for multiply-accumulate operations; if its computing capability is fully used, the throughput and energy efficiency of an accelerator improve significantly. The module may be integrated on any hardware facility; on an FPGA, for example, it is integrated as a DSP module. A multiply-accumulate operation is a hardware-optimized combined operation that completes a multiplication and an addition in a single hardware operation.
A hardware facility that integrates a multiply-accumulate operation module can process various kinds of visual, speech, and natural language data. In use, the data to be processed are usually organized as weight data and activation data, each of which may be signed or unsigned fixed-point integers. For example, natural language data are divided into weight data and activation data according to a specific algorithm and then input into the hardware facility for processing; the hardware facility may carry a trained neural network model and process the data efficiently according to that model.
The ports of the multiply-accumulate operation module have a fixed bit width. However, as the bit widths of weight data and activation data continue to decrease, the port bit width no longer matches the bit widths of the weight data and the activation data. When data with mismatched bit widths are input directly to the multiply-accumulate operation module, part of each port carries idle bits, so the utilization of the module is low; in some cases part of the computing resources even use lookup tables or memory to carry the low-bit-width operations, which leaves the resources of the multiply-accumulate operation module idle. When those resources are idle, throughput drops, that is, the amount of data effectively processed per unit time decreases. On the other hand, the hardware facility can process data accurately only when computing resources are sufficient; when resources are limited, or when only basic matrix multiplication needs to run, resources are wasted and the data flow is disrupted, so the computing result is inaccurate.
A new method is therefore needed to improve the resource utilization of the multiply-accumulate operation module and its adaptability to different computing systems.
Disclosure of Invention
The invention provides a data processing method and device based on a multiply-accumulate operation module, which can improve the resource utilization of the multiply-accumulate operation module and its adaptability to different computing systems.
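To make the bit-width mismatch described in the Background concrete, the following Python sketch estimates how much of a multiplier's input width one low-bit-width operand pair uses, and lists the parallelism values that would fit under a given bit width upper limit. The port sizes (a hypothetical 27-bit by 18-bit multiplier with a 48-bit accumulator), the function names, and the fitting rule are all assumptions for illustration; the source does not specify how the parallelism feasible region is computed.

```python
def port_utilization(w_bits, a_bits, port_a_bits=27, port_b_bits=18):
    """Fraction of multiplier input bits that carry data for one operand pair."""
    return (w_bits + a_bits) / (port_a_bits + port_b_bits)


def packed_word_bits(parallelism, w_bits, field_bits):
    """Signed width of a word holding `parallelism` weights, field_bits apart."""
    if parallelism == 1:
        return w_bits
    # One extra bit covers the borrow a negative lower field takes from the top.
    return (parallelism - 1) * field_bits + w_bits + 1


def feasible_parallelism(port_bits, acc_bits, w_bits, field_bits):
    """List every parallelism whose packed word and result fit the module.

    Assumed rule: the packed word must fit the wide input port and the
    packed result (one field per weight) must fit the accumulator.
    """
    feasible = []
    p = 1
    while (packed_word_bits(p, w_bits, field_bits) <= port_bits
           and p * field_bits <= acc_bits):
        feasible.append(p)
        p += 1
    return feasible


print(round(port_utilization(4, 4), 3))
print(feasible_parallelism(port_bits=27, acc_bits=48, w_bits=4, field_bits=10))
```

With these assumed numbers, a single 4-bit pair occupies 8 of the 45 input bits, and parallelism values 1 to 3 fit the 27-bit port when each field is 10 bits wide.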
The first aspect of the invention discloses a data processing method based on a multiply-accumulate operation module, comprising the following steps: determining a bit width upper limit value corresponding to the multiply-accumulate operation module, and obtaining a parallelism feasible region corresponding to the target system according to the bit width upper limit value; grouping and aligning the weight data and the activation data by using the parallelism feasible region to obtain a weight vector and an activation vector; performing reversible mapping on the weight vector and the activation vector to obtain mapping data; when use of the preset preprocessing compensation is allowed, obtaining a first packed word according to the mapping data; when use of the preset preprocessing compensation is prohibited, obtaining a second packed word according to the mapping data; performing a multiply-add operation on the first packed word to obtain a first packed result, obtaining a compensation result according to the first packed result, and performing a multiply-add operation on the second packed word to obtain a second packed result; performing a bit slicing operation on the first packed result and the compensation result, or on the second packed result, to obtain a first sub-result set corresponding to the first packed result and the compensation result, or a second sub-result set corresponding to the s