CN-122018978-A - Data processing method and device, electronic equipment and storage medium
Abstract
The disclosure provides a data processing method comprising: dividing data of a sparse matrix into a plurality of first data blocks, each containing a first number of elements, where the first number is determined according to the bit width of a register; marking and compressing the zero elements in the first data blocks to obtain a plurality of second data blocks, each comprising an index head and non-zero elements, the index head representing the positions of the zero elements and the non-zero elements within the first data block; loading the plurality of second data blocks into a memory; and, when a computing task is executed, decompressing the plurality of second data blocks to recover the first data blocks and dispatching them into the register. By losslessly compressing the zero elements of the sparse matrix, the method and device reduce wasted storage space and improve storage efficiency.
Inventors
- GAO WENHAI
Assignees
- 苏州亿铸智能科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20241112
Claims (15)
- 1. A data processing method, comprising: dividing data of a sparse matrix into a plurality of first data blocks, each first data block comprising a first number of elements, wherein the first number is determined according to a bit width of a register; marking and compressing zero elements in the first data blocks to obtain a plurality of second data blocks, wherein each second data block comprises an index head and non-zero elements, and the index head represents position information of the zero elements and the non-zero elements in the first data block; loading the plurality of second data blocks into a memory; and, when a computing task is executed, decompressing the plurality of second data blocks to obtain the plurality of first data blocks and scheduling the plurality of first data blocks into the register.
- 2. The data processing method of claim 1, wherein marking and compressing zero elements in the plurality of first data blocks to obtain the plurality of second data blocks comprises: identifying zero elements and non-zero elements in the first data block; generating an index head according to the positions of the zero elements and the non-zero elements in the first data block, wherein each bit of the index head corresponds one-to-one to an element of the first data block and indicates the position state of the corresponding element; judging whether the number of zero elements in the first data block is larger than a preset value; and, when the number of zero elements in the first data block is larger than the preset value, combining the index head and the non-zero elements of the first data block into a second data block.
- 3. The data processing method of claim 2, wherein marking and compressing zero elements in the plurality of first data blocks to obtain the plurality of second data blocks further comprises: when the number of zero elements in the first data block is smaller than or equal to the preset value, performing no compression on the first data block and combining the index head and the first data block into a second data block.
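The block-level compression of claims 2 and 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the list-of-integers representation, and the bit convention (1 = non-zero) are assumptions, and the threshold value is arbitrary.

```python
def compress_block(block, threshold):
    """Return (index_head, payload, compressed_flag).

    index_head: one bit per element; a set bit marks a non-zero element.
    If the block contains more zeros than `threshold`, the payload keeps
    only the non-zero elements (claim 2); otherwise the block is kept
    uncompressed alongside its index head (claim 3).
    """
    index_head = 0
    nonzeros = []
    for i, v in enumerate(block):
        if v != 0:
            index_head |= 1 << i
            nonzeros.append(v)
    zero_count = len(block) - len(nonzeros)
    if zero_count > threshold:
        return index_head, nonzeros, True      # compressed second data block
    return index_head, list(block), False      # stored without compression

# Example: an 8-element first data block with five zeros
head, payload, compressed = compress_block([0, 5, 0, 0, 7, 0, 0, 1], threshold=2)
# head == 0b10010010 (bits 1, 4 and 7 set), payload == [5, 7, 1]
```

The first number of elements per block would in practice be chosen so that the index head fits the register bit width, e.g. 32 elements for a 32-bit register.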
- 4. The data processing method according to claim 1, wherein decompressing the plurality of second data blocks comprises: restoring the non-zero elements and the zero elements in the second data block to their corresponding positions in the sparse matrix according to the index head.
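The decompression of claim 4 is the inverse walk over the index head. The sketch below assumes the same illustrative representation as above (an integer bitmap with 1 = non-zero); the concrete values in the example are hypothetical.

```python
def decompress_block(index_head, payload, size):
    """Restore a first data block from a compressed second data block:
    each set bit of the index head marks a non-zero position that takes
    the next payload element in order; cleared bits become zeros."""
    block = [0] * size
    it = iter(payload)
    for i in range(size):
        if index_head & (1 << i):
            block[i] = next(it)
    return block

# A hypothetical 8-element block whose non-zeros sit at bit positions 1, 4 and 7
assert decompress_block(0b10010010, [5, 7, 1], 8) == [0, 5, 0, 0, 7, 0, 0, 1]
```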
- 5. The data processing method according to claim 1, further comprising: dividing the plurality of second data blocks into a plurality of first data packets, each data packet comprising a second number of second data blocks, wherein the second number is determined based on the bit width of the register; marking and compressing the all-zero-element second data blocks in each first data packet to obtain a plurality of second data packets, wherein each second data packet comprises meta information and the non-all-zero-element second data blocks, and the meta information represents the element type of each second data block in the second data packet; loading the plurality of second data packets into the memory; and, when the computing task is executed, decompressing the plurality of second data packets to obtain the plurality of first data blocks and scheduling the plurality of first data blocks into the register.
- 6. The data processing method of claim 5, wherein marking and compressing the all-zero-element second data blocks in each first data packet to obtain a plurality of second data packets comprises: identifying the element type of each second data block in the first data packet, wherein the element types comprise all-element, all-zero-element, and non-all-zero-element; generating meta information according to the element types of the second data blocks, wherein each pair of bits of the meta information corresponds to one second data block in the first data packet and indicates the element type of the corresponding second data block; compressing the all-element and all-zero-element second data blocks; and combining the meta information and the all-element and non-all-zero-element second data blocks into a second data packet.
- 7. The data processing method of claim 5, wherein decompressing the plurality of second data packets comprises: decompressing the second data packet according to the meta information to recover the original second data blocks; and restoring the non-zero elements and the zero elements in each second data block to their corresponding positions in the sparse matrix according to the index head of the second data block.
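The packet-level compression of claims 5 and 6 can be sketched with a two-bit meta word per block. The numeric type codes below are illustrative assumptions (the patent does not fix an encoding), and only the all-zero case is handled for brevity.

```python
ALL_ZERO, NON_ALL_ZERO = 0b00, 0b01   # hypothetical two-bit type codes

def pack_blocks(blocks):
    """blocks: list of (index_head, payload) pairs from block-level
    compression. All-zero second data blocks (empty payload) are dropped
    from the packet body entirely; the meta word records each block's
    type in two bits, so the decompressor can regenerate them."""
    meta = 0
    body = []
    for i, (head, payload) in enumerate(blocks):
        if not payload:                       # all-zero second data block
            meta |= ALL_ZERO << (2 * i)       # kept only in meta information
        else:
            meta |= NON_ALL_ZERO << (2 * i)
            body.append((head, payload))
    return meta, body

# A packet of two blocks: one all-zero, one with non-zeros 3 and 4
meta, body = pack_blocks([(0, []), (0b11, [3, 4])])
# meta == 0b0100, body keeps only the non-all-zero block
```

The second number of blocks per packet would be sized so the meta word fits the register, e.g. 16 two-bit entries in a 32-bit register.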
- 8. The data processing method according to claim 1, further comprising: determining the size of each second data block according to the index head of the second data block; accumulating the sizes of all second data blocks preceding a second data block as the first offset address of that second data block; and forming an address mapping table by creating a corresponding mapping entry for each second data block, the mapping entry comprising the first offset address.
- 9. The data processing method of claim 8, wherein the non-zero elements in the second data block are closely arranged to form a data body; the index head of the second data block is stored together with the data body, a source address points to the starting address of the index head of the first second data block, and the first offset address points to the starting address of the index head of the corresponding second data block; or the index heads of the second data blocks are stored together contiguously with the first offset addresses, the data bodies are stored separately, the source address points to the starting address of the index head of the first second data block, and the first offset address points to the starting address of the data body of the corresponding second data block.
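The address mapping of claims 8 and 9 follows from the index head alone: the popcount of the head gives the non-zero element count, hence the block size, and a prefix sum of sizes gives each block's offset. The byte widths below are illustrative assumptions.

```python
def second_block_size(index_head, elem_bytes, head_bytes):
    """Size of a second data block: its index head plus one element
    slot per set bit (non-zero element) of the head (claim 8 sketch)."""
    return head_bytes + bin(index_head).count("1") * elem_bytes

def build_offset_table(block_sizes):
    """Each block's first offset address is the accumulated size of all
    blocks before it; the list of entries forms the address mapping table."""
    offsets = []
    total = 0
    for size in block_sizes:
        offsets.append(total)
        total += size
    return offsets

# Hypothetical example: 1-byte elements, 1-byte index heads
sizes = [second_block_size(h, elem_bytes=1, head_bytes=1)
         for h in (0b10010010, 0b00000000, 0b1111)]
# sizes == [4, 1, 5]; blocks start at offsets 0, 4 and 5
assert build_offset_table(sizes) == [0, 4, 5]
```

With such a table, a block can be located and decompressed on demand without scanning all preceding blocks.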
- 10. The data processing method according to claim 5, further comprising: determining the size of each second data packet according to the meta information of the second data packet and the index heads of the second data blocks in the second data packet; accumulating the sizes of all second data packets preceding a second data packet as the second offset address of that second data packet; and forming an address mapping table by creating a corresponding mapping entry for each second data packet, the mapping entry comprising at least the second offset address.
- 11. The data processing method of claim 10, wherein the non-zero elements in the second data block are closely arranged to form a data body; the index head of the second data block is stored together with the data body, a source address points to the starting address of the meta information of the first second data packet, and the second offset address points to the starting address of the index head of the first second data block in each second data packet; or the source address points to the starting address of the meta information of the first second data packet, the second offset address points to the starting address of the index head of the first second data block in each second data packet, and the first offset address points to the starting address of the data body of each second data block in each second data packet.
- 12. A data processing device, comprising a compression unit, a scheduling unit, a memory, a decompression unit, and a register, wherein: the compression unit is configured to divide data of a sparse matrix into a plurality of first data blocks, each comprising a first number of elements, the first number being determined according to the bit width of the register, and to mark and compress zero elements in the first data blocks to obtain a plurality of second data blocks, wherein each second data block comprises an index head and non-zero elements, and the index head represents position information of the zero elements and the non-zero elements in the first data block; the scheduling unit is configured to load the plurality of second data blocks into the memory and to schedule the plurality of second data blocks into the decompression unit when a computing task is executed; and the decompression unit is configured to decompress the plurality of second data blocks to obtain the plurality of first data blocks and to dispatch the plurality of first data blocks into the register.
- 13. The data processing device of claim 12, wherein the compression unit is further configured to divide the plurality of second data blocks into a plurality of first data packets, each data packet including a second number of second data blocks, the second number being determined based on the bit width of the register, and to mark and compress the all-element and all-zero-element second data blocks in each first data packet to obtain a plurality of second data packets, wherein each second data packet comprises meta information and the all-element and non-all-zero-element second data blocks, and the meta information represents the element type of each second data block; the scheduling unit is configured to load the plurality of second data packets into the memory and to schedule the plurality of second data packets into the decompression unit when the computing task is executed; and the decompression unit is configured to decompress the plurality of second data packets to obtain the plurality of first data blocks and to dispatch the plurality of first data blocks into the register.
- 14. An electronic device, comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling communication between the processor and the memory, wherein the program, when executed by the processor, implements the data processing method according to any one of claims 1 to 11.
- 15. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs executable by one or more processors to implement the data processing method according to any one of claims 1 to 11.
Description
Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence (AI) technology, deep learning models have achieved great success in fields such as computer vision and natural language processing. However, as deep learning models grow ever larger, their parameter counts and computation requirements increase accordingly, which restricts their use on resource-limited hardware platforms such as edge and mobile devices. It is therefore necessary to compress deep learning models to reduce their computation and storage requirements and to accelerate model training and inference. In the related art, model parameters may be compressed by pruning, quantization, and similar techniques. Quantization converts model parameters from floating-point numbers to integers; however, the quantized matrix is sparse, i.e. it includes a large number of zero elements, and since zero elements and non-zero elements occupy the same memory resources, memory is wasted.
Disclosure of Invention
In view of the foregoing, an object of the present disclosure is to provide a data processing method and apparatus, an electronic device, and a storage medium that reduce wasted storage space by losslessly compressing the zero elements of a sparse matrix.
According to a first aspect of the disclosure, a data processing method is provided, which includes: dividing data of a sparse matrix into a plurality of first data blocks, each including a first number of elements, the first number being determined according to the bit width of a register; marking and compressing zero elements in the plurality of first data blocks to obtain a plurality of second data blocks, each including an index head and non-zero elements, the index head representing position information of the zero elements and the non-zero elements in the first data block; loading the plurality of second data blocks into a memory; and, when a computing task is executed, decompressing the plurality of second data blocks to obtain the plurality of first data blocks and scheduling the plurality of first data blocks into the register. Optionally, marking and compressing zero elements in the first data blocks to obtain the second data blocks comprises: identifying the zero elements and non-zero elements in the first data block; generating an index head according to the positions of the zero elements and non-zero elements in the first data block, each bit of the index head corresponding one-to-one to an element of the first data block and indicating the position state of the corresponding element; judging whether the number of zero elements in the first data block is larger than a preset value; and combining the index head and the non-zero elements of the first data block into a second data block when the number of zero elements in the first data block is larger than the preset value.
Optionally, marking and compressing zero elements in the first data blocks to obtain the second data blocks further includes: when the number of zero elements in the first data block is smaller than or equal to the preset value, performing no compression on the first data block and combining the index head and the first data block into a second data block. Optionally, decompressing the plurality of second data blocks includes restoring the non-zero elements and zero elements in the second data blocks to their corresponding positions in the sparse matrix according to the index head. Optionally, the data processing method further comprises: dividing the plurality of second data blocks into a plurality of first data packets, each data packet comprising a second number of second data blocks, the second number being determined based on the bit width of the register; marking and compressing the all-zero-element second data blocks in each first data packet to obtain a plurality of second data packets, each comprising meta information and the non-all-zero-element second data blocks, the meta information representing the element type of each second data block in the second data packet; loading the plurality of second data packets into the memory; and, when a computing task is executed, decompressing the plurality of second data packets to obtain a plurality of first data blocks, and scheduling the plurality of first data blocks into the register.
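The packet-level decompression described above (and in claim 7) can be sketched as follows. The two-bit type codes and the list representation are illustrative assumptions; only the all-zero and non-all-zero cases are handled for brevity.

```python
def unpack_packet(meta, body, blocks_per_packet, block_size):
    """Recover the original first data blocks from a second data packet:
    read two meta bits per block; all-zero blocks (hypothetical code 0b00)
    are regenerated as zeros, others are restored from the packet body by
    walking their index heads."""
    body_iter = iter(body)
    blocks = []
    for i in range(blocks_per_packet):
        code = (meta >> (2 * i)) & 0b11
        if code == 0b00:                      # all-zero second data block
            blocks.append([0] * block_size)
        else:                                 # restore via the index head
            head, payload = next(body_iter)
            block = [0] * block_size
            it = iter(payload)
            for j in range(block_size):
                if head & (1 << j):
                    block[j] = next(it)
            blocks.append(block)
    return blocks

# Hypothetical packet: meta 0b0100 marks block 0 all-zero, block 1 non-all-zero
blocks = unpack_packet(0b0100, [(0b11, [3, 4])], blocks_per_packet=2, block_size=4)
# blocks == [[0, 0, 0, 0], [3, 4, 0, 0]]
```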