CN-121979581-A - Sparse identification scheduling method, device, equipment and medium

CN 121979581 A

Abstract

A sparse identification scheduling method, device, equipment and medium. The sparse matrix is divided into a plurality of identification blocks in units of a preset data length; an identification block is judged sparse when all of its elements are zero, and valid when it contains at least one non-zero element. The judgment results are stored contiguously, one bit per block, in a memory space independent of the sparse matrix, forming sparse mask data. The sparse matrix and the sparse mask data are then loaded, in units of main blocks, into a register file comprising a header register and a data register, which is used to perform sparse matrix and weight matrix multiplication. According to the invention, sparse blocks can be identified efficiently at a uniform granularity across various data formats, and the sparse bitmap is jointly encoded with the reconstruction parameters, improving the scheduling efficiency of sparse activations and reducing storage and bandwidth costs.
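The block-level sparse/valid judgment and the bitwise mask described in the abstract can be sketched in Python. This is a minimal illustration, not taken from the patent: the function name is hypothetical, and the 32-byte identification block is one of the preset data lengths the claims mention.

```python
import numpy as np

def build_sparse_mask(data: np.ndarray, block_bytes: int = 32) -> bytes:
    """Split a flat buffer into identification blocks of `block_bytes`
    and record one bit per block: 0 = sparse (all elements zero),
    1 = valid (at least one non-zero element). The bits are packed
    contiguously in a buffer independent of the matrix itself."""
    raw = data.tobytes()
    n_blocks = (len(raw) + block_bytes - 1) // block_bytes
    bits = []
    for i in range(n_blocks):
        chunk = raw[i * block_bytes:(i + 1) * block_bytes]
        # a zero element of any fixed-width type is all-zero bytes,
        # so a byte-level scan decides sparse vs. valid
        bits.append(0 if all(b == 0 for b in chunk) else 1)
    # pack bits MSB-first into the standalone sparse mask data
    mask = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            mask[i // 8] |= 0x80 >> (i % 8)
    return bytes(mask)
```

For a 64-byte int8 matrix with one non-zero value in its second 32-byte block, the mask is a single byte whose second-highest bit is set.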

Inventors

  • LI HAORAN
  • WANG HONGYI
  • ZHOU YONGQUAN
  • WANG YANHONG
  • HONG HAO

Assignees

  • 上海紫荆芯界智能科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-04-08

Claims (10)

  1. A sparse identification scheduling method, characterized by comprising the following steps: acquiring a sparse matrix, wherein the sparse matrix is arranged contiguously in memory in units of combined blocks, each combined block comprises a plurality of main blocks, and each main block comprises a plurality of sub-blocks; dividing the sparse matrix into a plurality of identification blocks in units of a preset data length, judging an identification block sparse when all elements in it are zero, judging it valid when it contains at least one non-zero element, and storing the judgment results contiguously, one bit per block, in a memory space independent of the sparse matrix to form sparse mask data; and loading the sparse matrix and the sparse mask data into a register file in units of main blocks, the register file comprising a header register and a data register and being used to perform sparse matrix and weight matrix multiplication, wherein: when the stored sparse matrix data comprises only a data portion, the data portion is loaded sequentially into the data register in units of sub-blocks, a corresponding header storage area is allocated in the header register for each sub-block, and the sparse mask data of each sub-block is loaded sequentially into its corresponding header storage area; when the stored sparse matrix data comprises a header portion and a data portion, the data portion is loaded sequentially into the data register in units of sub-blocks, a corresponding header storage area is allocated in the header register for each sub-block, and after the header storage area is partitioned, the header portion and the sparse mask data of the sub-block are loaded into it respectively.
  2. The sparse identification scheduling method of claim 1, wherein dividing the sparse matrix into a plurality of identification blocks in units of a preset data length comprises: when the element bit width of the stored sparse matrix data is a power of 2, presetting a first data length of 1×32 bytes and dividing the sparse matrix into a plurality of identification blocks according to the first data length; when the element bit width of the stored sparse matrix data is not a power of 2, presetting a second data length of 1×48 bytes and dividing the sparse matrix into a plurality of identification blocks according to the second data length.
  3. The sparse identification scheduling method of claim 1, wherein, when the stored sparse matrix data comprises only a data portion, loading the data portion sequentially into the data register in units of sub-blocks, allocating a corresponding header storage area in the header register for each sub-block, and loading the sparse mask data of each sub-block sequentially into its corresponding header storage area comprises: loading the sparse mask data of the sub-block preferentially into the high-order bits of the corresponding header storage area, with the remaining bits of the header storage area left empty.
  4. The sparse identification scheduling method of claim 1, wherein, when the stored sparse matrix data comprises a header portion and a data portion, loading the data portion sequentially into the data register in units of sub-blocks, allocating a corresponding header storage area in the header register for each sub-block, and loading the header portion and the sparse mask data of the sub-block respectively after the header storage area is partitioned comprises: loading the sparse mask data of the sub-block preferentially into the high-order bits of the corresponding header storage area, loading the header portion of the sub-block preferentially into the low-order bits of the corresponding header storage area, and leaving the remaining bits of the header storage area empty.
  5. A sparse identification scheduling device, comprising: an acquisition module for acquiring a sparse matrix, wherein the sparse matrix is arranged contiguously in memory in units of combined blocks, each combined block comprises a plurality of main blocks, and each main block comprises a plurality of sub-blocks; a judgment module for dividing the sparse matrix into a plurality of identification blocks in units of a preset data length, judging an identification block sparse when all elements in it are zero, judging it valid when it contains at least one non-zero element, and storing the judgment results contiguously, one bit per block, in a memory space independent of the sparse matrix to form sparse mask data; and a loading module for loading the sparse matrix and the sparse mask data into a register file in units of main blocks, the register file comprising a header register and a data register and being used to perform sparse matrix and weight matrix multiplication, wherein: when the stored sparse matrix data comprises only a data portion, the data portion is loaded sequentially into the data register in units of sub-blocks, a corresponding header storage area is allocated in the header register for each sub-block, and the sparse mask data of each sub-block is loaded sequentially into its corresponding header storage area; when the stored sparse matrix data comprises a header portion and a data portion, the data portion is loaded sequentially into the data register in units of sub-blocks, a corresponding header storage area is allocated in the header register for each sub-block, and after the header storage area is partitioned, the header portion and the sparse mask data of the sub-block are loaded into it respectively.
  6. The sparse identification scheduling device of claim 5, wherein the judgment module dividing the sparse matrix into a plurality of identification blocks in units of a preset data length comprises: when the element bit width of the stored sparse matrix data is a power of 2, presetting a first data length of 1×32 bytes and dividing the sparse matrix into a plurality of identification blocks according to the first data length; when the element bit width of the stored sparse matrix data is not a power of 2, presetting a second data length of 1×48 bytes and dividing the sparse matrix into a plurality of identification blocks according to the second data length.
  7. The sparse identification scheduling device of claim 5, wherein the loading module, when the stored sparse matrix data comprises only a data portion, loading the data portion sequentially into the data register in units of sub-blocks, allocating a corresponding header storage area in the header register for each sub-block, and loading the sparse mask data of each sub-block sequentially into its corresponding header storage area comprises: loading the sparse mask data of the sub-block preferentially into the high-order bits of the corresponding header storage area, with the remaining bits of the header storage area left empty.
  8. The sparse identification scheduling device of claim 5, wherein the loading module, when the stored sparse matrix data comprises a header portion and a data portion, loading the data portion sequentially into the data register in units of sub-blocks, allocating a corresponding header storage area in the header register for each sub-block, and loading the header portion and the sparse mask data of the sub-block respectively after the header storage area is partitioned comprises: loading the sparse mask data of the sub-block preferentially into the high-order bits of the corresponding header storage area, loading the header portion of the sub-block preferentially into the low-order bits of the corresponding header storage area, and leaving the remaining bits of the header storage area empty.
  9. An electronic device comprising a memory and a processor, the memory having a computer program stored therein, wherein the processor implements the steps of the sparse identification scheduling method of any one of claims 1 to 4 when executing the computer program.
  10. A computer-readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the steps of the sparse identification scheduling method of any one of claims 1 to 4.
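Claims 3 and 4 describe a per-sub-block header storage area with the sparse mask in the high-order bits, an optional header portion in the low-order bits, and the middle left empty. The layout can be modeled with a hypothetical Python sketch; the 16-bit area width and all names are illustrative assumptions, not values from the patent.

```python
AREA_BITS = 16  # assumed width of one header storage area (illustrative)

def pack_header_area(mask_bits: int, mask_width: int,
                     header_bits: int = 0, header_width: int = 0) -> int:
    """Return one header storage area as an integer: sparse mask data
    in the high-order bits, header portion (if any) in the low-order
    bits, remaining middle bits left empty (zero)."""
    assert mask_width + header_width <= AREA_BITS
    area = mask_bits << (AREA_BITS - mask_width)       # mask -> high bits
    if header_width:
        area |= header_bits & ((1 << header_width) - 1)  # header -> low bits
    return area
```

With a 4-bit mask only (claim 3's data-only case) the mask lands in bits 15..12; adding a 3-bit header portion (claim 4's case) fills bits 2..0 while the middle stays empty.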

Description

Sparse identification scheduling method, device, equipment and medium

Technical Field

The invention relates to the field of computer architecture, and in particular to a sparse identification scheduling method, device, equipment and medium.

Background

As deep neural networks continue to grow in scale, the computation and storage costs of models rise markedly, and sparsification acceleration has become an important direction for improving operational efficiency. Current mainstream research and applications focus on weight sparsity, especially structured sparse pruning, and hardware acceleration support, such as the 2:4 sparsity mechanism of NVIDIA GPUs, has been realized on several hardware platforms. NVIDIA GPUs introduced native support for 2:4 structured weight sparsity in the Ampere architecture. This mechanism requires that only two non-zero values be retained in every four consecutive weight elements (i.e., 2 valid weights, 2 zeroed out), forming structured sparse blocks of a fixed pattern. The required sparsity pattern of the weight matrix is produced by forced constraints in the pruning algorithm, so that zero-value operations can be skipped efficiently in hardware. Although 2:4 structured sparsity achieves a degree of inference acceleration with hardware support, the approach has several limitations. First, the sparsity is fixed at 50%, that is, 2 of every 4 weight elements are non-zero, and the mechanism lacks the ability to adapt to the model's own sparsity distribution. Second, the technique applies only to static weight tensors and does not exploit the activation sparsity that occurs widely during inference. Activation sparsity is typically generated dynamically by the input, its locations are not fixed, and it is difficult to identify and accelerate with the current 2:4 mechanism.
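The 2:4 constraint described above is mechanical to verify: every group of four consecutive weights may hold at most two non-zeros. A minimal NumPy sketch (an illustration of the constraint, not NVIDIA's implementation; the function name is hypothetical):

```python
import numpy as np

def is_2_to_4_sparse(weights: np.ndarray) -> bool:
    """Check 2:4 structured sparsity: at most two non-zero values
    in every group of four consecutive weight elements."""
    w = weights.reshape(-1, 4)  # length must be a multiple of 4
    return bool((np.count_nonzero(w, axis=1) <= 2).all())
```

A pruning algorithm enforcing this pattern would zero out the two smallest-magnitude values in each group of four, which is what lets the hardware skip them.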
In contrast, activation sparsity, despite its widespread presence in the outputs of nonlinear units such as ReLU, is limited by its dynamic and irregular nature and currently lacks efficient hardware support and generic acceleration schemes. The potential computation savings in activation sparsity have not been fully exploited, making it a weak link in existing sparse computing systems. ProSparse proposes converting the non-ReLU activation functions (such as Swish or GELU) widely used in current language models into ReLU functions with high activation sparsity, further enhanced by a designed loss function. The method generates element-level sparse activation matrices, preserves precision, and achieves better inference acceleration. Although ProSparse achieves significant results in improving activation sparsity, its overall strategy still has certain limitations. First, ProSparse is based mainly on an unstructured, single-element-level sparsity mode: the sparse distribution is difficult to predict or compress, the utilization of hardware matrix operation units is low, and the sparsity potential is especially hard to exploit on SIMD (single instruction, multiple data) architectures such as GPUs. Second, ProSparse does not take full advantage of spatial locality or structured sparsity features, so the real space for storage compression and memory-access optimization remains limited. In particular, in small-batch or low-parallelism scenarios, a higher-level block-sparse scheduling and encoding mechanism is lacking, and redundant data accesses to zero elements persist.
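The gap between element-level and block-level sparsity that limits this kind of unstructured approach can be seen with a quick numeric experiment (illustrative only; the 32-element block size is an arbitrary assumption):

```python
import numpy as np

# Simulate a ReLU activation: roughly half the elements are exactly zero
rng = np.random.default_rng(0)
act = np.maximum(rng.standard_normal(1024), 0.0)

elem_sparsity = np.mean(act == 0)          # fraction of zero elements
blocks = act.reshape(-1, 32)
block_sparsity = np.mean(~blocks.any(axis=1))  # fraction of all-zero blocks
```

With independently placed zeros, element sparsity sits near 50% while whole 32-element blocks are almost never all zero, which is why unstructured activation sparsity is hard to turn into skipped block computations or storage savings.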
From the above, it can be seen that current sparse computation faces the technical problems of low efficiency in scheduling activation sparsity and high storage and bandwidth costs; how to improve sparse scheduling efficiency and reduce these costs is a problem that those skilled in the art need to solve.

Disclosure of Invention

To overcome the defects of low efficiency and high cost in existing sparse scheduling, the invention provides a sparse identification scheduling method, device, equipment and medium. To achieve the above object, according to a first aspect of the present invention, an embodiment of the present invention provides a sparse identification scheduling method, comprising the steps of: acquiring a sparse matrix, wherein the sparse matrix is arranged contiguously in memory in units of combined blocks, each combined block comprises a plurality of main blocks, and each main block comprises a plurality of sub-blocks; dividing the sparse matrix into a plurality of identification blocks in units of a preset data length, judging an identification block sparse when all elements in it are zero, judging it valid when it contains at least one non-zero element, and storing the judgment results contiguously, one bit per block, in a memory space independent of the sparse matrix