US-20260127242-A1 - Sparse Matrix Operation Method, Apparatus, and Computing Device

US20260127242A1US 20260127242 A1US20260127242 A1US 20260127242A1US-20260127242-A1

Abstract

A method includes obtaining a first data matrix and a first index matrix that correspond to a first sparse matrix, and a second data matrix and a second index matrix that correspond to a second sparse matrix; determining, based on each row of elements in the first index matrix and each column of elements included in the second index matrix, target elements on which a dot product operation is to be performed in each row of elements in the first data matrix and each column of elements in the second data matrix; and performing a dot product operation on the target elements to obtain a first result matrix of performing a sparse multiplication operation on the first sparse matrix and the second sparse matrix.

Inventors

Yanzhe Zhang
Li Cao
Fei Yi
Gongyi Wang
Tengyi LIN
Xiao Lu

Assignees

HUAWEI TECHNOLOGIES CO., LTD.

Dates

Publication Date: 20260507
Application Date: 20251219
Priority Date: 20230628

Claims (20)

1 . A method implemented by a processor, wherein the processor comprises a sparse matrix processing core and a sparse vector operation core, and wherein the method comprises: inputting, by the sparse matrix processing core to the sparse vector operation core, a first data matrix and a first index matrix corresponding to a first sparse matrix, wherein each non-zero element in the first index matrix indicates a column number corresponding to a non-zero element at a same position in the first data matrix; inputting, by the sparse matrix processing core to the sparse vector operation core, a second data matrix and a second index matrix corresponding to a second sparse matrix, wherein each non-zero element in the second index matrix indicates a row number corresponding to a non-zero element at a same position in the second data matrix; determining, by the sparse vector operation core and based on each row of non-zero elements in the first index matrix and each column of non-zero elements in the second index matrix, target elements for a dot product operation on each row of non-zero elements in the first data matrix and each column of non-zero elements in the second data matrix; and performing, by the sparse vector operation core, the dot product operation on the target elements to obtain a first result matrix representing a sparse multiplication of the first sparse matrix and the second sparse matrix.
2 . The method of claim 1 , wherein determining the target elements comprises determining, by the sparse vector operation core, a first position in an m th row of the first index matrix and a second position in an n th column of the second index matrix, wherein the first position and the second position correspond to a same non-zero element in the m th row and the n th column, and wherein performing the dot product operation comprises performing the dot product operation on a first target element at a first position in an m th row of the first data matrix and a second target element at a second position in an n th column of the second data matrix to obtain an element in an m th row and an n th column in the first result matrix.
3 . The method of claim 1 , further comprising: obtaining, by the sparse matrix processing core, a third data matrix and a third index matrix corresponding to a third sparse matrix, wherein the third data matrix comprises a plurality of first data matrix tiles, and wherein the third index matrix comprises a first index matrix tile corresponding to each first data matrix tile; obtaining, by the sparse matrix processing core, a fourth data matrix and a fourth index matrix corresponding to a fourth sparse matrix, wherein the fourth data matrix comprises a plurality of second data matrix tiles, and wherein the fourth index matrix comprises a plurality of second index matrix tiles corresponding to each second data matrix tile; and determining, by the sparse matrix processing core, a Cartesian product of a first data matrix tile that is a non-zero matrix tile in an i th row of the third data matrix and a second data matrix tile that is a non-zero matrix tile in a j th column of the fourth data matrix.
4 . The method of claim 3 , wherein before determining the target elements, the method further comprises determining, by the sparse matrix processing core, that a first data matrix tile and a second data matrix tile that are in a group of data matrix tiles in the Cartesian product are respectively the first data matrix and the second data matrix, wherein a first index matrix tile corresponding to the first data matrix tile is the first index matrix, and wherein a second index matrix tile corresponding to the second data matrix tile is the second index matrix.
5 . The method of claim 4 , wherein the third data matrix comprises L×M first data matrix blocks, wherein each of the L×M first data matrix blocks comprises at least one first data matrix tile, wherein the third index matrix comprises L×M first index matrix blocks, wherein each of the L×M first index matrix blocks comprises at least one first index matrix tile, wherein the fourth data matrix comprises M×N second data matrix blocks, wherein each of the M×N second data matrix blocks comprises at least one second data matrix tile, wherein the fourth index matrix comprises M×N second index matrix blocks, and wherein each of the M×N second index matrix blocks comprises at least one second index matrix tile.
6 . The method of claim 5 , further comprising: determining, by the sparse matrix processing core and based on a first position of a zero matrix block of the L×M first data matrix blocks, a second position of a non-zero matrix block of the L×M first data matrix blocks, a third position of a zero matrix block of the M×N second data matrix blocks, and a fourth position of a non-zero matrix block of the M×N second data matrix blocks, a quantity of non-zero matrix blocks in L×N matrix blocks comprised in a second result matrix; and applying, by the sparse matrix processing core and based on the quantity of non-zero matrix blocks, for a storage space in a memory for the second result matrix.
7 . The method of claim 4 , wherein determining the Cartesian product comprises determining, by the sparse matrix processing core and when an n th first data matrix tile group in the i th row and an n th second data matrix tile group in the j th column satisfy an operation condition, a Cartesian product of a first data matrix tile of each non-zero matrix tile and a second data matrix tile of each non-zero matrix tile that are respectively comprised in the n th first data matrix tile group and in the n th second data matrix tile group, wherein the n th first data matrix tile group comprises a first data matrix tile that is in a same first data matrix block and that is in the i th row, wherein the n th second data matrix tile group comprises a second data matrix tile that is in a same second data matrix block and that is in the j th column, and wherein the operation condition comprises that both the n th first data matrix tile group and the n th second data matrix tile group comprise non-zero matrix tiles.
8 . A processor comprising a sparse matrix processing core and a sparse vector operation core, wherein the sparse matrix processing core is configured to: input, to the sparse vector operation core, a first data matrix and a first index matrix corresponding to a first sparse matrix, wherein each non-zero element in the first index matrix indicates a column number corresponding to a non-zero element at a same position in the first data matrix; and input, to the sparse vector operation core, a second data matrix and a second index matrix corresponding to a second sparse matrix, wherein each non-zero element in the second index matrix indicates a row number corresponding to a non-zero element at a same position in the second data matrix, and wherein the sparse vector operation core coupled to the sparse matrix processing core and configured to: determine, based on each row of non-zero elements in the first index matrix and each column of non-zero elements in the second index matrix, target elements for a dot product operation on each row of non-zero elements in the first data matrix and each column of non-zero elements in the second data matrix; and perform the dot product operation on the target elements to obtain a first result matrix representing a sparse multiplication of the first sparse matrix and the second sparse matrix.
9 . The processor of claim 8 , wherein the sparse vector operation core is further configured to: determine a first position in an m th row of the first index matrix and a second position in an n th column of the second index matrix, wherein the first position and the second position correspond to a same non-zero element comprised in the m th row and the n th column; and perform the dot product operation on a first target element at a first position in an m th row of the first data matrix and a second target element at a second position in an n th column of the second data matrix to obtain an element in an m th row and an n th column in the first result matrix.
10 . The processor of claim 8 , wherein the sparse matrix processing core is further configured to: obtain a third data matrix and a third index matrix corresponding to a third sparse matrix, wherein the third data matrix comprises a plurality of first data matrix tiles, and wherein the third index matrix comprises a first index matrix tile corresponding to each first data matrix tile; obtain a fourth data matrix and a fourth index matrix corresponding to a fourth sparse matrix, wherein the fourth data matrix comprises a plurality of second data matrix tiles, and wherein the fourth index matrix comprises a plurality of second index matrix tiles corresponding to each second data matrix tile; and determine a Cartesian product of a first data matrix tile that is a non-zero matrix tile in an i th row of the third data matrix and a second data matrix tile that is a non-zero matrix tile in a j th column of the fourth data matrix.
11 . The processor of claim 10 , wherein before determining the target elements, the sparse matrix processing core is further configured to determine that a first data matrix tile and a second data matrix tile that are in a group of data matrix tiles in the Cartesian product are respectively the first data matrix and the second data matrix, wherein a first index matrix tile corresponding to the first data matrix tile is the first index matrix, and wherein a second index matrix tile corresponding to the second data matrix tile is the second index matrix.
12 . The processor of claim 11 , wherein the third data matrix comprises L×M first data matrix blocks, wherein each of the L×M first data matrix blocks comprises at least one first data matrix tile, wherein the third index matrix comprises L×M first index matrix blocks, wherein each of the L×M first index matrix blocks comprises at least one first index matrix tile, wherein the fourth data matrix comprises M×N second data matrix blocks, wherein each of the M×N second data matrix blocks comprises at least one second data matrix tile, wherein the fourth index matrix comprises M×N second index matrix blocks, and wherein each of the M×N second index matrix blocks comprises at least one second index matrix tile.
13 . The processor of claim 12 , wherein the sparse matrix processing core is further configured to: determine, based on a first position of a zero matrix block of the L×M first data matrix blocks, a second position of a non-zero matrix block of the L×M first data matrix blocks, a third position of a zero matrix block of the M×N second data matrix blocks, and a fourth position of a non-zero matrix block of the MXN second data matrix blocks, a quantity of non-zero matrix blocks in L×N matrix blocks comprised in a second result matrix; and apply, based on the quantity of non-zero matrix blocks, for a storage space in a memory for the second result matrix.
14 . The processor of claim 11 , wherein the sparse matrix processing core is further configured to determine the Cartesian product by determining, when an n th first data matrix tile group in the i th row and an n th second data matrix tile group in the j th column satisfy an operation condition, a Cartesian product of a first data matrix tile of each non-zero matrix tile and a second data matrix tile of each non-zero matrix tile that are comprised in the n th first data matrix tile group and the n th second data matrix tile group, wherein the n th first data matrix tile group comprises a first data matrix tile that is in a same first data matrix block and that is in the i th row, wherein the n th second data matrix tile group comprises a second data matrix tile that is in a same second data matrix block and that is in the j th column, and wherein the operation condition comprises that both the n th first data matrix tile group and the n th second data matrix tile group comprise non-zero matrix tiles.
15 . A computing device, comprising: a memory configured to store instructions; and one or more processors comprising a sparse matrix processing core and a sparse vector operation core and coupled to the memory, wherein when executed by the one or more processors, the instructions cause the computing device to: input, by the sparse matrix processing core to the sparse vector operation core, a first data matrix and a first index matrix corresponding to a first sparse matrix, wherein each non-zero element in the first index matrix indicates a column number corresponding to a non-zero element at a same position in the first data matrix; input, by the sparse matrix processing core to the sparse vector operation core, a second data matrix and a second index matrix corresponding to a second sparse matrix, wherein each non-zero element in the second index matrix indicates a row number corresponding to a non-zero element at a same position in the second data matrix; determine, by the sparse vector operation core and based on each row of non-zero elements in the first index matrix and each column of non-zero elements in the second index matrix, target elements for a dot product operation on each row of non-zero elements in the first data matrix and each column of non-zero elements in the second data matrix; and perform, by the sparse vector operation core, the dot product operation on the target elements to obtain a first result matrix representing a sparse multiplication of the first sparse matrix and the second sparse matrix.
16 . The computing device of claim 15 , wherein when executed by the one or more processors, the instructions further cause computing device to: determine, by the sparse vector operation core, a first position in an m th row of the first index matrix and a second position in an n th column of the second index matrix, wherein the first position and the second position correspond to a same non-zero element in the m th row and the n th column; and perform, by the sparse vector operation core, the dot product operation on a first target element at a first position in an m th row of the first data matrix and a second target element at a second position in an n th column of the second data matrix to obtain an element in an m th row and an n th column in the first result matrix.
17 . The computing device of claim 15 , wherein when executed by the one or more processors, the instructions further cause the computing device to: obtain, by the sparse matrix processing core, a third data matrix and a third index matrix corresponding to a third sparse matrix, wherein the third data matrix comprises a plurality of first data matrix tiles, wherein the third index matrix comprises a first index matrix tile corresponding to each first data matrix tile; obtain, by the sparse matrix processing core, a fourth data matrix and a fourth index matrix that correspond to a fourth sparse matrix, wherein the fourth data matrix comprises a plurality of second data matrix tiles, and wherein the fourth index matrix comprises a plurality of second index matrix tiles corresponding to each second data matrix tile; determine, by the sparse matrix processing core, a Cartesian product of a first data matrix tile that is a non-zero matrix tile in an i th row of the third data matrix and a second data matrix tile that is a non-zero matrix tile in a j th column of the fourth data matrix; and determine, by the sparse matrix processing core and before determining the target elements, that a first data matrix tile and a second data matrix tile that are comprised in a group of data matrix tiles in the Cartesian product are respectively the first data matrix and the second data matrix, a first index matrix tile corresponding to the first data matrix tile is the first index matrix, and a second index matrix tile corresponding to the second data matrix tile is the second index matrix.
18 . The computing device of claim 17 , wherein the third data matrix comprises L×M first data matrix blocks, wherein each of the L×M first data matrix blocks comprises at least one first data matrix tile, wherein the third index matrix comprises L×M first index matrix blocks, wherein each of the L×M first index matrix blocks comprises at least one first index matrix tile, wherein the fourth data matrix comprises M×N second data matrix blocks, wherein each of the M×N second data matrix blocks comprises at least one second data matrix tile, wherein the fourth index matrix comprises M×N second index matrix blocks, and wherein each of the M×N second index matrix blocks comprises at least one second index matrix tile.
19 . The computing device of claim 18 , wherein when executed by the one or more processors, the instructions further cause the computing device to: determine, by the sparse matrix processing core and based on a first position of a zero matrix block of the L×M first data matrix blocks, a second position of a non-zero matrix block of the L×M first data matrix blocks, a third position of a zero matrix block of the M×N second data matrix blocks, and a fourth position of a non-zero matrix block of the M×N second data matrix blocks, a quantity of non-zero matrix blocks in L×N matrix blocks comprised in a second result matrix; and apply, based on the quantity of non-zero matrix blocks, for a storage space in the memory for the second result matrix.
20 . The computing device of claim 17 , wherein when executed by the one or more processors the instructions cause the computing device to determine the Cartesian product by determining, obtain, by the sparse matrix processing core and when an n th first data matrix tile group in the i th row and an n th second data matrix tile group in the j th column satisfy an operation condition, a Cartesian product of a first data matrix tile of each non-zero matrix tile and a second data matrix tile of each non-zero matrix tile that are comprised in the n th first data matrix tile group and the n th second data matrix tile group, wherein the n th first data matrix tile group comprises a first data matrix tile that is in a same first data matrix block and that is in the i th row, wherein the n th second data matrix tile group comprises a second data matrix tile that is in a same second data matrix block and that is in the j th column, and wherein the operation condition comprises that both the n th first data matrix tile group and the n th second data matrix tile group comprise non-zero matrix tiles.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS This is a continuation of International Patent Application No. PCT/CN2024/101040 filed on Jun. 24, 2024, which claims priority to Chinese Patent Application No. 202310781382.5 filed on Jun. 28, 2023 and Chinese Patent Application No. 202311240428.9 filed on Sep. 22, 2023 all of which are hereby incorporated by reference. TECHNICAL FIELD This disclosure relates to the field of computing technologies, and in particular, to a sparse matrix operation method, a processor, and a computing device. BACKGROUND Matrix multiplication is widely applied to various scientific computing scenarios. Matrix density varies in different computing scenarios, and matrix multiplication is mainly classified into dense matrix multiplication and sparse matrix multiplication. Usually, dense matrix multiplication is mainly applied to scenarios such as machine learning and neural network model computation, and sparse matrix multiplication is mainly applied to scenarios such as graph computation, graph neural networks, and multigrid solving. Because a sparse matrix includes a large quantity of zero elements, in a process of performing sparse matrix multiplication, the large quantity of zero elements participate in a matrix multiplication operation. Consequently, matrix operation efficiency is low. SUMMARY Embodiments of this disclosure provide a sparse matrix operation method, a processor, and a computing device, to improve execution efficiency of sparse matrix multiplication. Corresponding technical solutions are as follows. According to a first aspect, a sparse matrix operation method is provided. The method is performed by a processor including a sparse matrix processing unit and a sparse vector operation unit. The method includes the following. The sparse matrix processing unit inputs, to the sparse vector operation unit, a first data matrix and a first index matrix that correspond to a first sparse matrix and that are obtained through processing by the sparse matrix processing unit, and a second data matrix and a second index matrix that correspond to a second sparse matrix and that are obtained through processing by the sparse matrix processing unit, where each non-zero element in the first index matrix indicates a column number corresponding to a non-zero element at a same position in the first data matrix, and each non-zero element in the second index matrix indicates a row number corresponding to a non-zero element at a same position in the second data matrix. The sparse vector operation unit determines, based on each row of non-zero elements in the first index matrix and each column of non-zero elements included in the second index matrix, target elements on which a dot product operation needs to be performed in each row of non-zero elements in the first data matrix and each column of non-zero elements in the second data matrix, and performs a dot product operation on the target elements, to obtain a first result matrix of performing a sparse multiplication operation on the first sparse matrix and the second sparse matrix. The first data matrix is obtained by compressing non-zero elements in the first sparse matrix in a row direction, and the second data matrix is obtained by compressing non-zero elements in the second sparse matrix in a column direction. In the solution provided in this disclosure, the sparse matrix processing unit may determine, based on row numbers and column numbers separately indicated by each row of non-zero elements in the first index matrix and each column of non-zero elements in the second index matrix, the target elements on which a dot product operation needs to be performed in the first data matrix and the second data matrix, and perform the dot product operation on the target elements, to obtain the first result matrix of performing the sparse multiplication operation on the first sparse matrix and the second sparse matrix. It can be learned that, in the solution shown in this disclosure, an operation on a large quantity of zero elements in a sparse multiplication operation process can be avoided, and efficiency of performing the sparse multiplication operation by the processor can be improved. In an implementation, that the sparse vector operation unit determines, based on each row of elements included in the first index matrix and each column of elements included in the second index matrix, the target elements on which a dot product operation needs to be performed in each row of elements in the first data matrix and each column of elements in the second data matrix, and performs the dot product operation on the target elements, to obtain the first result matrix of performing the sparse multiplication operation on the first sparse matrix and the second sparse matrix includes the following. The sparse vector operation unit determines a first position in an mth row of the first index matrix and a second position in an nth column of the second index matrix tha