US-12626136-B2 - Matrix inversion using analog resistive crossbar array hardware

US12626136B2US 12626136 B2US12626136 B2US 12626136B2US-12626136-B2

Abstract

Matrix inversion systems and methods are implemented using an analog resistive processing unit (RPU) array for hardware accelerated computing. A request is received from an application to compute an inverse matrix of a given matrix, and a matrix inversion process is performed in response to the received request. The matrix inversion process includes storing a first estimated inverse matrix of the given matrix in an array RPU cells, performing a first iterative process on the first estimated inverse matrix stored in the array of RPU cells to converge the first estimated inverse matrix to a second estimated inverse matrix of the given matrix, and reading the second estimated inverse matrix from the array of RPU cells upon completion of the first iterative process. An inverse matrix is returned to the application, wherein the returned inverse matrix is based, at least in part, on the second estimated inverse matrix.

Inventors

Tayfun Gokmen
Vanessa Lopez-Marrero
Oguzhan Murat Onen
Chai Wah Wu
Mark S. Squillante
MALTE JOHANNES RASCH
Tomasz J. Nowicki
Wilfried Haensch
Lior Horesh
Vasileios Kalantzis

Assignees

INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date: 20260512
Application Date: 20201228

Claims (20)

1 . A method comprising: receiving, by a digital processing system, a request from an application to compute an inverse matrix of a given matrix; utilizing, by the digital processing system, an analog resistive processing unit (RPU) system to perform a matrix inversion process in response to the received request, the analog RPU system comprising a crossbar array of RPU cells, each RPU cell comprising a respective resistive memory device with a conductance that is programmable, wherein utilizing the analog RPU system to perform the matrix inversion process comprises: storing an estimated inverse matrix of the given matrix in the crossbar array of RPU cells of the analog RPU system by programming the conductances of the resistive memory devices of at least portion of the RPU cells to store matrix values of the estimated matrix in the crossbar array of RPU cells; performing a first iterative process on the estimated inverse matrix stored in the crossbar array of RPU cells to converge the estimated inverse matrix to an approximate inverse matrix of the given matrix, wherein the first iterative process comprises the digital processing system iteratively inputting row vectors to the crossbar array of RPU cells, which represent rows of the given matrix, to perform analog matrix-vector multiplication operations using the analog RPU system to multiply the input row vectors with the estimated inverse matrix stored in the crossbar array of RPU cells, wherein for each iteration, the digital processing system utilizes an output vector resulting from the analog matrix-vector multiplication operation to determine an error vector, and apply the error vector to the crossbar array of RPU cells to change the conductances of at least some of the resistive memory devices of the crossbar array of cells RPU to update at least a portion of the matrix values of the estimated inverse matrix stored in the crossbar array of RPU cells; and reading the approximate inverse matrix from the crossbar array of RPU cells upon completion of the first iterative process; and returning, by the digital processing system, an inverse matrix to the application, wherein the returned inverse matrix is based, at least in part, on the approximate inverse matrix.
2 . The method of claim 1 , wherein the first iterative process comprises a stochastic gradient descent optimization process which comprises utilizing the row vectors of the given matrix as training data to train the estimated inverse matrix stored in the crossbar array of RPU cells and update matrix values of the estimated inverse matrix stored in the crossbar array of RPU cells by utilizing the error vectors that are determined based on matrix values of an identity matrix.
3 . The method of claim 1 , wherein performing the first iterative process comprises performing an iteration of the first iterative process, wherein performing the iteration comprises: performing a vector matrix operation on the crossbar array of RPU cells, wherein performing the vector matrix operation comprises inputting a given row vector, which represents a given row of the given matrix, to the crossbar array of RPU cells, and multiplying the given row vector with the estimated inverse matrix stored in the crossbar array of RPU cells to generate an output vector which is output from the crossbar array of RPU cells; and performing an update operation to update matrix values of the estimated inverse matrix stored in the crossbar array of RPU cells, wherein performing the update operation comprises determining an error vector based on a difference between the output vector and a row vector of an identity matrix, inputting both the row vector of the given matrix and the determined error vector to the crossbar array of RPU cells, and performing an outer product operation of the row vector and the determined error vector on the crossbar array of RPU cells to update at least some matrix values of the estimated inverse matrix that is stored in the crossbar array of RPU cells.
4 . The method of claim 3 , wherein performing the update operation further comprises: scaling values of the row vector according to a learning rate parameter to generate a scaled row vector; and performing the outer product operation of the scaled row vector and the determined error vector on the crossbar array of RPU cells to update at least some matrix values of the estimated inverse matrix that is stored in the crossbar array of RPU cells.
5 . The method of claim 3 , further comprising repeating the iteration of the first iterative process for each row vector of the given matrix to complete an epoch of the first iterative process.
6 . The method of claim 3 , further comprising performing the first iterative process for a prespecified number of epochs, wherein the first iterative process is deemed complete following completion of the prespecified number of epochs.
7 . The method of claim 1 , wherein the given matrix comprises a symmetric positive definite matrix.
8 . The method of claim 1 , wherein storing the estimated inverse matrix of the given matrix in the crossbar array of RPU cells comprises: computing a random matrix; and storing the random matrix in the crossbar array of RPU cells as an initial estimate of the inverse matrix of the given matrix.
9 . The method of claim 1 , further comprising performing a second iterative process on the approximate inverse matrix read out from the crossbar array of RPU cells to thereby generate a final approximate inverse matrix that has an accuracy which is greater than an accuracy of the approximate inverse matrix, wherein the second iterative process comprises an optimization process that is performed in a digital domain, and wherein the final approximate inverse matrix comprises the inverse matrix which is returned to the application.
10 . The method of claim 9 , wherein the second iterative process comprise a Newton optimization process.
11 . A computer program product for performing a matrix inversion process, the computer program product comprising: one or more non-transitory computer readable storage media, and program instructions collectively stored on the one or more non-transitory computer readable storage media, the program instructions comprising: program instructions to receive a request from an application to compute an inverse matrix of a given matrix; program instructions to utilize an analog resistive processing unit (RPU) system to perform a matrix inversion process in response to the received request, the analog RPU system comprising a crossbar array of RPU cells, each RPU cell comprising a respective resistive memory device with a conductance that is programmable, wherein the program instructions to utilize the analog RPU system to perform the matrix inversion process comprise: program instructions to store an estimated inverse matrix of the given matrix in the crossbar array of RPU cells of the analog RPU system by programming the conductances of the resistive memory devices of at least portion of the RPU cells to store matrix values of the estimated matrix in the crossbar array of RPU cells; program instructions to perform a first iterative process on the estimated inverse matrix stored in the crossbar array of RPU cells to converge the estimated inverse matrix to an approximate inverse matrix of the given matrix, wherein the first iterative process comprises iteratively inputting row vectors to the crossbar array of RPU cells, which represent rows of the given matrix, to perform analog matrix-vector multiplication operations using the analog RPU system to multiply the input row vectors with the estimated inverse matrix stored in the crossbar array of RPU cells, wherein for each iteration, an output vector resulting from the analog matrix-vector multiplication operation is utilized to determine an error vector which is applied to the crossbar array of RPU cells to change the conductances of at least some of the resistive memory devices of the crossbar array of cells RPU to update at least a portion of the matrix values of the estimated inverse matrix stored in the crossbar array of RPU cells; and program instruction to read the approximate inverse matrix from the crossbar array of RPU cells upon completion of the first iterative process; and program instructions to return an inverse matrix to the application, wherein the returned inverse matrix is based, at least in part, on the approximate inverse matrix.
12 . The computer program product of claim 11 , wherein the first iterative process comprises a stochastic gradient descent optimization process which comprises utilizing the row vectors of the given matrix as training data to train the estimated inverse matrix stored in the crossbar array of RPU cells and update matrix values of the estimated inverse matrix stored in the crossbar array of RPU cells by utilizing the error vectors that are determined based on matrix values of an identity matrix.
13 . The computer program product of claim 11 , wherein the program instructions to perform the first iterative process comprise program instructions to perform an iteration of the first iterative process, wherein the program instructions to perform the iteration comprise: program instructions to perform a vector matrix operation on the crossbar array of RPU cells, wherein performing the vector matrix operation comprises inputting a given row vector, which represents a given row of the given matrix, to the crossbar array of RPU cells, and multiplying the given row vector with the estimated inverse matrix stored in the crossbar array of RPU cells to generate an output vector which is output from the crossbar array of RPU cells; and program instructions to perform an update operation to update matrix values of the estimated inverse matrix stored in the crossbar array of RPU cells, wherein performing the update operation comprises determining an error vector based on a difference between the output vector and a row vector of an identity matrix, inputting both the row vector of the given matrix and the determined error vector to the crossbar array of RPU cells, and performing an outer product operation of the row vector and the determined error vector on the crossbar array of RPU cells to update at least some matrix values of the estimated inverse matrix that is stored in the crossbar array of RPU cells.
14 . The computer program product of claim 13 , wherein the program instructions to perform the update operation further comprise: program instructions to scale values of the row vector according to a learning rate parameter to generate a scaled row vector; and program instructions to perform the outer product operation of the scaled row vector and the determined error vector on the crossbar array of RPU cells to update at least some matrix values of the estimated inverse matrix that is stored in the crossbar array of RPU cells.
15 . The computer program product of claim 13 , further comprising program instructions to repeat the iteration of the first iterative process for each row vector of the given matrix to complete an epoch of the first iterative process, and perform the first iterative process for a prespecified number of epochs, wherein the first iterative process is deemed complete following completion of the prespecified number of epochs.
16 . The computer program product of claim 11 , further comprising program instructions to perform a second iterative process on the approximate inverse matrix read out from the crossbar array of RPU cells to thereby generate a final approximate inverse matrix that has an accuracy which is greater than an accuracy of the approximate inverse matrix, wherein the second iterative process comprises an optimization process that is performed in a digital domain, and wherein the final approximate inverse matrix comprises the inverse matrix which is returned to the application.
17 . The computer program product of claim 16 , wherein the second iterative process comprise a Newton optimization process.
18 . A method comprising: storing an estimated inverse matrix of a given matrix in a crossbar array of resistive processing unit (RPU) cells, wherein the RPU cells comprise respective resistive memory devices with conductances that are programmed to encode an array of matrix values of the estimated inverse matrix that is stored in the array of RPU cells; performing a first operation on the crossbar array of RPU cells, wherein the first operation comprises: applying a row vector, which represents a row of the given matrix, to inputs of first control lines that extend in a first direction across the crossbar array of RPU cells; and performing a vector matrix multiplication operation on the crossbar array of RPU cells to generate an output vector on outputs of second control lines that extend in a second direction across the crossbar array of RPU cells, wherein the output vector represents a multiplication of the row vector and the array of matrix values of the estimated inverse matrix that is stored in the crossbar array of RPU cells; and performing a second operation on the array of RPU cells, wherein the second operation comprises: applying a scaled row vector to the inputs of the first control lines, wherein the scaled row vector comprises the row vector scaled by a learning rate parameter; applying an error vector to inputs of the second control lines; and performing an outer product operation on the crossbar array of RPU cells to change the conductances of at least some of the resistive memory devices of the crossbar array of cells RPU to update at least some of the matrix values of the estimated inverse matrix that is stored in the crossbar array of RPU cells, wherein the outer product operation comprises performing an outer product of the scaled row vector and the error vector.
19 . The method of claim 18 , wherein the error vector represents a difference between the output vector generated by the first operation and a corresponding row vector of an identity matrix associated with the given matrix.
20 . The method of claim 18 , wherein the first and second operations are performed as part of an iterative stochastic gradient descent process which comprises utilizing the matrix values in row vectors of the given matrix as training data to train the estimated inverse matrix stored in the crossbar array of RPU cells and, thereby, converge the estimated inverse matrix to an increasingly accurate estimation of an actual inverse matrix of the given matrix.

Description

BACKGROUND This disclosure relates generally to analog resistive processing systems for neuromorphic computing, and techniques for performing hardware accelerated numerical computing tasks using an analog resistive processing system. Information processing systems such as Neuromorphic computing systems and artificial neural network (ANN) systems are utilized in various applications such as machine learning and inference processing for cognitive recognition and computing. Such systems are hardware-based systems that generally include a large number of highly interconnected processing elements (referred to as “artificial neurons”) that operate in parallel to perform various types of computations. The artificial neurons (e.g., pre-synaptic neurons and post-synaptic neurons) are connected using artificial synaptic devices which provide synaptic weights that represent connection strengths between the artificial neurons. The synaptic weights can be implemented using an array of RPU cells having tunable resistive memory devices, the conductance states of the RPU cells are encoded or otherwise mapped to the synaptic weights. SUMMARY Embodiments of the disclosure include computing systems, devices, and methods for performing a matrix inversion process using an analog resistive processing unit array for hardware accelerated computing. For example, an exemplary embodiment includes a method which comprises receiving a request from an application to compute an inverse matrix of a given matrix and performing a matrix inversion process in response to the received request. The matrix inversion process comprises storing a first estimated inverse matrix of the given matrix in an array of resistive processing unit (RPU) cells, performing a first iterative process on the first estimated inverse matrix stored in the array of RPU cells to converge the first estimated inverse matrix to a second estimated inverse matrix of the given matrix, and reading the second estimated inverse matrix from the array of RPU cells upon completion of the first iterative process. An inverse matrix is returned to the application, wherein the returned inverse matrix is based, at least in part, on the second estimated inverse matrix. Another exemplary embodiment includes a device which comprises an array of resistive processing unit (RPU) cells, first control lines extending in a first direction across the array of RPU cells, and second control lines extending in a second direction across the array of RPU cells, and control circuitry. Each RPU cell is connected at an intersection of one of the first control lines and one of the second control lines. Each RPU cell comprises a resistive device with a tunable conductance, wherein conductance values of a least a portion of the resistive devices of the RPU cells in the array of RPU cells encode matrix values of an estimated inverse matrix that is stored in the array of RPU cells. The estimated inverse matrix stored in the array of RPU cells represents an estimate of an inverse matrix of a given matrix. The control circuitry is operatively coupled to the array of RPU cells to cause performance of a first operation and a second operation on the array of RPU cells. The first operation comprises the control circuitry applying a row vector, which represents a row of the given matrix, to inputs of the first control lines to perform a vector matrix operation which comprises multiplying the input row vector with the estimated inverse matrix stored in the array of RPU cells to generate an output vector on outputs of the second control lines. The second operation comprises the control circuitry applying an error vector to inputs of the second control lines and inputting the row vector to the inputs of the first control lines, to perform an outer product operation of the error vector and the row vector on the array of RPU calls to thereby update at least some matrix values of the estimated inverse matrix that is stored in the array of RPU cells. Another exemplary embodiment includes a computing system which comprises a digital processing system and a neuromorphic computing system coupled to the digital processing system. The digital processing system comprises one or more processors and memory to store program instructions that are executed by the one or more processors to perform a matrix inversion process to compute an inverse matrix of a given matrix. The neuromorphic computing system comprises at least one neural core. The at least one neural core comprises an array of resistive processing unit (RPU) cells, first control lines extending in a first direction across the array of RPU cells, second control lines extending in a second direction across the array of RPU cells, and peripheral circuitry coupled to the first control lines and to the second control lines. Each RPU cell is connected at an intersection of one of the first control lines and one of the second control lines, and each RPU cell comprises a resisti