US-12619871-B2 - Interpretable neural network architecture using continued fractions

US12619871B2US 12619871 B2US12619871 B2US 12619871B2US-12619871-B2

Abstract

A method, a neural network, and a computer program product are provided for training neural networks with continued fractions architectures. The method includes receiving, as input to a neural network, input data and training the input data through a plurality of continued fractions layers of the neural network to generate output data. The input data is provided to each of the continued fractions layers, as well as the output data from the previous layer. The method further includes outputting, from the neural network, the output data. Each continued fractions layer is configured to calculate one or more linear functions of its respective input and to generate an output that is used as the input for a subsequent continued fractions layer.
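To make the layer wiring described above concrete, the following is a minimal sketch of a single continued-fractions "ladder" in which every layer sees the raw input plus the reciprocal of the previous layer's output. The NumPy code, the name `ladder_forward`, and the random untrained weights and depth are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def ladder_forward(x, weights, biases):
    """Evaluate one continued-fractions 'ladder' on an input vector x.

    Each layer computes a linear function of the raw input x and adds the
    reciprocal of the previous layer's output, so the whole ladder evaluates
    a continued fraction whose partial terms are learned linear functions.
    """
    h = weights[-1] @ x + biases[-1]            # innermost linear term
    for W, b in zip(reversed(weights[:-1]), reversed(biases[:-1])):
        h = W @ x + b + 1.0 / h                 # next rung: a_k(x) + 1/h
    return h

# Untrained demo: a depth-4 ladder on a 3-dimensional input; the network of
# claim 1 would combine several such ladders in a linear combination.
rng = np.random.default_rng(0)
p, depth = 3, 4
weights = [rng.normal(size=(1, p)) for _ in range(depth)]
biases = [rng.normal(size=1) for _ in range(depth)]
print(ladder_forward(rng.normal(size=p), weights, biases))
```

The plain reciprocal 1/h has poles wherever h crosses zero; claim 6 below describes the safeguarded variant sgn(z)/max(|z|, ε), sketched after the claims.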

Inventors

  • Isha Puri
  • Amit Dhurandhar
  • Tejaswini Pedapati
  • Karthikeyan Shanmugam
  • Dennis Wei
  • Kush Raj Varshney

Assignees

  • INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2022-06-09

Claims (13)

  1. A computer-implemented method of implementing an architecture of a neural network, the method comprising: receiving, via an input layer of the neural network, input data x having dimensionality p, wherein x_j represents a dimension of the input data x for j = 1, 2, …, p; training the neural network via passing the input data x from the input layer through a plurality of continued fractions layers of the neural network to generate output data, wherein each continued fractions layer is configured, via learning optimal weights associated with its respective neurons, to calculate one or more linear functions of its respective input and generate, via nonlinear activation functions of its respective neurons, an output that is used as input for a subsequent continued fractions layer, wherein the output of each neuron of each continued fractions layer is computed using the reciprocal function z → 1/z as the nonlinear activation function, wherein the architecture of the neural network is a linear combination of a plurality of ladders, each ladder comprising a plurality of continued fractions layers, and wherein the architecture of the neural network is one of a set of variants, the set of variants including at least one of: a full-fledged variant wherein each ladder receives as input all dimensions of the input data x, a diagonalized variant wherein each ladder receives as input only one dimension x_j of the input data x, and a combined variant that is a combination of the full-fledged variant and the diagonalized variant, wherein at least one of its ladders receives as input all dimensions of the input data x, and at least one of its ladders receives as input only one dimension x_j of the input data x; and outputting, from the neural network, the output data.
  2. The computer-implemented method of claim 1, wherein each continued fractions layer is configured to calculate at least one linear function of a continued fraction in canonical form.
  3. The computer-implemented method of claim 1, wherein the neural network is interpretable using continuants to compute a gradient for each continued fractions layer with respect to its inputs, wherein the continuants are multivariate polynomials representing a determinant of a tridiagonal matrix and the computed gradients are utilized in providing first-order attributions.
  4. The computer-implemented method of claim 1, wherein the neural network is interpretable using an interpretation power series that represents each continued fractions layer as a multivariate power series.
  5. The computer-implemented method of claim 1, wherein all variants in the set of variants are differentiable and are trained using the Alternating Direction Method of Multipliers (ADMM) and backpropagation, and wherein one or more dropout techniques are utilized for improved generalization.
  6. The computer-implemented method of claim 1, further comprising: altering the nonlinear activation function to handle poles, wherein the nonlinear activation function is altered to sgn(z) · 1/max(|z|, ε) for some ε > 0, where |·| denotes absolute value, and wherein ε is fixed to a small positive value or tuned during the training.
  7. The computer-implemented method of claim 1, wherein the neural network is interpreted using a multivariate power series to obtain higher-order terms by summing coefficients for each monomial term of the multivariate power series, wherein coefficient sums provide attributions for individual dimensions x_j and higher-order interactions up to a depth of the plurality of ladders.
  8. The computer-implemented method of claim 7, further comprising: determining appropriate coefficients based on a linear recurrence relation or by using one or more symbolic manipulation tools.
  9. A computer program product comprising a computer readable storage medium having computer readable instructions stored therein, wherein the computer readable instructions, when executed on a computing device, cause the computing device to perform operations to implement an architecture of a neural network, the operations comprising: receiving, via an input layer of the neural network, input data x having dimensionality p, wherein x_j represents a dimension of the input data x for j = 1, 2, …, p; training the neural network via passing the input data x from the input layer through a plurality of continued fractions layers of the neural network to generate output data, wherein each continued fractions layer is configured, via learning optimal weights associated with its respective neurons, to calculate one or more linear functions of its respective input and generate, via nonlinear activation functions of its respective neurons, an output that is used as input for a subsequent continued fractions layer, wherein the output of each neuron of each continued fractions layer is computed using the reciprocal function z → 1/z as the nonlinear activation function, wherein the architecture of the neural network is interpretable as a linear combination of a plurality of ladders, each ladder comprising a plurality of continued fractions layers, and wherein the architecture of the neural network is one of a set of variants, the set of variants including at least one of: a full-fledged variant wherein each ladder receives as input all dimensions of the input data x, a diagonalized variant wherein each ladder receives as input only one dimension x_j of the input data x, and a combined variant that is a combination of the full-fledged variant and the diagonalized variant, wherein at least one of its ladders receives as input all dimensions of the input data x, and at least one of its ladders receives as input only one dimension x_j of the input data x; and outputting, from the neural network, the output data.
  10. The computer program product of claim 9, wherein each continued fractions layer is configured to calculate at least one linear function of a continued fraction in canonical form.
  11. The computer program product of claim 9, wherein the neural network is interpretable using continuants to compute a gradient for each continued fractions layer with respect to its inputs, wherein the continuants are multivariate polynomials representing a determinant of a tridiagonal matrix and the computed gradients are utilized in providing first-order attributions.
  12. The computer program product of claim 9, wherein the neural network is interpretable using an interpretation power series that represents each continued fractions layer as a multivariate power series.
  13. A computer system comprising: a processor set; one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to cause the processor set to perform operations to implement an architecture of a neural network, the operations comprising: receiving, via an input layer of the neural network, input data x having dimensionality p, wherein x_j represents a dimension of the input data x for j = 1, 2, …, p; training the neural network via passing the input data x from the input layer through a plurality of continued fractions layers of the neural network to generate output data, wherein each continued fractions layer is configured, via learning optimal weights associated with its respective neurons, to calculate one or more linear functions of its respective input and generate, via nonlinear activation functions of its respective neurons, an output that is used as input for a subsequent continued fractions layer, wherein the output of each neuron of each continued fractions layer is computed using the reciprocal function z → 1/z as the nonlinear activation function, wherein the architecture of the neural network is interpretable as a linear combination of a plurality of ladders, each ladder comprising a plurality of continued fractions layers, and wherein the architecture of the neural network is one of a set of variants, the set of variants including at least one of: a full-fledged variant wherein each ladder receives as input all dimensions of the input data x, a diagonalized variant wherein each ladder receives as input only one dimension x_j of the input data x, and a combined variant that is a combination of the full-fledged variant and the diagonalized variant, wherein at least one of its ladders receives as input all dimensions of the input data x, and at least one of its ladders receives as input only one dimension x_j of the input data x; and outputting, from the neural network, the output data.
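Claims 3 and 11 ground interpretability in continuants. As a sketch of the underlying arithmetic only (assuming the classical three-term recurrence for continuants and the identity that differentiating a continuant in one partial term yields a product of prefix and suffix continuants; the function names and the finite-difference check are illustrative, not from the patent):

```python
def continuant(a):
    """K(a_1..a_n) via the recurrence K_k = a_k*K_{k-1} + K_{k-2}; this
    equals the determinant of the tridiagonal matrix referenced in claim 3.
    K of the empty sequence is 1."""
    k_prev, k_cur = 0.0, 1.0
    for term in a:
        k_prev, k_cur = k_cur, term * k_cur + k_prev
    return k_cur

def cf_value_and_grad(a):
    """Value of a_1 + 1/(a_2 + 1/(... + 1/a_n)) = K(a_1..a_n)/K(a_2..a_n)
    and its gradient in the partial terms a_k, using
    dK(a_1..a_n)/da_k = K(a_1..a_{k-1}) * K(a_{k+1}..a_n)."""
    P, Q = continuant(a), continuant(a[1:])       # numerator, denominator
    grads = []
    for k in range(len(a)):                       # a[k] is a_{k+1}
        suffix = continuant(a[k + 1:])
        dP = continuant(a[:k]) * suffix
        dQ = continuant(a[1:k]) * suffix if k >= 1 else 0.0
        grads.append((dP * Q - P * dQ) / Q**2)    # quotient rule
    return P / Q, grads

f, g = cf_value_and_grad([1.5, 2.0, 3.0])
f_eps, _ = cf_value_and_grad([1.5, 2.0 + 1e-6, 3.0])
print(g[1], (f_eps - f) / 1e-6)                   # analytic vs. numeric
```

With each partial term realized as a learned linear function a_k = w_k·x + b_k, chaining these gradients through the linear layers gives the first-order input attributions the claims describe.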
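The pole-handling activation of claim 6 is straightforward to state in code. A minimal sketch, where the default epsilon is an assumed value:

```python
import numpy as np

def safe_reciprocal(z, eps=1e-6):
    """Altered activation from claim 6: sgn(z) * 1/max(|z|, eps), eps > 0.
    For |z| >= eps this is exactly 1/z; near the pole at z = 0 the output
    is clamped to +/- 1/eps (and is 0 at exactly z = 0, since sgn(0) = 0).
    eps may be fixed to a small positive value or tuned during training."""
    return np.sign(z) / np.maximum(np.abs(z), eps)

print(safe_reciprocal(np.array([2.0, -0.5, 1e-9])))  # [0.5, -2.0, 1e6]
```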
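Claims 4, 7, and 8 describe reading off attributions from a power-series view of a ladder, with claim 8 permitting symbolic manipulation tools. A minimal SymPy sketch under assumptions of my own (a depth-2 diagonalized ladder on a single dimension, with illustrative symbol names):

```python
import sympy as sp

x = sp.symbols('x')
w0, b0, w1, b1 = sp.symbols('w0 b0 w1 b1', positive=True)

# Depth-2 diagonalized ladder on one input dimension x: an outer linear
# term plus the reciprocal of an inner linear term.
ladder = b0 + w0 * x + 1 / (b1 + w1 * x)

# Expand as a power series in x around 0; summing the coefficients of each
# monomial (here, each power of x) yields the attributions of claim 7.
series = sp.series(ladder, x, 0, 4).removeO()
for degree in range(4):
    print(degree, sp.simplify(series.coeff(x, degree)))
```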

Description

STATEMENT REGARDING PRIOR DISCLOSURES BY THE INVENTOR OR A JOINT INVENTOR

The following disclosure(s) are submitted under 35 U.S.C. 102(b)(1)(A): DISCLOSURE(S): Title: CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions; Authors: Isha Puri, Amit Dhurandhar, Tejaswini Pedapati, Karthikeyan Shanmugam, Dennis Wei, Kush R. Varshney; Publisher: Conference on Neural Information Processing Systems (NeurIPS 2021); Date: Dec. 7, 2021.

BACKGROUND

The present disclosure relates to neural networks, and more specifically, to an interpretable neural network using continued fractions.

An artificial neural network (or simply neural network) consists of an input layer of neurons (or nodes, units), one or two (or even three) hidden layers of neurons, and a final layer of output neurons. In a typical neural network architecture, lines connect the neurons, where each connection is associated with a numeric value called a weight. The output of each neuron is computed using an activation (or transfer) function. The purpose of the activation function is, besides introducing nonlinearity into the neural network, to bound the value of the neuron so that divergent neurons do not paralyze the neural network.

In mathematics, a continued fraction is an expression obtained through an iterative process of representing a number as the sum of its integer part and the reciprocal of another number, then writing this other number as the sum of its integer part and another reciprocal, and so on. In a finite continued fraction (or terminated continued fraction), the iteration/recursion is terminated after finitely many steps by using an integer in lieu of another continued fraction. In contrast, an infinite continued fraction is an infinite expression. In either case, all integers in the sequence, other than the first, must be positive. A worked example of this expansion process is sketched at the end of this section.

SUMMARY

Embodiments of the present disclosure include a method that provides training for neural networks with continued fractions architectures. The method includes receiving, as input to a neural network, input data and training the input data through a plurality of continued fractions layers of the neural network to generate output data. The input data is provided to each of the continued fractions layers, as well as the output data from a previous layer. The method further includes outputting, from the neural network, the output data. Each continued fractions layer is configured to calculate one or more linear functions of its respective input and to generate an output that is used as the input for a subsequent continued fractions layer.

Additional embodiments of the present disclosure include a computer program product that provides training of neural networks with continued fractions architectures, comprising one or more computer-readable storage media and program instructions stored on the one or more computer-readable storage media, the program instructions executable by a processor to cause the processor to receive, as input to a neural network, input data and train the input data through a plurality of continued fractions layers of the neural network to generate output data. The input data is provided to each of the continued fractions layers, as well as the output data from a previous layer. The computer program product further includes instructions to output, from the neural network, the output data.
Each continued fractions layer of the continued fractions layers is configured to calculate one or more linear functions of its respective input and to generate an output that is used as the input for a subsequent continued fractions layer. Further embodiments of the present disclosure include a neural network with a continued-fractions-inspired architecture. The continued fractions architecture can encompass a full-fledged variant where each layer receives the entire input layer at every stage. Another continued fractions architecture can encompass a diagonalized variant where each layer only receives one of the input dimensions of the input layer, making it an additive model. Another continued fractions architecture encompasses a combination of the diagonalized variant and the full variant. The full layers are of increasing depth and can be understood to capture the respective order of interactions. The neural network can be implemented within a system that includes a memory, a processor, and local data storage having stored thereon computer-executable code. The computer-executable code includes program instructions executable by a processor to cause the processor to perform the method described above. The present summary is not intended to illustrate each aspect of, every implementation of, and/or every embodiment of the present disclosure.
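As promised in the Background above, a minimal sketch of the expansion process for continued fractions: repeatedly split off the integer part and invert the remainder until it terminates. The function name is illustrative, the arithmetic is exact with `fractions.Fraction`, and the example value 415/93 is a classical one, not from the patent.

```python
import math
from fractions import Fraction

def cf_expansion(q: Fraction) -> list[int]:
    """Finite continued-fraction terms of a rational number, obtained by
    repeatedly taking the integer part and inverting what remains."""
    terms = []
    while True:
        n = math.floor(q)
        terms.append(n)
        remainder = q - n
        if remainder == 0:
            return terms
        q = 1 / remainder

# 415/93 = 4 + 1/(2 + 1/(6 + 1/7))  ->  terms [4, 2, 6, 7]
print(cf_expansion(Fraction(415, 93)))
```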