US-12626091-B2 - Configuration process framework for machine learning models

US12626091B2

Abstract

Various embodiments are directed to configuring or training deep neural network (DNN) machine learning models comprising one or more hidden layers and an output layer. Various embodiments provide technical advantages in training DNN machine learning models, including improved computational efficiency and guaranteed optimality. In one embodiment, an example method includes identifying a nonlinear-model-based representation for each hidden layer, which may be a bank of Wiener models, the nonlinear units of the hidden layer, and/or the like. The method further includes individually and sequentially configuring the hidden layers, each configured by determining a correlation measure (e.g., a correlation ratio) between the layer output and a target signal. Parameters of the particular hidden layer are modified by maximizing the correlation measure to yield maximal correlation over the space of functions. The method further includes performing automated tasks using the DNN machine learning model after configuring its parameters on a training set.
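The correlation ratio named in the abstract can be sketched as follows. This is one illustrative reading (the classical eta-squared statistic for a scalar layer output and a categorical target), not necessarily the patent's own estimator, and the function name is hypothetical:

```python
import numpy as np

def correlation_ratio(outputs, targets):
    """Classical correlation ratio (eta-squared): between-class variance
    of the layer output, divided by its total variance. Illustrative
    only; the patent's exact correlation measure may differ."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets)
    grand_mean = outputs.mean()
    total_var = ((outputs - grand_mean) ** 2).sum()
    if total_var == 0.0:
        # A constant output carries no information about the target.
        return 0.0
    between = 0.0
    for c in np.unique(targets):
        group = outputs[targets == c]
        between += len(group) * (group.mean() - grand_mean) ** 2
    return between / total_var
```

A value of 1 means the output is fully determined by the target class; 0 means the class means are indistinguishable.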

Inventors

  • Jose C. Principe
  • Bo Hu

Assignees

  • UNIVERSITY OF FLORIDA RESEARCH FOUNDATION, INCORPORATED

Dates

Publication Date
2026-05-12
Application Date
2022-11-03

Claims (20)

  1. A computer-implemented method for configuring a deep neural network (DNN) machine learning model comprising one or more hidden layers and an output layer, the method comprising: receiving, using one or more processors, one or more input signals and one or more target signals, wherein each of the one or more target signals corresponds to an input signal; selecting, using the one or more processors and based at least in part on the one or more input signals and one or more automated tasks, a nonlinear-model-based representation for each hidden layer of the DNN machine learning model, wherein the nonlinear-model-based representation comprises one or more parameters configured to be modified to configure a corresponding hidden layer; sequentially configuring, using the one or more processors, at least a selected subset of the one or more hidden layers of the DNN machine learning model, wherein the selected subset comprises a particular hidden layer, and wherein the particular hidden layer is configured independently from, and before configuring, subsequent hidden layers of the selected subset, by: determining a correlation measure between (i) an output of the particular hidden layer in response to a given input signal, and (ii) a given target signal corresponding to the given input signal, modifying the one or more parameters of the nonlinear-model-based representation for the particular hidden layer based at least in part on maximizing the correlation measure, and fixing the one or more modified parameters of the nonlinear-model-based representation for the particular hidden layer such that the one or more modified parameters cannot be modified by configuring a subsequent hidden layer; and initiating performance of the one or more automated tasks using the DNN machine learning model.
  2. The method of claim 1, wherein sequentially configuring the one or more hidden layers comprises fixing the modified parameters of the nonlinear-model-based representation for the particular hidden layer before modifying parameters of a nonlinear-model-based representation for a subsequent hidden layer.
  3. The method of claim 1, wherein the output layer is a final projection layer that is configured subsequent to the sequential configuration of at least the selected subset of the one or more hidden layers of the DNN machine learning model, the output layer being configured based at least in part on a least square projection.
  4. The method of claim 1, wherein the output layer is a final projection layer that is configured subsequent to the sequential configuration of at least the selected subset of the one or more hidden layers of the DNN machine learning model, the output layer being configured based at least in part on maximizing the correlation measure with the target signal.
  5. The method of claim 1, wherein the correlation measure is a correlation ratio between (i) the layer output of the particular hidden layer in response to the given input signal, and (ii) the given target signal corresponding to the given input signal.
  6. The method of claim 1, wherein a nonlinear-model-based representation for a hidden layer comprises a plurality of block-oriented nonlinear models.
  7. The method of claim 6, wherein at least one of the plurality of block-oriented nonlinear models is a Hammerstein-Wiener model.
  8. The method of claim 1, wherein at least one hidden layer of the DNN machine learning model is substituted by a nonlinear mapping of the one or more input signals to a reproducing kernel Hilbert space (RKHS) where a linear weighting of a plurality of projections is configured by maximizing the correlation measure with the target signal.
  9. The method of claim 1, wherein the final projection layer is further configured using a combination of one or more outputs from the one or more hidden layers.
  10. The method of claim 1, wherein the layer output of the particular hidden layer in response to the given input signal is determined directly from the output of a preceding hidden layer.
  11. The method of claim 1, wherein the layer output of the particular hidden layer in response to the given input signal is determined based at least in part on a combination of one or more outputs of one or more preceding layers.
  12. An apparatus for configuring a deep neural network (DNN) machine learning model comprising one or more hidden layers and an output layer, the apparatus comprising at least one processor and at least one memory including program code, the at least one memory and the program code configured to, with the processor, cause the apparatus to at least: receive one or more input signals and one or more target signals, wherein each of the one or more target signals corresponds to an input signal; select, based at least in part on the one or more input signals and one or more automated tasks, a nonlinear-model-based representation for each hidden layer of the DNN machine learning model, wherein the nonlinear-model-based representation comprises one or more parameters configured to be modified to configure a corresponding hidden layer; sequentially configure at least a selected subset of the one or more hidden layers of the DNN machine learning model, wherein the selected subset comprises a particular hidden layer, and wherein the particular hidden layer is configured independently from, and before configuring, subsequent hidden layers of the selected subset, by: determining a correlation measure between (i) an output of the particular hidden layer in response to a given input signal, and (ii) a given target signal corresponding to the given input signal, modifying the one or more parameters of the nonlinear-model-based representation for the particular hidden layer based at least in part on maximizing the correlation measure, and fixing the one or more modified parameters of the nonlinear-model-based representation for the particular hidden layer such that the one or more modified parameters cannot be modified by configuring a subsequent hidden layer; and initiate performance of the one or more automated tasks using the DNN machine learning model.
  13. The apparatus of claim 12, wherein the apparatus sequentially configures the one or more hidden layers by at least fixing the modified parameters of the nonlinear-model-based representation for the particular hidden layer before modifying parameters of a nonlinear-model-based representation for a subsequent hidden layer.
  14. The apparatus of claim 12, wherein the output layer is a final projection layer that is configured subsequent to the sequential configuration of at least the selected subset of the one or more hidden layers of the DNN machine learning model, the output layer being configured based at least in part on a least square projection.
  15. The apparatus of claim 12, wherein the output layer is a final projection layer that is configured subsequent to the sequential configuration of at least the selected subset of the one or more hidden layers of the DNN machine learning model, the output layer being configured based at least in part on maximizing the correlation measure with the target signal.
  16. The apparatus of claim 12, wherein the correlation measure is a correlation ratio between (i) the layer output of the particular hidden layer in response to the given input signal, and (ii) the given target signal corresponding to the given input signal.
  17. The apparatus of claim 12, wherein a nonlinear-model-based representation for a hidden layer comprises a plurality of block-oriented nonlinear models.
  18. The apparatus of claim 17, wherein at least one of the plurality of block-oriented nonlinear models is a Hammerstein-Wiener model.
  19. The apparatus of claim 12, wherein at least one hidden layer of the DNN machine learning model is substituted by a nonlinear mapping of the one or more input signals to a reproducing kernel Hilbert space (RKHS) where a linear weighting of a plurality of projections is configured by maximizing the correlation measure with the target signal.
  20. The apparatus of claim 12, wherein the final projection layer is further configured using a combination of one or more outputs from the one or more hidden layers.
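The sequential, layer-by-layer configuration recited in claims 1 and 12 can be sketched as follows. This is a loose, non-authoritative illustration: the tanh unit bank, the mean-activation Pearson surrogate for the correlation measure, the random-search "maximization", and all function names are assumptions for the sketch, not the patent's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_forward(W, X):
    # One hidden layer: a bank of nonlinear (tanh) units applied to its input.
    return np.tanh(X @ W)

def corr_measure(H, y):
    # Squared Pearson correlation between the mean layer activation and the
    # target -- a simple stand-in for the claimed correlation measure.
    c = np.corrcoef(H.mean(axis=1), y)[0, 1]
    return 0.0 if np.isnan(c) else c ** 2

def configure_layer(X, y, width, trials=200):
    # Crude "maximization" by random search: keep the candidate weight
    # matrix with the highest correlation measure, then fix (freeze) it.
    best_W, best_score = None, -1.0
    for _ in range(trials):
        W = rng.standard_normal((X.shape[1], width))
        score = corr_measure(layer_forward(W, X), y)
        if score > best_score:
            best_W, best_score = W, score
    return best_W

def configure_network(X, y, widths):
    # Configure each hidden layer individually and sequentially; the frozen
    # output of each configured layer becomes the next layer's input.
    layers, H = [], X
    for w in widths:
        W = configure_layer(H, y, w)
        layers.append(W)
        H = layer_forward(W, H)
    return layers, H
```

Each layer is fixed before the next is touched, so no later step can modify an earlier layer's parameters, mirroring the claims' "fixing" limitation.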

Description

REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Application No. 63/280,505, filed on Nov. 17, 2021, the entire contents of which are incorporated herein by reference.

GOVERNMENT SUPPORT

This invention was made with government support under N00014-21-1-2345 awarded by the US Navy Office of Naval Research and under FA9453-18-1-0039 awarded by the US Air Force Research Laboratory. The government has certain rights in the invention.

TECHNOLOGICAL FIELD

Embodiments of the present disclosure generally relate to configuration (e.g., training) of machine learning models, for example, deep machine learning models or deep neural network (DNN) machine learning models.

BACKGROUND

Various embodiments of the present disclosure address technical challenges relating to the efficiency, accuracy, and optimality of existing methods for training DNN machine learning models, such as backpropagation and the use of mean-square error.

BRIEF SUMMARY

Various embodiments of the present disclosure are directed to improved configuration or training of DNN machine learning models. In particular, various embodiments provide a modularized configuration framework or process for training a DNN machine learning model that preserves or improves accuracy of the DNN machine learning model due to high-resolution control and transparency. Various embodiments provided herein improve upon existing processes and frameworks for configuring a DNN machine learning model. For example, backpropagation is understood by those of skill in the field of the present disclosure as the standard methodology for training DNN machine learning models and involves tuning parameters of the layers of a DNN machine learning model directly from data in supervised training.

However, backpropagation introduces various weaknesses, such as simultaneous and non-specific training of all layers of a DNN machine learning model, non-guaranteed optimality, slow convergence, and low explainability. Accordingly, various embodiments described herein provide a modularized configuration framework for training a DNN machine learning model that provides various technical advantages over existing training processes. In doing so, various embodiments involve modularization and individual configuration of different layers of the DNN machine learning model. Various embodiments additionally involve determination of correlation measures in order to individually configure a particular layer of the DNN machine learning model, which reduces overall computational complexity, enables greater explainability, and provides improved convergence during training of the DNN machine learning model.

In general, according to one aspect, embodiments of the present invention feature a computer-implemented method for configuring a deep neural network (DNN) machine learning model comprising one or more hidden layers and an output layer, the various steps of the method being performed using a processor. One or more input signals and one or more target signals, each corresponding to an input signal, are received. A nonlinear-model-based representation for each hidden layer of the DNN machine learning model is selected. At least a selected subset of the one or more hidden layers of the DNN machine learning model is sequentially configured.

Here, a particular hidden layer is independently configured, before configuring subsequent hidden layers of the selected subset, by: constructing a correlation measure based at least in part on (i) a layer output of the particular hidden layer in response to a given input signal, and (ii) a given target signal corresponding to the given input signal; modifying one or more parameters of the nonlinear-model-based representation for the particular hidden layer based at least in part on maximizing the correlation measure; and fixing the one or more modified parameters of the nonlinear-model-based representation for the particular hidden layer. The performance of one or more automated tasks using the DNN machine learning model is initiated.

In some embodiments, sequentially configuring the one or more hidden layers comprises fixing the modified parameters of the nonlinear-model-based representation for the particular hidden layer before modifying parameters of a nonlinear-model-based representation for a subsequent hidden layer. In one example, the output layer may be a final projection layer that is configured subsequent to the sequential configuration of at least the selected subset of the one or more hidden layers of the DNN machine learning model, the output layer being configured based at least in part on a least square projection. In another example, the output layer may be a final projection layer that is configured subsequent to the sequential configuration of at least the selected subset of the one or more hidden layers of the DNN machine learning model, the output layer being configured based at least in part on maximizing the correlation measure with the target signal.
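The least square projection mentioned for the final projection layer can be sketched as an ordinary least-squares readout over the already-fixed hidden-layer outputs. The function name and example data below are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def fit_output_layer(H, y):
    """Least-squares projection for the output layer: find the linear
    readout w minimizing ||H @ w - y||^2, where H holds the (frozen)
    hidden-layer outputs. Illustrative sketch only."""
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

# Hypothetical example: if y is an exact linear combination of the
# hidden outputs, the projection recovers the combining weights.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = H @ np.array([2.0, -1.0])
w = fit_output_layer(H, y)
```

Because the hidden layers are fixed first, this final step is a convex problem with a closed-form solution, which is one way to read the document's claim of guaranteed optimality for the output stage.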