
US-12626105-B2 - Stochastic noise layers


Abstract

Provided is a process including: obtaining, with a computer system, inputs to a stochastic layer of a multi-layer neural network, wherein the multi-layer neural network comprises both deterministic layers and the stochastic layer, and the stochastic layer comprises a plurality of parameters that vary stochastically according to respective probability distributions; determining values of the plurality of parameters by randomly sampling from the respective probability distributions; determining an output of the stochastic layer based on both the determined values of the plurality of parameters and the inputs to the stochastic layer; and providing the output of the stochastic layer to a downstream layer of the multi-layer neural network or as an output of the multi-layer neural network.
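The process above can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation (an assumption for exposition, not the patent's actual implementation; the name `StochasticLinear` is hypothetical) of a layer whose weights are redrawn from learnable per-weight Gaussian distributions on every forward pass, placed alongside deterministic layers in the same network:

```python
import torch
import torch.nn as nn

class StochasticLinear(nn.Module):
    """Layer whose weights are sampled from per-weight Gaussians."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Statistical parameters of the respective distributions:
        # a learnable mean and log standard deviation per weight.
        self.weight_mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.weight_log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Determine parameter values by randomly sampling from the
        # distributions, then compute the layer output from the sampled
        # values and the inputs.
        sigma = self.weight_log_sigma.exp()
        weight = self.weight_mu + sigma * torch.randn_like(sigma)
        return x @ weight.t() + self.bias

# A multi-layer network mixing deterministic layers with the stochastic
# layer; the stochastic layer's output feeds a downstream layer.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),   # deterministic layers
    StochasticLinear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),               # downstream layer
)
```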

Inventors

  • Hadi Esmaeilzadeh
  • Anwesa Choudhuri

Assignees

  • Protopia AI, Inc.

Dates

Publication Date
2026-05-12
Application Date
2022-02-24

Claims (20)

  1. A tangible, non-transitory, machine-readable medium storing instructions that when executed by one or more processors effectuate operations comprising: obtaining, with a computer system, inputs to a stochastic layer of a multi-layer neural network, wherein: the multi-layer neural network comprises both deterministic layers and the stochastic layer, and the stochastic layer comprises a plurality of parameters that vary stochastically according to respective probability distributions, wherein the respective probability distributions are each characterized by one or more statistical parameters, and wherein the one or more statistical parameters are learned with gradient descent based on an objective function, the objective function being at least partially differentiable with respect to the one or more statistical parameters and the objective function being based on output of the multi-layer neural network; determining, with the computer system, values of the plurality of parameters of the stochastic layer by randomly sampling from the respective probability distributions; determining, with the computer system, outputs of the stochastic layer based on both the determined values of the plurality of parameters and the inputs to the stochastic layer, the inputs to the stochastic layer being different from the determined values of the plurality of parameters and the inputs to the stochastic layer being different from the one or more statistical parameters; and providing, with the computer system, the output of the stochastic layer to a downstream layer of the multi-layer neural network or as an output of the multi-layer neural network.
  2. The medium of claim 1, the operations further comprising: performing a vulnerability analysis for the multi-layer neural network based on the respective probability distributions, wherein the vulnerability analysis measures relationships between sizes of the one or more statistical parameters of the respective probability distributions and performance of the multi-layer neural network, and wherein the respective probability distributions correspond to components of the input to the stochastic layer.
  3. The medium of claim 2, wherein the components comprise components of at least one of the following: a matrix, a tensor, a vector, a scalar, an embedding, an encoding, a pixel value, and an activation value.
  4. The medium of claim 1, the operations further comprising: determining that a given one of the one or more statistical parameters characterizing dispersion for its respective probability distribution exceeds a threshold and, in response, pruning or making constant a corresponding perceptron in the multi-layer neural network to compress the multi-layer neural network.
  5. The medium of claim 4, wherein pruning or making constant a corresponding perceptron comprises pruning or making constant a corresponding perceptron in a deterministic layer of the multi-layer neural network.
  6. The medium of claim 1, wherein: the stochastic layer comprises one or more convolutional kernels; the plurality of parameters comprise weights of the convolutional kernels, each weight corresponding to a different respective probability distribution; and the respective probability distributions are at least one of a normal distribution, a Gaussian distribution, a Laplacian distribution, a binomial distribution, a multinomial distribution, or a combination thereof.
  7. The medium of claim 1, the operations further comprising: obtaining a deterministic, trained version of the multi-layer neural network; designating a subset of layers of the deterministic, trained version of the multi-layer neural network to be transformed into stochastic layers, the subset including a plurality of layers; and determining the values of the plurality of parameters for the respective probability distributions for the subset of layers transformed into the stochastic layers.
  8. The medium of claim 7, wherein determining values of the plurality of parameters comprises determining a maximum dispersion for the probability distributions such that error of the multi-layer neural network with the stochastic layers is minimized.
  9. The medium of claim 1, wherein the objective function is determined to minimize cross entropy relative to a deterministic version of the multi-layer neural network.
  10. The medium of claim 1, the operations further comprising: changing, between instances in which the multi-layer neural network responds to inputs, a first subset of layers of the multi-layer neural network from being deterministic to being stochastic and a second subset of layers of the multi-layer neural network from being stochastic to being deterministic, such that different layers are stochastic when responding to different ones of the inputs.
  11. The medium of claim 1, wherein the operations comprise: steps for learning probability distributions of the stochastic layer.
  12. The medium of claim 1, wherein the operations comprise: steps for applying the stochastic layer.
  13. A tangible, non-transitory, machine-readable medium storing instructions that when executed by one or more processors effectuate operations comprising: obtaining, with a computer system, inputs to a stochastic layer of a multi-layer neural network, wherein: the multi-layer neural network comprises both deterministic layers and the stochastic layer, and the stochastic layer is configured to: determine intermediate output values based on the inputs to the stochastic layer and based on random sampling from probability distributions of the stochastic layer, wherein statistical parameters of the probability distributions of the stochastic layer are static outside of training, and wherein the statistical parameters that are static outside of training are learned with gradient descent based on an objective function, the objective function being at least partially differentiable with respect to the statistical parameters and the objective function being based on output of the multi-layer neural network; determining, with the computer system, an output of the stochastic layer based on the inputs and by randomly sampling from the probability distributions having the determined statistical parameters; and providing the determined output of the stochastic layer to a downstream layer of the multi-layer neural network or as an output of the multi-layer neural network.
  14. The medium of claim 13, the operations further comprising: determining the statistical parameters of the respective probability distributions based on at least one of stochastic gradient descent, back propagation, or a combination thereof.
  15. The medium of claim 13, the operations further comprising: determining a measure of protection for the multi-layer neural network based on size of the statistical parameters of the respective probability distributions.
  16. The medium of claim 13, the operations further comprising: determining a measure of vulnerability for the multi-layer neural network based on size of the statistical parameters of the respective probability distributions.
  17. The medium of claim 13, the operations further comprising: determining a threshold for the statistical parameters of the respective probability distributions based on a threshold for performance of the multi-layer neural network; and pruning or making constant nodes in the multi-layer neural network for which the statistical parameters exceed the threshold to compress the multi-layer neural network.
  18. The medium of claim 13, wherein the statistical parameters correspond to convolutional kernels.
  19. The medium of claim 13, wherein the statistical parameters correspond to weights.
  20. The medium of claim 13, wherein the probability distributions are based on Laplacian distributions.
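Claims 1, 13, and 14 recite that the statistical parameters are learned with gradient descent (or stochastic gradient descent with backpropagation) on an objective that is differentiable with respect to those parameters. A hedged sketch of how that could look, reusing the illustrative `StochasticLinear` model from the abstract section and assuming a `loader` that yields (inputs, labels) batches; both names are assumptions, not the patent's implementation:

```python
import torch
import torch.nn.functional as F

# Because the sampling is reparameterized (weight = mu + sigma * eps), the
# loss is differentiable with respect to the statistical parameters mu and
# log_sigma, so ordinary SGD with backpropagation can learn them.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for inputs, labels in loader:
    logits = model(inputs)                  # fresh weights sampled each pass
    loss = F.cross_entropy(logits, labels)  # objective based on network output
    opt.zero_grad()
    loss.backward()                         # gradients reach mu and log_sigma
    opt.step()
```

After training, the learned statistical parameters can be held static outside of training, as claim 13 recites, while the per-inference sampling remains random.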
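Claims 6, 18, and 20 extend the same idea to convolutional kernels whose weights follow, for example, Laplacian distributions. A minimal illustrative sketch (the name `StochasticConv2d` is hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticConv2d(nn.Module):
    """Convolution whose kernel weights are drawn from per-weight Laplacians."""
    def __init__(self, in_ch: int, out_ch: int, k: int):
        super().__init__()
        # Per-weight statistical parameters: location and log-scale.
        self.loc = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.log_scale = nn.Parameter(torch.full((out_ch, in_ch, k, k), -3.0))
        self.pad = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # rsample() keeps the draw differentiable w.r.t. loc and scale.
        dist = torch.distributions.Laplace(self.loc, self.log_scale.exp())
        return F.conv2d(x, dist.rsample(), padding=self.pad)
```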
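Claims 2, 15, and 16 use the sizes of the learned dispersion parameters as a measure of protection or vulnerability: components that tolerate wide distributions without hurting accuracy leak less about the model, while components forced to stay nearly deterministic are comparatively vulnerable. One plausible reading as code, over the illustrative `StochasticLinear` layer (an interpretation, not the patent's stated method):

```python
import torch

def dispersion_report(layer) -> torch.Tensor:
    # Mean learned sigma per output unit: larger values suggest more
    # protection for that component, smaller values more vulnerability.
    with torch.no_grad():
        return layer.weight_log_sigma.exp().mean(dim=1)
```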
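Claims 4 and 17 describe compression: if a unit's learned dispersion exceeds a threshold, the network evidently barely depends on that unit's exact value, so the unit can be pruned or made constant. A hedged sketch over the illustrative `StochasticLinear` layer (the default threshold is an assumption to be tuned per model):

```python
import torch

def prune_high_dispersion(layer, threshold: float = 1.0) -> None:
    with torch.no_grad():
        sigma = layer.weight_log_sigma.exp()
        mask = sigma.mean(dim=1) > threshold  # per-output-unit dispersion
        layer.weight_mu[mask] = 0.0           # make the unit constant
        layer.weight_log_sigma[mask] = -20.0  # sigma ~ 0: no further noise
```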

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Pat. App. 63/227,846, titled STOCHASTIC LAYERS, filed 30 Jul. 2021, and claims the benefit of U.S. Provisional Pat. App. 63/153,284, titled METHODS AND SYSTEMS FOR SPECIALIZING DATASETS FOR TRAINING/VALIDATION OF MACHINE LEARNING, filed 24 Feb. 2021, the entire content of each of which is hereby incorporated by reference.

BACKGROUND

Machine learning models, including neural networks, have become the backbone of intelligent services and smart devices, such as smart security cameras or voice assistants. To operate, the machine learning models may process input data and generate output data based on transformations occurring in one or more layers of the models, at least in the case of deep neural network machine learning models.

SUMMARY

The following is a non-exhaustive listing of some aspects of the present techniques. These and other aspects are described in the following disclosure. Some aspects include application of a stochastic layer within a machine learning model. Some aspects include application of stochastic weighting within a machine learning model. Some aspects include optimization of stochastic noise for defense of a machine learning model. Some aspects include vulnerability analysis for a machine learning model. Some aspects include a tangible, non-transitory, machine-readable medium storing instructions that when executed by a data processing apparatus cause the data processing apparatus to perform operations including the above-mentioned application. Some aspects include a system, including: one or more processors; and memory storing instructions that when executed by the processors cause the processors to effectuate operations of the above-mentioned application.

BRIEF DESCRIPTION OF THE DRAWINGS

The above-mentioned aspects and other aspects of the present techniques will be better understood when the present application is read in view of the following figures in which like numbers indicate similar or identical elements:

FIG. 1 depicts an example machine learning model including a stochastic layer, in accordance with some embodiments;
FIG. 2 depicts a convolutional stochastic layer with deterministic convolutional kernels, in accordance with some embodiments;
FIG. 3 depicts a stochastic layer with stochastic convolutional kernels, in accordance with some embodiments;
FIG. 4 illustrates an exemplary method for application of a stochastic layer to a machine learning model, according to some embodiments;
FIG. 5 illustrates a machine learning model with stochastic layer weights, according to some embodiments;
FIG. 6 shows an example computing system that uses a stochastic layer in a machine learning model, in accordance with some embodiments;
FIG. 7 shows an example machine-learning model that may use one or more stochastic layers; and
FIG. 8 shows an example computing device that may be used to implement some embodiments.

While the present techniques are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. The drawings may not be to scale.
It should be understood, however, that the drawings and detailed description thereto are not intended to limit the present techniques to the particular form disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present techniques as defined by the appended claims.

DETAILED DESCRIPTION

To mitigate the problems described herein, the inventors had to both invent solutions and, in some cases just as importantly, recognize problems overlooked (or not yet foreseen) by others in the fields of machine learning and computer science. Indeed, the inventors wish to emphasize the difficulty of recognizing those problems that are nascent and will become much more apparent in the future should trends in industry continue as the inventors expect. Further, because multiple problems are addressed, it should be understood that some embodiments are problem-specific, and not all embodiments address every problem with traditional systems described herein or provide every benefit described herein. That said, improvements that solve various permutations of these problems are described below.

In operation, machine learning models (also referred to as just models) can be subject to model exfiltration attacks, whereby a threat actor systematically probes the model and attempts to infer model architecture and parameter values or develop a training set to train another model to approximate the existing model. In some instances, observation of the relationship between input and output can allow a malicious actor to predict a model's behavior or reconstruct at least a part of the model, often even without access to the model's parameters and with only the ability to provide inputs and observe outputs.