US-20260127435-A1 - COMPRESSING MACHINE-LEARNING MODELS

US 20260127435 A1

Abstract

Systems and techniques are described herein for processing data. For instance, a method for processing data is provided. The method may include processing input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; processing the processed input data using a non-linear layer of the machine-learning model to generate second features; and processing the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein the reverse funnel layer is smaller in at least one dimension than the second linear layer.

Inventors

  • Haitam BEN YAHIA
  • Denis KORZHENKOV
  • Amir GHODRATI
  • Amirhossein HABIBIAN

Assignees

  • QUALCOMM INCORPORATED

Dates

Publication Date
May 7, 2026
Application Date
Dec. 23, 2024

Claims (20)

  1. An apparatus for processing data, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: process input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; process the processed input data using a non-linear layer of the machine-learning model to generate second features; and process the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein the reverse funnel layer is smaller in at least one dimension than the second linear layer.
  2. The apparatus of claim 1, wherein the first merged layer is a product of the funnel layer and the first linear layer.
  3. The apparatus of claim 1, wherein the second merged layer is a product of the reverse funnel layer and the second linear layer.
  4. The apparatus of claim 1, wherein the funnel layer and the reverse funnel layer are trained together with the first linear layer and the second linear layer.
  5. The apparatus of claim 4, wherein during training of the funnel layer and the reverse funnel layer, the first linear layer and the second linear layer are frozen.
  6. The apparatus of claim 1, wherein the input data comprises an output from a previous layer of the machine-learning model.
  7. The apparatus of claim 1, wherein the input data comprises an input image, a video frame, or input sensor data.
  8. An apparatus for compressing machine-learning models, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: add a funnel layer to a network of layers at an output or an input of a first linear layer of the network of layers; add a reverse-funnel layer to the network of layers at an input or an output of a second linear layer of the network of layers; train the network of layers to perform an operation; merge the funnel layer with the first linear layer; and merge the reverse-funnel layer with the second linear layer.
  9. The apparatus of claim 8, wherein the at least one processor is configured to deploy the network of layers at a device.
  10. The apparatus of claim 8, wherein the at least one processor is configured to perform the operation using the network of layers.
  11. The apparatus of claim 8, wherein the at least one processor is configured to initialize the funnel layer and the reverse-funnel layer based on a singular value decomposition (SVD) of the first linear layer merged with the second linear layer.
  12. The apparatus of claim 8, wherein the funnel layer is smaller than the first linear layer and the reverse-funnel layer is smaller than the second linear layer.
  13. The apparatus of claim 8, wherein the funnel layer is smaller in a channels-out dimension than the first linear layer and the reverse-funnel layer is smaller in a channels-out dimension than the second linear layer.
  14. The apparatus of claim 8, wherein the first linear layer comprises at least one of: an attention block; a feedforward block; or a convolution block.
  15. The apparatus of claim 8, wherein the operation is associated with at least one of: video generation; video editing; video super resolution; or video inpainting.
  16. A method for processing data, the method comprising: processing input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; processing the processed input data using a non-linear layer of the machine-learning model to generate second features; and processing the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein the reverse funnel layer is smaller in at least one dimension than the second linear layer.
  17. The method of claim 16, wherein the first merged layer is a product of the funnel layer and the first linear layer.
  18. The method of claim 16, wherein the second merged layer is a product of the reverse funnel layer and the second linear layer.
  19. The method of claim 16, wherein the funnel layer and the reverse funnel layer are trained together with the first linear layer and the second linear layer.
  20. The method of claim 19, wherein during training of the funnel layer and the reverse funnel layer, the first linear layer and the second linear layer are frozen.
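The merge described in claims 2-3 and 17-18 is exact at inference time: the funnel layer and its adjacent linear layer are both linear maps, so their product can be precomputed offline, and only the non-linear layer separates the two merged matrices at run time. The following is a minimal numerical sketch of that equivalence; the dimensions, the ReLU non-linearity, and the random weights are illustrative assumptions, not values specified by the claims.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, d_out, r = 16, 32, 16, 8

W1 = rng.standard_normal((d_hidden, d_in))   # first linear layer
W2 = rng.standard_normal((d_out, d_hidden))  # second linear layer
F = rng.standard_normal((r, d_hidden))       # funnel: smaller than W1 in one dim
R = rng.standard_normal((d_hidden, r))       # reverse funnel: smaller than W2

def relu(x):
    return np.maximum(x, 0.0)  # stand-in for the non-linear layer

x = rng.standard_normal(d_in)

# Unmerged path: linear -> funnel -> non-linearity -> reverse funnel -> linear.
y_unmerged = W2 @ (R @ relu(F @ (W1 @ x)))

# Merged path: the products F @ W1 and W2 @ R are computed once offline,
# so at inference only two small (rank-r) matmuls bracket the non-linearity.
M1 = F @ W1    # first merged layer, shape (r, d_in)
M2 = W2 @ R    # second merged layer, shape (d_out, r)
y_merged = M2 @ relu(M1 @ x)

# Exact by associativity of matrix products on each side of the non-linearity.
assert np.allclose(y_unmerged, y_merged)
```

With these illustrative sizes the merged pair stores M1 and M2 (256 values) in place of W1, F, R, and W2 (1,536 values), which is the compression the claims are after.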

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/717,685, filed Nov. 7, 2024, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present disclosure generally relates to machine-learning models. For example, aspects of the present disclosure include systems and techniques for compressing machine-learning models.

BACKGROUND

Image and video generative models may generally adhere to scaling laws, where performance improves with increased model size and computational resources. Current generative models are constrained by being 1) computationally expensive, requiring trillions of floating-point operations per second (TFLOPS) of processing power, and 2) memory demanding, with parameter counts on the order of billions. Such constraints make it difficult to deploy such models for on-device use cases.

SUMMARY

The following presents a simplified summary relating to one or more aspects disclosed herein. Thus, the following summary should not be considered an extensive overview relating to all contemplated aspects, nor should the following summary be considered to identify key or critical elements relating to all contemplated aspects or to delineate the scope associated with any particular aspect. Accordingly, the following summary presents certain concepts relating to one or more aspects relating to the mechanisms disclosed herein in a simplified form to precede the detailed description presented below. Systems and techniques are described for processing data. According to at least one example, a method is provided for processing data. 
The method includes: processing input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; processing the processed input data using a non-linear layer of the machine-learning model to generate second features; and processing the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein the reverse funnel layer is smaller in at least one dimension than the second linear layer. In another example, an apparatus for processing data is provided that includes at least one memory and at least one processor (e.g., configured in circuitry) coupled to the at least one memory. The at least one processor is configured to: process input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; process the processed input data using a non-linear layer of the machine-learning model to generate second features; and process the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein the reverse funnel layer is smaller in at least one dimension than the second linear layer. 
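The add-train-merge procedure summarized above (and claimed in claims 8 and 11) can be sketched numerically. The following is a minimal sketch, not the patented implementation: it assumes the funnel sits at the output of the first linear layer and the reverse funnel at the input of the second, and it initializes both from a truncated SVD of the first layer's weight so that, before any fine-tuning, the inserted pair acts as a rank-r projection of the hidden activations. The claim language leaves the exact SVD construction open, so this reading is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, r = 16, 32, 16, 8  # illustrative sizes, r < d_hidden

W1 = rng.standard_normal((d_hidden, d_in))   # first linear layer
W2 = rng.standard_normal((d_out, d_hidden))  # second linear layer

# Truncated SVD of the first layer's weight gives the top-r directions of
# the hidden space; project down with U_r^T (funnel) and back up with U_r
# (reverse funnel). After initialization, the pair would be fine-tuned.
U, S, Vt = np.linalg.svd(W1, full_matrices=False)
funnel = U[:, :r].T          # (r, d_hidden): smaller in the channels-out dim
reverse_funnel = U[:, :r]    # (d_hidden, r)

# Merge step: each added layer is folded into its adjacent linear layer as a
# matrix product, so the deployed network carries only the small matrices.
merged_first = funnel @ W1            # (r, d_in)
merged_second = W2 @ reverse_funnel   # (d_out, r)

# At initialization the merged pair is the best rank-r approximation of the
# hidden representation, so it stays close to the original composition.
approx_err = np.linalg.norm(W2 @ W1 - merged_second @ merged_first)
```

Training the inserted layers (optionally with W1 and W2 frozen, per claim 5) would then recover accuracy lost to the rank reduction before the final merge.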
In another example, a non-transitory computer-readable medium is provided that has stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: process input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; process the processed input data using a non-linear layer of the machine-learning model to generate second features; and process the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein the reverse funnel layer is smaller in at least one dimension than the second linear layer. In another example, an apparatus for processing data is provided. The apparatus includes: means for processing input data using a first merged layer of a machine-learning model to generate first features, wherein the first merged layer is based on a funnel layer and a first linear layer, and wherein the funnel layer is smaller in at least one dimension than the first linear layer; means for processing the processed input data using a non-linear layer of the machine-learning model to generate second features; and means for processing the second features using a second merged layer of the machine-learning model to generate an output, wherein the second merged layer is based on a reverse funnel layer and a second linear layer, and wherein