CN-122021883-A - Tree-search-based spatio-temporal mixed-precision quantization method and system for diffusion models
Abstract
The application provides a tree-search-based spatio-temporal mixed-precision quantization method and system for diffusion models. The quantization method comprises: constructing a spatial search tree and aggregating it to obtain the static weight precision and the reference activation precision of a DiT model; discretizing the time dimension of the DiT model to generate key time periods, and obtaining the average activation bit number and the accumulated distortion of each key time period based on the reference activation precision; constructing a time search tree and aggregating it using the average activation bit numbers and the accumulated distortions to obtain an optimal scheduling path for the DiT model; and quantizing a pre-trained DiT model based on the static weight precision and the optimal scheduling path to obtain a quantized DiT model. The quantized DiT model can be applied to the generation of data such as images and videos. Because the spatial and temporal dimensions are searched with the same unified logic, distortion of the generated data can be minimized under an extremely low bit budget.
Inventors
- ZHANG YULUN
- YANG KAICHENG
- ZHANG XUN
- KONG LINGHE
- YANG XIAOKANG
Assignees
- Shanghai Jiao Tong University (上海交通大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-14
Claims (10)
- 1. A tree-search-based spatio-temporal mixed-precision quantization method for diffusion models, characterized by comprising the following steps: constructing a spatial search tree, and aggregating the spatial search tree to obtain a static weight precision and a reference activation precision of a DiT model, wherein the DiT model is used for executing image or video generation tasks; discretizing the time dimension of the DiT model to generate key time periods, and obtaining, based on the reference activation precision, the average activation bit number and the accumulated distortion of each key time period; constructing a time search tree, and aggregating the time search tree using the average activation bit number and the accumulated distortion to obtain an optimal scheduling path of the DiT model; and quantizing a pre-trained DiT model based on the static weight precision and the optimal scheduling path to obtain a quantized DiT model.
- 2. The tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to claim 1, wherein constructing a spatial search tree and aggregating the spatial search tree to obtain a static weight precision and a reference activation precision of the DiT model comprises: constructing a candidate configuration set and a spatial search tree, wherein the candidate configuration set comprises a plurality of quantization configurations, each quantization configuration comprises a weight quantization precision and an activation quantization precision, and each leaf node in the spatial search tree corresponds to one network layer of the DiT model; traversing all quantization configurations in the candidate configuration set, calculating the average bits and the quantization error of each leaf node under each quantization configuration, and constructing a spatial Pareto queue for the leaf node, wherein the spatial Pareto queue comprises a plurality of quantization units, each quantization unit comprises a quantization configuration together with its average bits and quantization error, and the constructed spatial Pareto queue is stored in the corresponding leaf node; aggregating the spatial search tree to obtain the spatial Pareto queue of the root node of the spatial search tree; and selecting, from the spatial Pareto queue of the root node, the quantization configuration whose average bits are smaller than a preset weight bit threshold and whose quantization error is the smallest as a reference quantization configuration, setting the weight quantization precision in the reference quantization configuration as the static weight precision, and setting the activation quantization precision in the reference quantization configuration as the reference activation precision.
- 3. The tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to claim 2, wherein aggregating the spatial search tree to obtain the spatial Pareto queue of the root node of the spatial search tree comprises: taking the level of the leaf nodes in the spatial search tree as the current level, and iteratively executing the following operations: creating a new upper level above the current level; traversing all nodes in the current level in topological order, selecting each two adjacent nodes in turn as child nodes, and combining the spatial Pareto queues stored in the two selected child nodes by Cartesian product to generate a combined quantization queue, wherein the combined quantization queue comprises all possible combinations of quantization configurations, and each combination forms one quantization unit; performing spatial Pareto pruning on the combined quantization queue, screening out the Top-K quantization units located on the Pareto front, and taking the Top-K quantization units as a new spatial Pareto queue; constructing a parent node for the two child nodes, storing the new spatial Pareto queue in the parent node, and adding the parent node to the upper level; and stopping the iteration when the current level contains only one node, taking that node as the root node of the spatial search tree, and obtaining the spatial Pareto queue of the root node.
- 4. The tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to claim 1, wherein discretizing the time dimension of the DiT model to generate key time periods and obtaining, based on the reference activation precision, the average activation bit number and the accumulated distortion of each key time period comprises: dividing the total inference steps of the DiT model inference process into a plurality of non-overlapping key time periods; for each key time period, determining the actual activation precision of the key time period based on the reference activation precision, the actual activation precision being expressed as $A_{\mathrm{real},j} = A_{\mathrm{base}} + \delta_j,\ \delta_j \in \Delta = \{-1, 0, +1\}$, wherein $A_{\mathrm{real},j}$ is the actual activation precision of the j-th key time period, $A_{\mathrm{base}}$ is the reference activation precision, $\delta_j$ is the precision offset of the j-th key time period, and $\Delta$ is the precision offset set; and traversing all precision offsets in the precision offset set, determining the corresponding actual activation precision for each precision offset, and calculating the average activation bit number and the accumulated distortion obtained when the DiT model performs inference at that actual activation precision within the key time period.
- 5. The tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to claim 1, wherein constructing a time search tree and aggregating the time search tree using the average activation bit number and the accumulated distortion to obtain an optimal scheduling path of the DiT model comprises: constructing a time search tree, wherein each leaf node in the time search tree corresponds to one key time period; constructing a time Pareto queue for each leaf node in the time search tree, wherein the time Pareto queue comprises a plurality of time units, each time unit comprises a precision offset together with its average activation bit number and accumulated distortion, and the constructed time Pareto queue is stored in the corresponding leaf node; aggregating the time search tree to obtain the time Pareto queue of the root node of the time search tree; and screening out, from the time Pareto queue of the root node, the precision offsets whose average activation bit number is smaller than a preset activation bit threshold and whose accumulated distortion is the smallest, and taking these precision offsets as the optimal scheduling path.
- 6. The tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to claim 5, wherein aggregating the time search tree to obtain the time Pareto queue of the root node of the time search tree comprises: taking the level of the leaf nodes in the time search tree as the current level, and iteratively executing the following operations: creating a new upper level above the current level; traversing all nodes in the current level in topological order, selecting each two adjacent nodes in turn as child nodes, and combining the time Pareto queues stored in the two selected child nodes by Cartesian product to generate a combined time queue, wherein the combined time queue comprises all possible combinations of precision offsets, and each combination forms one time unit; performing time Pareto pruning on the combined time queue, screening out the Top-K time units located on the Pareto front, and taking the Top-K time units as a new time Pareto queue; constructing a parent node for the two child nodes, storing the new time Pareto queue in the parent node, and adding the parent node to the upper level; and stopping the iteration when the current level contains only one node, taking that node as the root node of the time search tree, and obtaining the time Pareto queue of the root node.
- 7. The tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to claim 5, wherein quantizing the pre-trained DiT model based on the static weight precision and the optimal scheduling path comprises: during inference of the pre-trained DiT model, using the static weight precision as the fixed weight precision of the pre-trained DiT model; and, for each key time period, obtaining the precision offset corresponding to the key time period in the optimal scheduling path, taking the sum of the reference activation precision and the corresponding precision offset as the applied activation precision of the key time period, and quantizing the activation values of the pre-trained DiT model at the applied activation precision, thereby quantizing the pre-trained DiT model.
- 8. A tree-search-based spatio-temporal mixed-precision quantization system for diffusion models, characterized by comprising: a spatial search module, used for constructing a spatial search tree and determining the static weight precision and the reference activation precision of a DiT model based on the spatial search tree; a time segmentation module, used for discretizing the time dimension of the DiT model to generate key time periods, and obtaining the average activation bit number and the accumulated distortion of each key time period based on the reference activation precision; a time search module, used for constructing a time search tree, and aggregating the time search tree using the average activation bit number and the accumulated distortion to obtain an optimal scheduling path of the DiT model; and a quantization module, used for quantizing a pre-trained DiT model based on the static weight precision and the optimal scheduling path to obtain a quantized DiT model.
- 9. An image generation method, comprising: determining a pre-trained FLUX text-to-image model; quantizing the pre-trained FLUX text-to-image model using the tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to any one of claims 1 to 7, and determining a quantized FLUX text-to-image model; and inputting a preset image generation text into the quantized FLUX text-to-image model, and determining the generated image.
- 10. A video generation method, comprising: determining a pre-trained Wan video generation model; quantizing the pre-trained Wan video generation model using the tree-search-based spatio-temporal mixed-precision quantization method for diffusion models according to any one of claims 1 to 7, and determining a quantized Wan video generation model; and inputting a preset video generation text into the quantized Wan video generation model, and determining the generated video.
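The bottom-up aggregation recited in claims 3 and 6 (pairwise Cartesian product of child queues, Pareto pruning, Top-K truncation, and final selection at the root) can be illustrated with the minimal Python sketch below. It is not the applicant's implementation. The names `Unit`, `pareto_prune`, `merge`, `aggregate` and `select` are hypothetical, and three assumptions go beyond the claims: bit cost and error are additive across leaves, all leaves carry equal weight when averaging bits, and the Top-K rule samples the Pareto front evenly. An unpaired node at the end of a level is promoted unchanged, which the claims do not address.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """One queue entry: a choice per covered leaf plus its aggregated cost."""
    choices: Tuple[int, ...]  # one bit-width (or precision offset) per leaf
    bits: float               # summed bit cost (assumed additive)
    error: float              # summed error or distortion (assumed additive)


def pareto_prune(units: Sequence[Unit], top_k: int) -> List[Unit]:
    """Keep the non-dominated units, then at most top_k of them."""
    front: List[Unit] = []
    best_error = float("inf")
    for u in sorted(units, key=lambda u: (u.bits, u.error)):
        if u.error < best_error:  # strictly better than every cheaper unit
            front.append(u)
            best_error = u.error
    if len(front) <= top_k:
        return front
    if top_k == 1:
        return front[:1]
    # Hypothetical Top-K rule: sample the front evenly, keeping both ends.
    step = (len(front) - 1) / (top_k - 1)
    return [front[round(i * step)] for i in range(top_k)]


def merge(left: Sequence[Unit], right: Sequence[Unit], top_k: int) -> List[Unit]:
    """Build a parent queue: Cartesian product of two child queues, then prune."""
    combined = [
        Unit(a.choices + b.choices, a.bits + b.bits, a.error + b.error)
        for a, b in product(left, right)
    ]
    return pareto_prune(combined, top_k)


def aggregate(leaf_queues: Sequence[Sequence[Unit]], top_k: int = 8) -> List[Unit]:
    """Merge adjacent nodes level by level until a single root queue remains."""
    level = [pareto_prune(q, top_k) for q in leaf_queues]
    while len(level) > 1:
        parents = [merge(level[i], level[i + 1], top_k)
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # unpaired node promoted as-is (assumption)
        level = parents
    return level[0]


def select(root_queue: Sequence[Unit], avg_bit_budget: float) -> Optional[Unit]:
    """Pick the lowest-error unit whose average bits stay under the budget."""
    feasible = [u for u in root_queue
                if u.bits / len(u.choices) < avg_bit_budget]
    return min(feasible, key=lambda u: u.error) if feasible else None


# Toy example: three leaves, each with a 4-bit and an 8-bit option.
leaves = [
    [Unit((4,), 4, 0.9), Unit((8,), 8, 0.1)],
    [Unit((4,), 4, 0.2), Unit((8,), 8, 0.05)],
    [Unit((4,), 4, 0.5), Unit((8,), 8, 0.1)],
]
best = select(aggregate(leaves, top_k=8), avg_bit_budget=6)
print(best.choices)  # (8, 4, 4)
```

In this toy run the budget allows only one leaf to use 8 bits, and the search assigns it to the leaf whose 4-bit error is largest. The same routine serves both trees: leaves are network layers in the spatial tree and key time periods in the time tree.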
Description
Tree-search-based spatio-temporal mixed-precision quantization method and system for diffusion models
Technical Field
The application relates to the technical field of deep learning model compression and automated machine learning (AutoML), and in particular to a tree-search-based spatio-temporal mixed-precision quantization method and system for diffusion models.
Background
Mixed-precision quantization (MPQ) is a key technique for reducing the inference cost of diffusion models. However, existing quantization search methods for the Diffusion Transformer (DiT) model treat the spatial and temporal dimensions separately, which leads to the following problems:
1. Low spatial search efficiency. A DiT model has many layers, so the conventional search space grows exponentially. Although some methods address the layer-level search with integer programming, genetic algorithms and similar techniques, they cannot fully explore the solution space, which limits the quality that the resulting mixed-precision configuration can reach.
2. Coarse time-dimension strategy. The generation process of a diffusion model comprises tens to hundreds of time steps. Existing methods either use the same quantization configuration for all time steps (static quantization), which wastes computation, or allocate per-time-step precision with a greedy algorithm or simple heuristic rules, which cannot guarantee a global optimum and makes it difficult to balance the "image quality" and "average bit number" constraints.
A search of the technical literature shows that Chinese patent publication CN117892792A provides a mixed-precision quantization method for diffusion models used in image generation. From the perspective of model quantization, it allocates quantization bit widths to the different layers of the model according to each layer's sensitivity to quantization, so as to accelerate diffusion model generation more reasonably and efficiently. However, that method cannot handle time and space under one unified algorithmic logic. Therefore, there is a need for a mixed-precision quantization method that treats the temporal and spatial dimensions of a diffusion model in a unified way so as to minimize distortion.
Disclosure of Invention
In view of the defects in the prior art, the application aims to provide a tree-search-based spatio-temporal mixed-precision quantization method and system for diffusion models.
According to a first aspect of the present application, there is provided a tree-search-based spatio-temporal mixed-precision quantization method for diffusion models, comprising: constructing a spatial search tree, and aggregating the spatial search tree to obtain a static weight precision and a reference activation precision of a DiT model, wherein the DiT model is used for executing image or video generation tasks; discretizing the time dimension of the DiT model to generate key time periods, and obtaining, based on the reference activation precision, the average activation bit number and the accumulated distortion of each key time period; constructing a time search tree, and aggregating the time search tree using the average activation bit number and the accumulated distortion to obtain an optimal scheduling path of the DiT model; and quantizing a pre-trained DiT model based on the static weight precision and the optimal scheduling path to obtain a quantized DiT model.
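The last step of the method, applying a fixed weight precision together with a per-period activation precision, can be sketched as follows. This is an illustrative sketch only. The helper names `period_of_step`, `activation_bits` and `fake_quantize` are hypothetical, the key time periods are assumed to be of equal length (the application only requires them to be non-overlapping), and the symmetric uniform quantizer is a common choice that the application does not specify. The activation precision of period j is the reference precision plus the scheduled offset.

```python
from typing import List, Sequence


def period_of_step(step: int, total_steps: int, num_periods: int) -> int:
    """Map an inference step to its key time period (equal-length split, assumed)."""
    return min(step * num_periods // total_steps, num_periods - 1)


def activation_bits(step: int, total_steps: int,
                    base_bits: int, offsets: Sequence[int]) -> int:
    """Applied activation precision: reference precision plus the period's offset."""
    j = period_of_step(step, total_steps, len(offsets))
    return base_bits + offsets[j]


def fake_quantize(values: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantize-then-dequantize (quantizer choice is an assumption)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale reproduces it exactly
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


# Toy schedule: 20 inference steps, 4 key time periods, reference precision 4 bits.
offsets = [+1, 0, -1, 0]  # one offset per key time period, each taken from {-1, 0, +1}
schedule = [activation_bits(t, 20, 4, offsets) for t in range(20)]
print(schedule)
print(sum(schedule) / len(schedule))  # 4.0
```

In the toy schedule the first period runs at 5 bits and the third at 3 bits, so the average activation bit number stays at the reference value of 4 while precision is shifted toward the period assumed to be more sensitive.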
Optionally, constructing a spatial search tree and aggregating the spatial search tree to obtain a static weight precision and a reference activation precision of the DiT model comprises: constructing a candidate configuration set and a spatial search tree, wherein the candidate configuration set comprises a plurality of quantization configurations, each quantization configuration comprises a weight quantization precision and an activation quantization precision, and each leaf node in the spatial search tree corresponds to one network layer of the DiT model; traversing all quantization configurations in the candidate configuration set, calculating the average bits and the quantization error of each leaf node under each quantization configuration, and constructing a spatial Pareto queue for the leaf node, wherein the spatial Pareto queue comprises a plurality of quantization units, each quantization unit comprises a quantization configuration together with its average bits and quantization error, and the constructed spatial Pareto queue is stored in the corresponding leaf node; aggregating the spatial search tree to obtain the spatial Pareto queue of the root node of the spatial search tree; and selecting a quantization configuration corresponding to the spatial Pareto queue of the root node of the spatial search tree, wherein the quantization configuration is used as a reference quantization configuration when the average bit is smaller than a preset weight bit threshold value and