CN-122021768-A - Lightweight Optimization Method for Edge Multimodal Large Models
Abstract
The application discloses a lightweight optimization method for edge-deployed multimodal large models. The method analyzes a pretrained original model to separate each modality's core feature branch from redundant cross-modal modules; prunes weakly associated units to generate a modality-association-optimized kernel; monitors the device's real-time state based on that kernel and, through dynamic parameter quantization and removal of non-essential network layers, generates a hardware-adapted lightweight intermediate structure; configures modality activation thresholds so that core modalities stay resident while other modalities are activated on demand via fast-loading logic, producing an on-demand modality-activation lightweight map; and finally deploys the map, collects runtime data, and tunes key parameters to dynamically balance model performance against hardware load. The method significantly reduces the model's resource footprint and improves runtime efficiency and response speed.
Inventors
- WEI YUNBIN
- LI HAIYANG
- XU JINBO
- ZHANG YIZHANG
- LI HUAWEI
Assignees
- 北京甲板智慧科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-03
Claims (10)
- 1. A lightweight optimization method for an edge multimodal large model, characterized by comprising the following steps: Step 1, analyzing a pretrained original multimodal large model, separating each modality's core feature branch from redundant cross-modal modules, analyzing modality association strength, pruning weakly associated units while retaining core features and strongly associated fusion logic, and generating a modality-association-optimized kernel; Step 2, based on the modality-association-optimized kernel, monitoring device compute, storage, and energy-consumption states through an edge-hardware compute dynamic-adaptation framework, and generating a hardware-adapted lightweight intermediate structure through dynamic parameter quantization and removal of non-essential network layers; Step 3, based on the hardware-adapted lightweight intermediate structure, adopting an on-demand modality activation and dynamic switching architecture: configuring activation thresholds, keeping core modalities resident, activating other modalities on demand with fast-loading logic, and generating an on-demand modality-activation lightweight map; and Step 4, deploying the on-demand modality-activation lightweight map on the edge device, collecting runtime data, and adjusting the pruning threshold, quantization strategy, and activation thresholds to achieve a dynamic balance between model performance and hardware load.
- 2. The edge multimodal large model lightweight optimization method according to claim 1, wherein Step 1 comprises: Step 11, performing modality-branch decoupling on the pretrained original multimodal large model, separating each modality's core feature-extraction branch from the cross-modal fusion module, and generating an initial modality-decoupled framework; Step 12, performing core-feature distillation on the initial modality-decoupled framework, extracting the key feature weights of each branch, and generating a modality core-feature weight tensor; and Step 13, quantifying cross-modal association strength, pruning weakly associated units, and fusing the modality core-feature weight tensor to generate the modality-association-optimized kernel, wherein the kernel binds the core modality features to the strongly associated fusion logic.
- 3. The edge multimodal large model lightweight optimization method according to claim 2, wherein Step 11 comprises: Step 111, analyzing the network topology of the original model, marking the boundaries of each modality's feature branch and of the cross-modal module, and generating a modality-architecture topology map; and Step 112, cutting non-essential connections based on the topology map to generate the initial modality-decoupled structure, whose module boundaries must be consistent with the topology map.
- 4. The edge multimodal large model lightweight optimization method according to claim 2, wherein Step 12 comprises: Step 121, constructing a distillation loss function and training the branch features of the initial modality-decoupled framework with feature-recognition accuracy as the objective, generating post-distillation feature weights; and Step 122, eliminating redundancy in the post-distillation feature weights and integrating their dimensions to generate the modality core-feature weight tensor, whose dimensions are suited to the compute capacity of the edge device.
- 5. The edge multimodal large model lightweight optimization method according to claim 2, wherein Step 13 comprises: Step 131, constructing a cross-modal association-strength quantization matrix, computing the association-strength value of each unit, and generating an association-strength quantization table; Step 132, pruning the weakly associated units marked in the table against a preset threshold to generate a post-pruning fusion module, which must retain the strongly associated computation logic; and Step 133, fusing the modality core-feature weight tensor with the post-pruning fusion module to generate the modality-association-optimized kernel, whose computation cost, relative to the original model, is reduced to within a preset lightweighting threshold.
- 6. The edge multimodal large model lightweight optimization method according to claim 1, wherein Step 2 comprises: Step 21, deploying hardware-state acquisition nodes, monitoring the device's compute, storage, and energy-consumption data in real time, and generating an edge-hardware-state parameter tensor, which must reflect the device's real-time load; Step 22, starting a dynamic parameter-quantization mechanism based on that tensor and adjusting parameter precision according to the load level to generate a precision-adapted feature tensor, whose precision level is strongly correlated with device load; and Step 23, cutting the non-essential network layers corresponding to the precision-adapted feature tensor and integrating the result into the hardware-adapted lightweight intermediate structure, whose storage footprint matches the device's remaining capacity.
- 7. The edge multimodal large model lightweight optimization method according to claim 6, wherein Step 21 comprises: Step 211, embedding data-acquisition interfaces into the device's compute, storage, and energy-consumption monitoring modules, setting the acquisition period and reporting rules, and generating raw hardware-state data covering the full set of monitored dimensions; Step 212, performing format normalization and anomaly filtering on the raw hardware-state data to generate standardized hardware-state data free of format differences; and Step 213, mapping the standardized hardware-state data onto a tensor by monitored dimension to generate the edge-hardware-state parameter tensor, whose time-series dimension is consistent with the acquisition period.
- 8. The edge multimodal large model lightweight optimization method according to claim 6, wherein Step 22 comprises: Step 221, dividing edge-hardware load into levels, setting the parameter-quantization precision interval for each level, and generating a load-precision mapping table adapted to the device's compute ceiling; Step 222, matching the load data of the edge-hardware-state parameter tensor to the corresponding level of the table and retrieving the preset quantization-precision interval; and Step 223, quantizing the parameters of the modality-association-optimized kernel according to the retrieved precision interval to generate the precision-adapted feature tensor, whose precision adjusts dynamically with device load.
- 9. The edge multimodal large model lightweight optimization method according to claim 1, wherein Step 3 comprises: Step 31, configuring an activation threshold for each modality based on the hardware-adapted lightweight intermediate structure, setting resident-operation logic for the core modalities, and generating a modality-activation rule tensor that defines each modality's activation trigger conditions; Step 32, constructing a fast-loading buffer pool for modality-switching parameters, pre-storing each modality's key switching parameters, and generating a modality-switching cache tensor that supports millisecond-level parameter retrieval; and Step 33, merging the modality-activation rule tensor with the modality-switching cache tensor to generate the on-demand modality-activation lightweight map, which supports rapid modality activation and switching.
- 10. The edge multimodal large model lightweight optimization method according to claim 1, wherein Step 4 comprises: Step 41, deploying the on-demand modality-activation lightweight map on the edge device, collecting compute-occupancy, response-latency, task-completion-accuracy, and modality-switching-efficiency data, and generating a lightweight-optimization runtime feedback tensor covering both performance and load indicators; and Step 42, feeding that tensor into a closed-loop iterative adaptation mechanism that adjusts the pruning threshold of the modality core-feature distillation and association-pruning framework, the parameter-quantization strategy of the edge-hardware compute dynamic-adaptation framework, and the activation thresholds of the on-demand modality activation and dynamic switching framework, achieving a dynamic balance between model performance and hardware load.
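The four claimed steps can be sketched end to end as a minimal Python pipeline. All function names, thresholds, and the toy association matrix below are illustrative assumptions, not terms defined by the patent:

```python
import numpy as np

def build_association_kernel(weights, assoc, prune_thresh=0.3):
    """Step 1: mask out weakly associated cross-modal units, keep strong ones."""
    mask = assoc >= prune_thresh
    return {"weights": weights, "assoc_mask": mask}

def adapt_to_hardware(kernel, load):
    """Step 2: pick a quantization bit-width from the current device load."""
    bits = 8 if load < 0.5 else 4          # illustrative load cutoff
    return {**kernel, "bits": bits}

def build_activation_map(structure, core=("text",), activate_thresh=0.6):
    """Step 3: core modalities stay resident, others activate on demand."""
    return {**structure, "resident": set(core), "activate_thresh": activate_thresh}

def closed_loop_update(lmap, latency_ms, budget_ms=50.0):
    """Step 4: raise the activation threshold when latency exceeds budget."""
    if latency_ms > budget_ms:
        lmap["activate_thresh"] = min(1.0, lmap["activate_thresh"] + 0.05)
    return lmap

assoc = np.array([[1.0, 0.2], [0.7, 1.0]])   # toy cross-modal strengths
kernel = build_association_kernel(np.ones((2, 2)), assoc)
lmap = build_activation_map(adapt_to_hardware(kernel, load=0.8))
lmap = closed_loop_update(lmap, latency_ms=80.0)
```

Under heavy load the sketch drops to 4-bit precision, and the over-budget latency feedback tightens the activation threshold, mirroring the claimed performance/load balancing loop.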
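Claim 3's decoupling (steps 111-112) amounts to tagging layers by modality and cutting edges that directly connect different feature branches. The toy layer graph and names below are assumptions for illustration:

```python
# Tag each layer with a modality, then keep only intra-modality edges and
# edges into the fusion module, so branch boundaries match the topology map.
layers = {"img_conv": "image", "img_pool": "image",
          "txt_embed": "text", "fuse": "fusion"}
edges = [("img_conv", "img_pool"), ("img_conv", "txt_embed"),
         ("img_pool", "fuse"), ("txt_embed", "fuse")]

def decouple(layers, edges):
    """Cut direct connections between different feature branches."""
    kept = []
    for a, b in edges:
        ma, mb = layers[a], layers[b]
        if ma == mb or "fusion" in (ma, mb):
            kept.append((a, b))
    return kept

kept_edges = decouple(layers, edges)
```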
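The distillation loss of claim 4 (step 121) is not specified in detail; a minimal sketch, assuming the standard temperature-softened KL formulation of knowledge distillation, looks like this (the temperature and logits are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

t = np.array([[2.0, 0.5, -1.0]])
loss_same = distill_loss(t, t)                       # identical: loss ~ 0
loss_diff = distill_loss(np.zeros((1, 3)), t)        # mismatch: loss > 0
```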
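Claim 5's association-strength table and threshold pruning (steps 131-132) reduce to scoring each cross-modal unit and discarding those below a preset threshold. The unit names, strength values, and threshold here are illustrative assumptions:

```python
def association_table(strengths):
    """Step 131: rank units by association-strength value, strongest first."""
    return sorted(strengths.items(), key=lambda kv: kv[1], reverse=True)

def prune_weak(strengths, threshold=0.4):
    """Step 132: keep only strongly associated fusion units."""
    return {unit: s for unit, s in strengths.items() if s >= threshold}

strengths = {"text-image": 0.85, "text-audio": 0.15,
             "image-audio": 0.55, "audio-depth": 0.05}
table = association_table(strengths)
kept = prune_weak(strengths)
```

Unlike unified sparsification, the threshold is applied per cross-modal unit, which is how the claimed scheme avoids mistakenly cutting strongly associated fusion logic.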
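The dynamic quantization of claims 6 and 8 can be sketched as a load-to-precision mapping table (step 221) followed by symmetric uniform quantization at the retrieved bit-width (step 223). The load bands and bit-widths are illustrative assumptions:

```python
import numpy as np

LOAD_PRECISION = [(0.3, 16), (0.7, 8), (1.0, 4)]  # (load upper bound, bits)

def precision_for(load):
    """Step 222: match the load to its band and return the preset bit-width."""
    for bound, bits in LOAD_PRECISION:
        if load <= bound:
            return bits
    return LOAD_PRECISION[-1][1]

def quantize(w, bits):
    """Step 223: symmetric per-tensor quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(w).max() / qmax, 1e-12)
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

w = np.array([0.5, -1.2, 0.03, 0.9])
bits = precision_for(0.8)                 # heavy load -> aggressive 4-bit
q, scale = quantize(w, bits)
recon_err = np.abs(q * scale - w).max()   # bounded by half a quantization step
```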
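Claim 7's acquisition pipeline (steps 211-213) is essentially: collect raw per-period samples, filter anomalies, and stack the survivors into a (time, dimension) tensor. The monitored dimensions and value bounds below are illustrative assumptions:

```python
import numpy as np

DIMS = ("compute", "storage", "energy")   # assumed monitored dimensions

def clean(sample):
    """Step 212: drop samples with missing or out-of-range values."""
    if set(sample) != set(DIMS):
        return None
    if any(not (0.0 <= sample[d] <= 1.0) for d in DIMS):
        return None
    return sample

def to_tensor(samples):
    """Step 213: stack cleaned samples; axis 0 follows the acquisition period."""
    rows = [s for s in (clean(x) for x in samples) if s is not None]
    return np.array([[r[d] for d in DIMS] for r in rows])

raw = [{"compute": 0.7, "storage": 0.4, "energy": 0.2},
       {"compute": 1.9, "storage": 0.5, "energy": 0.3},   # anomalous reading
       {"compute": 0.6, "storage": 0.4, "energy": 0.25}]
state = to_tensor(raw)
```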
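The on-demand activation of claim 9 can be sketched as a manager that keeps core modalities resident and activates others from a pre-warmed parameter cache only when their task relevance crosses the threshold. The class, modality names, and threshold are illustrative assumptions:

```python
class ModalityManager:
    def __init__(self, core=("text",), threshold=0.6):
        self.threshold = threshold       # modality activation threshold
        self.active = set(core)          # resident core modalities
        self.cache = {}                  # pre-stored switching parameters

    def prewarm(self, modality, params):
        """Step 32: pre-store key switching parameters for fast loading."""
        self.cache[modality] = params

    def request(self, modality, relevance):
        """Step 31: activate a non-core modality only when its relevance
        exceeds the threshold and its parameters are already cached."""
        if modality in self.active:
            return True
        if relevance >= self.threshold and modality in self.cache:
            self.active.add(modality)    # cache hit: fast activation
            return True
        return False

mgr = ModalityManager()
mgr.prewarm("image", params={"proj": [0.1, 0.2]})
```

Pre-warming the cache is what makes the claimed millisecond-level switching plausible: activation becomes a dictionary lookup rather than a parameter load from storage.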
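Claim 10's closed-loop adaptation can be sketched as a feedback step that trades capacity for speed when latency is over budget and restores capacity when accuracy drops. The metric names, step sizes, and targets are illustrative assumptions:

```python
def closed_loop_step(params, feedback,
                     latency_budget_ms=50.0, accuracy_floor=0.9):
    """Step 42: nudge pruning/activation thresholds from runtime feedback."""
    p = dict(params)
    if feedback["latency_ms"] > latency_budget_ms:
        # Over budget: prune more and activate modalities less eagerly.
        p["prune_threshold"] = min(0.9, p["prune_threshold"] + 0.05)
        p["activation_threshold"] = min(0.9, p["activation_threshold"] + 0.05)
    elif feedback["accuracy"] < accuracy_floor:
        # Under the accuracy floor: give capacity back by pruning less.
        p["prune_threshold"] = max(0.1, p["prune_threshold"] - 0.05)
    return p

params = {"prune_threshold": 0.3, "activation_threshold": 0.6}
overloaded = closed_loop_step(params, {"latency_ms": 80.0, "accuracy": 0.95})
inaccurate = closed_loop_step(params, {"latency_ms": 20.0, "accuracy": 0.85})
```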
Description
Edge Multimodal Large Model Lightweight Optimization Method

Technical Field

The application relates to the technical field of artificial-intelligence model optimization, in particular to a lightweight optimization method for edge multimodal large models.

Background

In fields such as intelligent perception, human-machine interaction, autonomous driving, and smart healthcare, multimodal large models achieve deep understanding and reasoning about complex environments by fusing multimodal information such as text, images, and audio, and demand for them is growing rapidly. In edge-computing scenarios in particular, there is a pressing need for localized processing on edge devices: it avoids the communication latency of uploading data to the cloud, reduces the risk of privacy leakage, and better satisfies real-time and safety requirements. Existing edge deployment schemes for multimodal models compress the original model with a pruning technique that applies a unified sparsification standard, convert floating-point parameters to a low-precision format to reduce storage overhead, and keep all modality branches running at all times to cover the full range of tasks. Such schemes remove some neurons and layer structures by pruning, apply parameter quantization, and finally deploy the optimized model on the edge device, where multimodal inference relies on a fixed model structure. This prior art, however, has obvious technical drawbacks.
Because a unified sparse-pruning standard is applied, the scheme cannot account for differences in association characteristics among modalities, so strongly associated cross-modal fusion units are easily pruned by mistake. Meanwhile, a fixed model structure cannot adapt to the dynamically changing compute, storage, and energy-consumption states of the edge device, and keeping all modalities resident wastes a large amount of compute. As a result, the model struggles to balance resource occupation against task performance when running on the edge device, and the device's application potential cannot be fully realized.

Disclosure of Invention

To solve these technical problems, the application provides a lightweight optimization method for edge multimodal large models that at least alleviates them. The technical scheme provided by the embodiments of the application is as follows. A method for lightweight optimization of an edge multimodal large model comprises: Step 1, analyzing a pretrained original multimodal large model, separating each modality's core feature branch from redundant cross-modal modules, analyzing modality association strength and pruning weakly associated units, retaining core features and strongly associated fusion logic, and generating a modality-association-optimized kernel; Step 2, based on the modality-association-optimized kernel, monitoring device compute, storage, and energy-consumption states through an edge-hardware compute dynamic-adaptation framework, and generating a hardware-adapted lightweight intermediate structure through dynamic parameter quantization and removal of non-essential network layers; Step 3, based on the hardware-adapted lightweight intermediate structure, adopting an on-demand modality activation and dynamic switching framework: configuring activation thresholds, keeping core modalities resident, activating other modalities on demand with fast-loading logic, and generating an on-demand modality-activation lightweight map; and Step 4, deploying the on-demand modality-activation lightweight map on the edge device, collecting runtime data, and adjusting the pruning threshold, quantization strategy, and activation thresholds to achieve a dynamic balance between model performance and hardware load. The technical scheme of the application has the following advantages. At the model-structure optimization level, Step 1 analyzes the pretrained original multimodal large model, separates each modality's core feature branch from the redundant cross-modal modules, analyzes modality association strength, prunes weakly associated units, and retains the core-feature and strongly associated fusion logic. Unlike the traditional unified sparse-pruning scheme, the method attends to the differences in association characteristics among modalities, avoids mistakenly cutting strongly associated cross-modal fusion units, and reduces redundant computation while preserving the model's core semantic-understanding capability, so that the generated modality-association-optimized kernel is lightweight and can maintain high