
CN-122019101-A - Terminal-side dynamic optimization method and device for a vehicle-mounted multimodal large model, and electronic equipment

CN 122019101 A

Abstract

The invention provides an end-side dynamic optimization method and device for a vehicle-mounted multimodal large model, and electronic equipment. The method comprises: lightweighting an original multimodal large model to obtain a lightweight model suitable for vehicle-mounted end-side deployment; processing vehicle-mounted multimodal input data with the lightweight model to execute an inference task; during execution, dynamically configuring the inference computing resources of the lightweight model based on the task context of the inference task and the running state of the vehicle-mounted end-side device; performing computation through the lightweight model under the dynamic configuration to generate intermediate calculation results; detecting and reusing those intermediate results in consecutive inference tasks; and adaptively updating the lightweight model based on local feedback from executing the inference task and vehicle-mounted environment information. Through dynamic quantization and heterogeneous compute scheduling, the method markedly reduces power consumption and latency while preserving accuracy on key tasks, and, combined with intermediate-result reuse and edge-side adaptive updating, achieves efficient, real-time, and robust end-side inference.

Inventors

  • GUO JIA
  • XUE QIANG
  • ZHOU LIN
  • CAO YANG
  • JIN XULONG
  • GAO ZELEI

Assignees

  • FAW Bestune Car Co., Ltd. (一汽奔腾汽车股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (10)

  1. An end-side dynamic optimization method for a vehicle-mounted multimodal large model, characterized by comprising the following steps: lightweighting an original multimodal large model to obtain a lightweight model suitable for vehicle-mounted end-side deployment; processing vehicle-mounted multimodal input data with the lightweight model to execute an inference task; during execution of the inference task, dynamically configuring the inference computing resources of the lightweight model based on the task context of the inference task and the running state of the vehicle-mounted end-side device; performing computation through the lightweight model based on the dynamic configuration to produce an intermediate calculation result; in consecutive inference tasks, detecting and reusing the intermediate calculation result; and adaptively updating the lightweight model based on local feedback from executing the inference task and vehicle-mounted environment information.
  2. The method of claim 1, wherein lightweighting the original multimodal large model to obtain a lightweight model suitable for vehicle-mounted end-side deployment comprises: decomposing the original multimodal large model into a speech encoding module, a visual encoding module, and a text decoding module; performing knowledge distillation on the speech encoding module and the visual encoding module, and mapping their middle-layer features to a shared feature space; and replacing one or more of the speech encoding module, the visual encoding module, and the text decoding module in the original multimodal large model with a lightweight network to generate the lightweight model.
  3. The method of claim 1, wherein dynamically configuring the inference computing resources of the lightweight model comprises: dynamically selecting and applying a corresponding quantization precision level for the lightweight model based on the security level of the task context and the temperature or load information in the running state; and dynamically scheduling different computational subgraphs of the lightweight model to an NPU, GPU, or CPU for execution based on pre-established hardware performance profiles.
  4. The method according to claim 3, wherein dynamically selecting and applying a corresponding quantization precision level for the lightweight model based on the security level of the task context and the temperature or load information in the running state comprises: applying a first quantization precision level if the inference task is safety-critical and the device temperature is below a preset temperature threshold; and applying a second quantization precision level if the inference task is non-critical and the system load is above a preset first load threshold; wherein the first quantization precision level is higher than the second quantization precision level.
  5. The method of claim 3, wherein dynamically scheduling different computational subgraphs of the lightweight model to an NPU, GPU, or CPU for execution based on pre-established hardware performance profiles comprises: monitoring the processor loads of the NPU, the GPU, and the CPU in real time; and, when the processor load of the NPU is above a preset second load threshold, scheduling subsequent compute-intensive operators or subgraphs in the lightweight model to the GPU for execution.
  6. The method of claim 1, wherein detecting and reusing the intermediate calculation result in consecutive inference tasks comprises: calculating the feature similarity between the current input data and historical input data at a designated network layer of the lightweight model; and, if the feature similarity exceeds a preset similarity threshold, reusing the intermediate calculation results cached for the historical input data at the designated network layer and subsequent layers.
  7. The method of claim 1, wherein adaptively updating the lightweight model based on local feedback from executing the inference task and vehicle-mounted environment information comprises: monitoring the task execution error rate of the inference task and changes in the vehicle-mounted environment information; when the task execution error rate continuously exceeds a preset error-rate threshold, or the vehicle-mounted environment information changes by more than a preset amplitude, freezing the parameters of the speech encoding module and the visual encoding module in the lightweight model; and updating the weight parameters of the top N layers of the text decoding module in the lightweight model, where N is a positive integer greater than or equal to 1.
  8. An end-side dynamic optimization device for a vehicle-mounted multimodal large model, characterized by comprising: a lightweighting module for lightweighting an original multimodal large model to obtain a lightweight model suitable for vehicle-mounted end-side deployment; an input data processing module for processing vehicle-mounted multimodal input data with the lightweight model to execute an inference task; a dynamic configuration module for dynamically configuring, during execution of the inference task, the inference computing resources of the lightweight model based on the task context of the inference task and the running state of the vehicle-mounted end-side device; an intermediate calculation result generation module for performing computation through the lightweight model based on the dynamic configuration to generate an intermediate calculation result; a detection and reuse module for detecting and reusing the intermediate calculation result in consecutive inference tasks; and an adaptive updating module for adaptively updating the lightweight model based on local feedback from executing the inference task and vehicle-mounted environment information.
  9. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor to implement the end-side dynamic optimization method of a vehicle-mounted multimodal large model according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing computer-executable instructions which, when invoked and executed by a processor, cause the processor to implement the end-side dynamic optimization method of a vehicle-mounted multimodal large model according to any one of claims 1 to 7.
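The similarity check and cache reuse of claim 6 can be sketched as follows. The patent does not fix a similarity metric, threshold value, or cache layout, so the cosine metric, the 0.95 threshold, and all names here are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (one plausible metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class LayerOutputCache:
    """Caches the intermediate result computed at a designated network layer
    and reuses it when a new input's features at that layer are close enough
    to a cached input's features (claim 6, sketched)."""

    def __init__(self, threshold=0.95):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (feature_vector, cached_result)

    def lookup(self, features):
        for cached_features, result in self.entries:
            if cosine_similarity(features, cached_features) >= self.threshold:
                return result  # reuse the result cached for a similar past input
        return None  # no sufficiently similar input seen before

    def store(self, features, result):
        self.entries.append((features, result))
```

A nearly identical consecutive camera frame would then hit the cache, while a clearly different one would force a fresh forward pass:

```python
cache = LayerOutputCache(threshold=0.95)
cache.store([1.0, 0.0, 0.0], "frame_A_result")
cache.lookup([0.99, 0.01, 0.0])  # similar input: reuses "frame_A_result"
cache.lookup([0.0, 1.0, 0.0])    # dissimilar input: returns None
```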
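The adaptive-update trigger of claim 7 can likewise be sketched: when the error rate stays above a threshold for several consecutive tasks, freeze both encoders and leave only the top N decoder layers trainable. The window length, threshold, N, and the toy model structure are all assumptions for illustration:

```python
class LightweightModel:
    """Toy stand-in: each encoder is a dict of layer name -> trainable flag."""
    def __init__(self):
        self.speech_encoder = {"enc1": True, "enc2": True}
        self.visual_encoder = {"enc1": True, "enc2": True}
        self.text_decoder_layers = ["dec1", "dec2", "dec3", "dec4"]
        self.trainable_decoder_layers = set(self.text_decoder_layers)

def maybe_adapt(model, error_rates, error_threshold=0.1, window=3, top_n=2):
    """Trigger adaptation when the task error rate continuously exceeds the
    threshold over the last `window` tasks: freeze both encoding modules and
    mark only the top N decoder layers as trainable (claim 7, sketched)."""
    recent = error_rates[-window:]
    if len(recent) < window or not all(e > error_threshold for e in recent):
        return False  # no sustained degradation: leave the model unchanged
    for module in (model.speech_encoder, model.visual_encoder):
        for layer in module:
            module[layer] = False  # freeze encoder parameters
    # only the top N decoder layers remain open for weight updates
    model.trainable_decoder_layers = set(model.text_decoder_layers[-top_n:])
    return True
```

In a real deployment the trainable set would map to optimizer parameter groups; here it is just bookkeeping to show which parameters the local update would touch.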

Description

Terminal-side dynamic optimization method and device for a vehicle-mounted multimodal large model, and electronic equipment

Technical Field

The invention relates to the technical field of dynamic optimization, and in particular to an end-side dynamic optimization method and device for a vehicle-mounted multimodal large model, and electronic equipment.

Background

Multimodal large models bring strong interaction capabilities to intelligent vehicle cabins, but practical deployment faces a dilemma: uploading data to the cloud carries risks of privacy disclosure, high latency, and network dependence, while deploying the complete model at the vehicle end is limited by chip compute power and power consumption, making real-time, high-precision inference difficult to achieve. Current static model compression methods cannot adapt to dynamic tasks and variable environments, so performance, efficiency, and user experience are hard to balance.

Disclosure of Invention

In view of the above, the invention aims to provide an end-side dynamic optimization method and device for a vehicle-mounted multimodal large model, and electronic equipment, which markedly reduce power consumption and latency while preserving the accuracy of key tasks through dynamic quantization and heterogeneous compute scheduling, and achieve efficient, real-time, and robust end-side inference by combining intermediate-result reuse and edge-side adaptive updating.
According to the method, an original multimodal large model is lightweighted to obtain a lightweight model suitable for vehicle-mounted end-side deployment; vehicle-mounted multimodal input data are processed by the lightweight model to execute an inference task; during execution, the inference computing resources of the lightweight model are dynamically configured based on the task context of the inference task and the running state of the vehicle-mounted end-side device; computation is performed through the lightweight model under the dynamic configuration to generate an intermediate calculation result; in consecutive inference tasks, the intermediate calculation result is detected and reused; and the lightweight model is adaptively updated based on local feedback from executing the inference task and vehicle-mounted environment information.

In a preferred embodiment of the invention, lightweighting the original multimodal large model to obtain a lightweight model suitable for vehicle-mounted end-side deployment comprises decomposing the original multimodal large model into a speech encoding module, a visual encoding module, and a text decoding module; performing knowledge distillation on the speech and visual encoding modules and mapping their middle-layer features to a shared feature space; and replacing one or more of the speech encoding module, the visual encoding module, and the text decoding module in the original model with a lightweight network to generate the lightweight model.
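The distillation step above aligns student and teacher mid-layer features in a common space. A minimal sketch of the resulting loss term follows; the linear projections and the MSE objective are assumptions, since the patent does not specify how the mapping into the shared feature space is realized:

```python
def project(features, matrix):
    """Linear projection of a feature vector into the shared feature space."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]

def distillation_loss(student_mid, teacher_mid, student_proj, teacher_proj):
    """Mean squared error between student and teacher mid-layer features
    after both are mapped into the shared feature space (an assumed
    formulation of the feature-alignment objective)."""
    s = project(student_mid, student_proj)
    t = project(teacher_mid, teacher_proj)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
```

With identity projections and identical features the loss is zero; training would minimize this term so the lightweight encoders reproduce the teacher's mid-layer representations:

```python
identity = [[1.0, 0.0], [0.0, 1.0]]
distillation_loss([1.0, 2.0], [1.0, 2.0], identity, identity)  # 0.0
distillation_loss([1.0, 2.0], [1.0, 3.0], identity, identity)  # 0.5
```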
In a preferred embodiment of the present invention, dynamically configuring the inference computing resources of the lightweight model includes dynamically selecting and applying a corresponding quantization precision level for the lightweight model based on the security level of the task context and the temperature or load information in the running state, and dynamically scheduling different computational subgraphs of the lightweight model to an NPU, GPU, or CPU for execution based on pre-established hardware performance profiles.

In a preferred embodiment of the present invention, dynamically selecting and applying a corresponding quantization precision level for the lightweight model based on the security level of the task context and the temperature or load information in the running state includes applying a first quantization precision level if the inference task is safety-critical and the device temperature is below a preset temperature threshold, and applying a second quantization precision level if the inference task is non-critical and the system load is above a preset first load threshold, the first quantization precision level being higher than the second.

In a preferred embodiment of the present invention, dynamically scheduling different computational subgraphs of the lightweight model to an NPU, GPU, or CPU for execution based on pre-established hardware performance profiles includes monitoring the processor loads of the NPU, GPU, and CPU in real time, and scheduling subsequent compute-intensive operators or subgraphs in the lightweight model to the GPU for execution when the processor load of the NPU is higher than a preset second load threshold.
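The two dynamic-configuration decisions above can be sketched as a pair of small policy functions. The precision-level names, thresholds, the conservative default branch, and the load dictionary are all illustrative assumptions; only the two if-rules and the NPU-to-GPU fallback come from the text:

```python
def select_precision(task_is_safety_critical, device_temp_c, system_load,
                     temp_threshold=75.0, load_threshold=0.8):
    """Pick a quantization precision level from task context and device
    state, mirroring the two rules in the text: safety-critical task on a
    cool device -> first (higher) precision; non-critical task under heavy
    load -> second (lower) precision."""
    if task_is_safety_critical and device_temp_c < temp_threshold:
        return "int8_high"   # first quantization precision level
    if not task_is_safety_critical and system_load > load_threshold:
        return "int4_low"    # second quantization precision level
    return "int8_high"       # conservative default for uncovered cases (assumed)

def dispatch_subgraph(processor_loads, load_threshold=0.9):
    """Route a compute-intensive operator or subgraph: prefer the NPU, fall
    back to the GPU when the NPU load exceeds the threshold, else the CPU."""
    if processor_loads.get("npu", 1.0) <= load_threshold:
        return "npu"
    if processor_loads.get("gpu", 1.0) <= load_threshold:
        return "gpu"
    return "cpu"
```

A scheduler polling the monitored loads would call these per subgraph, so a saturated NPU transparently spills compute-heavy work onto the GPU:

```python
select_precision(True, 50.0, 0.2)            # "int8_high"
select_precision(False, 50.0, 0.95)          # "int4_low"
dispatch_subgraph({"npu": 0.95, "gpu": 0.4}) # "gpu"
```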