Search

CN-122021747-A - Digital drainage twin large model light weight method and device

CN122021747A

Abstract

The invention discloses a lightweight quantization method and device for a digital drainage twin large model. The method comprises the steps of loading an original model; configuring mixed quantization parameters to quantize the original model and obtain a quantized model; preprocessing spatio-temporal sequence data based on mean and variance parameters; performing distillation training on the quantized model using the preprocessed spatio-temporal sequence data; performing operator replacement and inference path optimization on the distilled quantized model and converting it into a preset format; and performing accuracy and performance verification. The invention provides a triple optimization strategy of mixed quantization, knowledge distillation, and spatio-temporal calibration, achieving a technical breakthrough of 75% model volume compression with a precision loss of no more than 3%, and reducing the prediction error of hydrological parameters by 40% compared with the traditional single quantization method.

Inventors

  • WANG HAO
  • ZHANG DEQUAN

Assignees

  • 上海中井汉鼎数字技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-02-09

Claims (10)

  1. A lightweight quantization method for a digital drainage twin large model, characterized by comprising the following steps: loading an original model; configuring mixed quantization parameters to quantize the original model and obtain a quantized model; preprocessing spatio-temporal sequence data based on mean and variance parameters; performing distillation training on the quantized model using the preprocessed spatio-temporal sequence data; performing operator replacement and inference path optimization on the distilled quantized model and converting it into a preset format; and performing accuracy and performance verification.
  2. The method of claim 1, wherein the mixed quantization parameters comprise INT4 weight quantization and INT8 activation value quantization.
  3. The method of claim 1, wherein the spatio-temporal sequence data includes rainfall and water level data.
  4. The method of claim 1, further comprising optimizing the quantization threshold by the formula: D_KL(P||Q) = Σ_i P(i)·log(P(i)/Q(i)); wherein P is the original model output distribution, Q is the quantized model output distribution, and distribution alignment is achieved by minimizing D_KL.
  5. The method of claim 1, wherein performing distillation training on the quantized model using the preprocessed spatio-temporal sequence data comprises: taking the original model as the teacher model and the quantized model as the student model, using the output of the teacher model as soft labels and the spatio-temporal sequence data as hard labels, and performing distillation training on the quantized student model, wherein the loss function is: L = α·L_hard + (1 − α)·L_soft; wherein L_hard is the hard-label cross-entropy loss, L_soft is the soft-label KL divergence loss, and α is a weight coefficient with a value of 0.3.
  6. The method of claim 1, wherein the preset format is a HarmonyOS-compatible HIM/OM model format.
  7. The method of claim 1, wherein performing accuracy and performance verification comprises: comparing the hydrological parameter prediction accuracy of the models before and after quantization; and testing the memory footprint, inference latency, and power consumption of the model on an edge test terminal.
  8. A lightweight device for a digital drainage twin large model, characterized by comprising: a model loading module configured to read a teacher model and a student model; a hybrid quantization module configured to implement parameter configuration for INT4 weight quantization and INT8 activation value quantization; a spatio-temporal calibration module configured to preprocess spatio-temporal sequence data based on a preset mean and variance and to optimize the quantization threshold by an algorithm; a knowledge distillation module configured to perform distillation training of the student model with a weighted loss of hard and soft labels; a HarmonyOS adaptation module configured to perform operator replacement and path optimization on the quantized model, generating a HarmonyOS-compatible HIM/OM model; a precision verification module configured to compare the model precision and performance indicators before and after quantization; and a device integration module configured to integrate the above module functions into an integrated quantization tool.
  9. The apparatus of claim 8, wherein the hardware environment in which the apparatus operates comprises a quantization server and an edge test terminal; the quantization server is provided with a 16-core Kunpeng 920 processor, an Ascend 310B chip, and 32 GB of memory; the edge test terminal is configured with a HiSilicon Hi3516DV300 chip and an Ascend NPU with 0.5 TOPS of computing power.
  10. The apparatus of claim 8, wherein the software environment in which the apparatus operates comprises: the quantization server runs a Linux CentOS 7.0 operating system with MindSpore Lite 2.2, CANN 7.0.0, ONNX Runtime 1.15.0, and Python 3.10; and the edge test terminal runs the HarmonyOS 6.0 embedded edition operating system.
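The distillation setup in claim 5 pairs a hard-label cross-entropy term with a soft-label KL-divergence term under a weight coefficient of 0.3. The NumPy sketch below illustrates one way such a loss could be computed; the weighting form L = α·L_hard + (1 − α)·L_soft and the softmax temperature are illustrative assumptions, since the claim text does not spell out every detail.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      alpha=0.3, temperature=2.0):
    """Weighted distillation loss: alpha * L_hard + (1 - alpha) * L_soft.

    L_hard is the cross-entropy against ground-truth (hard) labels; L_soft
    is the KL divergence from the teacher's softened output distribution
    (soft labels) to the student's. alpha = 0.3 follows claim 5; the
    temperature value is an illustrative assumption.
    """
    eps = 1e-12
    p_student = softmax(student_logits)
    n = len(hard_labels)
    l_hard = -np.mean(np.log(p_student[np.arange(n), hard_labels] + eps))
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    l_soft = np.mean(np.sum(p_t * np.log((p_t + eps) / (p_s + eps)), axis=-1))
    return alpha * l_hard + (1 - alpha) * l_soft
```

When the student's logits match the teacher's exactly, the soft term vanishes and only the weighted hard-label term remains, which is the behavior the claim's weighting implies.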

Description

Digital drainage twin large model lightweight method and device

Technical Field

The invention relates to the technical field of digital twins and artificial intelligence model lightweighting, and in particular to a lightweight method and device for a digital drainage twin large model.

Background

The urban drainage twin large model serves as the core carrier of a digital drainage system; its parameter scale generally reaches the level of billions, and it can provide key technical support for high-precision services such as urban waterlogging prediction and drainage pipe network water level monitoring. However, when the large model is deployed directly on edge embedded equipment such as underground terminals and gate stations, it is constrained by the limited hardware resources of the edge devices, and three core technical bottlenecks generally arise: excessive memory occupation, excessive inference latency, and excessive device operating power consumption. These have become the primary obstacles to deploying the model on the edge side.
To address this deployment problem, the prior art adopts a single INT8 weight quantization scheme to compress the model volume, but this scheme has obvious limitations. First, the precision loss of the model after INT8 quantization exceeds 5%, which can hardly meet the stringent prediction accuracy requirements of high-precision services such as waterlogging prediction and water level monitoring. Second, the quantization scheme is not specifically optimized for the distribution characteristics of the spatio-temporal sequence data, such as rainfall and water level, that are specific to the drainage twin model; as a result, the hydrological parameter prediction error of the quantized model increases significantly, seriously reducing its practical application value. The lack of a dedicated lightweight optimization scheme adapted to the data characteristics and business requirements of the urban drainage twin large model further amplifies these shortcomings, making it difficult for a drainage twin large model of this parameter scale to land efficiently on resource-limited edge terminals. Meanwhile, a conventionally quantized drainage twin model has poor compatibility with the HarmonyOS operating system and cannot fully invoke the hardware computing power of the Ascend NPU, so the inference efficiency of the model on the edge side cannot be effectively improved, which ultimately restricts the large-scale deployment and application of the digital drainage twin model on the edge side.
In summary, there is a need for an edge lightweight scheme for urban drainage twin large models that combines model accuracy, hardware suitability, and system compatibility, so as to break through the bottlenecks of the prior art and promote the large-scale landing of the digital drainage twin large model on the edge side.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a lightweight quantization method and device for a digital drainage twin large model, with the following aims: to provide a combined optimization strategy of weight quantization, activation value quantization, and knowledge distillation that ensures a precision loss of no more than 3% while compressing the model volume by more than 75%; to design a quantization calibration algorithm for the spatio-temporal characteristics of the drainage twin model that reduces the prediction error of hydrological parameters; and to develop a quantization device compatible with the HarmonyOS system that realizes deep adaptation of the quantized model to the Ascend NPU.

The invention discloses a lightweight quantization method for a digital drainage twin large model, comprising the following steps: loading an original model; configuring mixed quantization parameters to quantize the original model and obtain a quantized model; preprocessing spatio-temporal sequence data based on mean and variance parameters; performing distillation training on the quantized model using the preprocessed spatio-temporal sequence data; performing operator replacement and inference path optimization on the distilled quantized model and converting it into a preset format; and performing accuracy and performance verification. In one embodiment of the present invention, the mixed quantization parameters include INT4 weight quantization and INT8 activation value quantization.
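As a concrete illustration of the mixed quantization configuration just described (INT4 weight quantization, INT8 activation value quantization), the following minimal NumPy sketch uses a symmetric per-tensor scheme. This is an assumption for illustration only: the disclosure fixes the bit widths but not the exact quantization mapping, so round-to-nearest with a per-tensor scale is assumed here.

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Symmetric per-tensor quantization to a signed num_bits integer grid.

    Hypothetical sketch: maps floats to [-2^(b-1), 2^(b-1)-1] via a single
    scale derived from the tensor's maximum absolute value.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 7 for INT4, 127 for INT8
    scale = max(float(np.max(np.abs(x))) / qmax, 1e-8)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Mixed configuration per the embodiment: INT4 weights, INT8 activations.
weights = np.linspace(-1.0, 1.0, 9).astype(np.float32)
activations = np.linspace(0.0, 4.0, 9).astype(np.float32)
q_w, s_w = quantize_symmetric(weights, num_bits=4)
q_a, s_a = quantize_symmetric(activations, num_bits=8)
```

Keeping activations at INT8 while pushing weights to INT4 trades a small amount of activation precision for the larger storage saving on the weight tensors, which dominate model volume.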
In one embodiment of the invention, the spatio-temporal sequence data includes rainfall and water level data. In one embodiment of the present invention, optimizing the quantization threshold by the following formula is further in