CN-121996519-A - Heterogeneous hardware platform-oriented automatic AI algorithm model deployment method

Abstract

The application provides an automated AI algorithm model deployment method for heterogeneous hardware platforms. The method comprises: obtaining an AI algorithm model to be deployed and a heterogeneous hardware platform; creating an AI algorithm model deployment task and setting model adaptation task parameters; constructing a model lightweighting algorithm library, a rule constraint engine, and an exception handling mechanism; performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model; adapting the recommended AI algorithm model to the heterogeneous hardware platform; and evaluating performance in a test environment of the heterogeneous hardware platform. The method designs an automated AI algorithm deployment flow around model optimization algorithms such as quantization, pruning, and distillation, and constructs a rule constraint engine and an exception handling mechanism, so that models are deployed automatically and efficiently while a large drop in the performance of the deployed model is avoided.

Inventors

LIN ZHIYUAN
CHEN JIANFEI
LIU YILAN
HONG JIACHENG
ZHAO KANG
CHEN XIAOYU
YI XIAOJIE
ZHANG LEI

Assignees

启元实验室

Dates

Publication Date: 20260508
Application Date: 20260106

Claims (10)

1. An automated AI algorithm model deployment method for heterogeneous hardware platforms, comprising: acquiring an AI algorithm model to be deployed and a heterogeneous hardware platform; creating an AI algorithm model deployment task, and setting model adaptation task parameters; constructing a model lightweighting algorithm library and a rule constraint engine, and establishing an exception handling mechanism; performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model; adapting the recommended AI algorithm model to the heterogeneous hardware platform; and performing performance evaluation based on a test environment of the heterogeneous hardware platform.
2. The automated AI algorithm model deployment method for heterogeneous hardware platforms of claim 1, wherein the model adaptation task parameters comprise an expected speed-up ratio, an algorithm accuracy degradation tolerance, and an optimization task duration.
3. The automated AI algorithm model deployment method for heterogeneous hardware platforms of claim 1, wherein the step of constructing a model lightweighting algorithm library and a rule constraint engine and establishing an exception handling mechanism further comprises: constructing a model lightweighting algorithm library comprising a model pruning algorithm, a model distillation algorithm, and a model quantization algorithm; designing a constraint rule base, including a deployment time constraint and a performance degradation constraint, and constructing a rule constraint engine; and establishing an exception handling mechanism based on the unified model representation framework.
4. The automated AI algorithm model deployment method for heterogeneous hardware platforms according to claim 1, wherein the step of performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model further comprises: inputting the AI algorithm model to be deployed under the rule constraints and the exception handling mechanism; and, based on the model lightweighting algorithm library, sequentially executing model pruning, model distillation, and model quantization on the AI algorithm model to be deployed, and outputting a recommended AI algorithm model in ONNX format.
5. The automated AI algorithm model deployment method for heterogeneous hardware platforms according to claim 4, wherein the step of performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model further comprises: according to the exception handling mechanism, performing a compatibility check on the operators during adaptation of the AI algorithm model to be deployed, dynamically optimizing, adjusting, and replacing abnormal operators, and automatically terminating the model adaptation flow when an abnormal operator cannot be handled effectively.
6. The automated AI algorithm model deployment method for heterogeneous hardware platforms according to claim 4, wherein the step of performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model further comprises: executing a model pruning algorithm that takes the AI algorithm model to be deployed as input and outputs a pruned AI algorithm model to be deployed, and performing an intermediate evaluation of the performance of the pruned AI algorithm model to be deployed; executing a model distillation algorithm on the pruned AI algorithm model to be deployed, outputting an optimized AI algorithm model to be deployed, and performing an intermediate evaluation of the performance of the optimized AI algorithm model to be deployed; executing a quantization algorithm on the optimized AI algorithm model to be deployed, and outputting a quantized AI algorithm model with preset precision, wherein the quantization is subject to the deployment time and performance loss constraints; and performing unified model representation, which takes the quantized AI algorithm model with preset precision as input, automatically executes exception handling according to the exception handling mechanism, and outputs a recommended AI algorithm model in ONNX format that is ready for final adaptation and deployment.
7. The automated AI algorithm model deployment method for heterogeneous hardware platforms of claim 1, wherein the step of adapting the recommended AI algorithm model to the heterogeneous hardware platform further comprises: executing, on the heterogeneous hardware platform, a model adaptation command that converts the recommended AI algorithm model in ONNX format into a format adapted to the heterogeneous hardware platform.
8. The automated AI algorithm model deployment method for heterogeneous hardware platforms of claim 1, wherein the step of performing performance evaluation based on the heterogeneous hardware platform test environment further comprises: calculating a model inference speed-up ratio S_speed-up [formula missing from the source text], where S_speed-up is the model inference speed-up ratio, SP is the inference efficiency before optimization, and SP' is the inference efficiency after optimization; calculating a model accuracy loss L_loss [formula missing from the source text], where L_loss is the model accuracy loss, ACC_before is the accuracy before optimization, and ACC_after is the accuracy after optimization; and calculating a model compression rate D_compression [formula missing from the source text], where D_compression is the model compression rate, M_before is the space occupied by the model before compression, and M_after is the space occupied by the model after compression.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the automated AI algorithm model deployment method for heterogeneous hardware platforms of any one of claims 1-8.
10. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the automated AI algorithm model deployment method for heterogeneous hardware platforms of any one of claims 1-8.
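
The three evaluation quantities named in claim 8 can be sketched as follows. The formulas themselves are not reproduced in this text, so every definition below is an assumption: inference efficiency is taken to be throughput (so the speed-up ratio is SP'/SP), accuracy loss is taken as an absolute difference, and compression rate is taken as the fraction of space saved. The `Measurement` class and the function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One benchmark result on the target platform (hypothetical container)."""
    throughput: float  # inferences per second; stands in for SP and SP'
    accuracy: float    # task accuracy in [0, 1]; stands in for ACC
    size_bytes: int    # stored model size; stands in for M


def speed_up_ratio(before: Measurement, after: Measurement) -> float:
    # Assumption: efficiency is throughput, so a value above 1 means faster.
    return after.throughput / before.throughput


def accuracy_loss(before: Measurement, after: Measurement) -> float:
    # Assumption: absolute drop in accuracy (a relative drop is also plausible).
    return before.accuracy - after.accuracy


def compression_rate(before: Measurement, after: Measurement) -> float:
    # Assumption: fraction of storage saved; 0 means no compression.
    return 1.0 - after.size_bytes / before.size_bytes


# Illustrative numbers only, not results from the application.
baseline = Measurement(throughput=100.0, accuracy=0.90, size_bytes=400_000_000)
optimized = Measurement(throughput=250.0, accuracy=0.88, size_bytes=100_000_000)
print(speed_up_ratio(baseline, optimized))
print(accuracy_loss(baseline, optimized))
print(compression_rate(baseline, optimized))
```

If inference efficiency is instead measured as latency, the speed-up ratio would invert to SP/SP'; the sketch keeps each quantity in its own function so that such a change stays local.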

Description

Heterogeneous hardware platform-oriented automatic AI algorithm model deployment method

Technical Field

The application relates to the field of AI algorithm models and their deployment, and in particular to an automated AI algorithm model deployment method for heterogeneous hardware platforms.

Background

AI algorithm model deployment is the key step in applying a trained machine learning model to a real production environment. It involves model lightweighting, hardware adaptation, performance acceleration, system integration, and related technologies. In recent years, research priorities and development paths in this field have differed between domestic and foreign work, but all aim to improve model inference efficiency, reduce computation cost, and broaden applicability. Domestic research on AI algorithm model deployment has concentrated on model lightweighting, with industry developing production deployment technology around deep learning frameworks. Foreign work has concentrated on cross-platform, cross-framework standardized tools: the ONNX ecosystem provides cross-framework compatibility of model formats, NVIDIA's TensorRT significantly improves GPU inference performance through layer fusion and dynamic tensor optimization, and Google's TensorFlow Serving provides a high-availability solution for enterprise-level model deployment. Domestic AI algorithm model technology is strong in industrial application and localization, while foreign work leads in foundational tools and frontier research; together they lay a technical foundation for the development and engineering application of algorithm deployment technology.
In the prior art, AI algorithm model deployment for each manufacturer mainly consists of developing a deployment toolchain and deployment framework for that manufacturer's own chips, so a unified deployment flow and framework is difficult to establish. Conventional algorithm deployment is inefficient, and its dependence on manual work exceeds 60 percent, so automated AI algorithm model deployment is urgently needed.

Disclosure of Invention

To overcome the shortcomings of the prior art, the application aims to provide an automated AI algorithm model deployment method for heterogeneous hardware platforms that deploys AI algorithm models automatically on different hardware platforms, lowers the threshold for AI algorithm model deployment, and improves deployment efficiency.
To achieve the above object, the automated AI algorithm model deployment method for heterogeneous hardware platforms provided by the application includes: acquiring an AI algorithm model to be deployed and a heterogeneous hardware platform; creating an AI algorithm model deployment task, and setting model adaptation task parameters; constructing a model lightweighting algorithm library and a rule constraint engine, and establishing an exception handling mechanism; performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model; adapting the recommended AI algorithm model to the heterogeneous hardware platform; and performing performance evaluation based on a test environment of the heterogeneous hardware platform.
Further, the model adaptation task parameters comprise an expected speed-up ratio, an algorithm accuracy degradation tolerance, and an optimization task duration.
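
As a minimal sketch of how the task parameters could drive a rule check between lightweighting stages, the Python below models the three parameters and the two constraints named in the claims (deployment time and performance degradation). The class names, the `run_pipeline` function, and the choice to stop at the first violation are assumptions made for illustration; the text does not say how the rule constraint engine reacts to a violation, and the stages here are stubs rather than real pruning, distillation, or quantization.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TaskParams:
    """Model adaptation task parameters named in the text."""
    expected_speed_up: float        # checked after on-device evaluation, unused here
    accuracy_drop_tolerance: float  # tolerated accuracy degradation
    max_duration_s: float           # optimization task duration budget


@dataclass(frozen=True)
class StageResult:
    """Hypothetical summary of the model after one lightweighting stage."""
    name: str
    accuracy: float
    elapsed_s: float


def violations(params: TaskParams, baseline_accuracy: float,
               total_elapsed_s: float, result: StageResult) -> List[str]:
    """Return the names of the constraints that the current state breaks."""
    found = []
    if total_elapsed_s > params.max_duration_s:
        found.append("deployment time constraint")
    if baseline_accuracy - result.accuracy > params.accuracy_drop_tolerance:
        found.append("performance degradation constraint")
    return found


def run_pipeline(params: TaskParams, baseline_accuracy: float,
                 stages: List[Callable[[], StageResult]]
                 ) -> Tuple[List[StageResult], List[str]]:
    """Run stages in order; stop at the first stage that breaks a rule.

    Stopping early is an assumption of this sketch, not the source's rule.
    """
    accepted: List[StageResult] = []
    total = 0.0
    for stage in stages:
        result = stage()
        total += result.elapsed_s
        broken = violations(params, baseline_accuracy, total, result)
        if broken:
            return accepted, broken
        accepted.append(result)
    return accepted, []


# Stub stages in the order given in claim 4; the numbers are illustrative only.
stub_stages = [
    lambda: StageResult("pruning", accuracy=0.91, elapsed_s=60.0),
    lambda: StageResult("distillation", accuracy=0.90, elapsed_s=120.0),
    lambda: StageResult("quantization", accuracy=0.85, elapsed_s=30.0),
]
params = TaskParams(expected_speed_up=2.0, accuracy_drop_tolerance=0.03,
                    max_duration_s=600.0)
accepted, broken = run_pipeline(params, baseline_accuracy=0.92, stages=stub_stages)
print([r.name for r in accepted], broken)
```

In this example the stub quantization stage drops accuracy by 0.07 against a tolerance of 0.03, so the run keeps the pruning and distillation results and reports the performance degradation constraint.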
Further, the step of constructing a model lightweighting algorithm library and a rule constraint engine and establishing an exception handling mechanism further comprises: constructing a model lightweighting algorithm library comprising a model pruning algorithm, a model distillation algorithm, and a model quantization algorithm; designing a constraint rule base, including a deployment time constraint and a performance degradation constraint, and constructing a rule constraint engine; and establishing an exception handling mechanism based on the unified model representation framework.
Further, the step of performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model further comprises: inputting the AI algorithm model to be deployed under the rule constraints and the exception handling mechanism; and, based on the model lightweighting algorithm library, sequentially executing model pruning, model distillation, and model quantization on the AI algorithm model to be deployed, and outputting a recommended AI algorithm model in ONNX format.
Further, the step of performing model lightweighting and unified representation based on the rule constraint engine and the exception handling mechanism to form a recommended AI algorithm model further comprises: And acco