CN-122021744-A - Computing model deployment method and electronic equipment
Abstract
The application provides a computing model deployment method and an electronic device. The method comprises: determining, for each network layer of an original model, a corresponding target quantization precision, and designating any network layer containing at least two consecutive operators as a target network layer; quantizing each network layer of the original model at its target quantization precision, and fusing the at least two consecutive operators of each target network layer into a target fusion operator, to obtain a target model; determining, from a plurality of processing cores of a target deployment platform, a target processing core for each network layer based on the target quantization precision and computation amount of each network layer of the target model, to obtain computing task allocation information; and loading the target model and the computing task allocation information onto the target deployment platform. With this method and device, the target model markedly improves inference throughput and resource utilization while maintaining detection accuracy, and effectively reduces inference latency on edge devices.
Inventors
- Wang Yuntao
- Zhao Huidang
- Lu Kang
- Kuang Yiling
- Xu Ping
- Shi Wenhua
Assignees
- China Satellite Communications Co., Ltd. (中国卫通集团股份有限公司)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-12-11
Claims (10)
- 1. A computing model deployment method, comprising: determining, for each network layer of an original model, a corresponding target quantization precision, and designating any network layer containing at least two consecutive operators as a target network layer; quantizing each network layer of the original model at its target quantization precision, and fusing the at least two consecutive operators of each target network layer into a target fusion operator, to obtain a target model; determining, from a plurality of processing cores of a target deployment platform, a target processing core for each network layer based on the target quantization precision and the computation amount of each network layer of the target model, to obtain computing task allocation information; and loading the target model and the computing task allocation information onto the target deployment platform.
- 2. The computing model deployment method according to claim 1, wherein determining the target quantization precision corresponding to each network layer of the original model comprises: for each network layer of the original model that contains a convolution operation, calculating a layer entropy value of that network layer and determining its target quantization precision based on the corresponding layer entropy value; and for each network layer of the original model that does not contain a convolution operation, determining a base quantization precision as its target quantization precision; wherein the layer entropy value is positively correlated with the target quantization precision.
- 3. The computing model deployment method according to claim 2, wherein calculating the layer entropy value of each network layer comprises: for each network layer containing a convolution operation, obtaining an output feature map of that network layer; calculating an entropy value for each channel of the network layer based on the corresponding output feature map; and determining the mean of the entropy values of all channels of the network layer as its layer entropy value.
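The entropy-driven precision selection of claims 2 and 3 can be sketched as follows. The patent does not specify how per-channel entropy is discretized or how entropy maps to bit width, so the histogram binning, the thresholds, and the 4/8/16-bit tiers below are illustrative assumptions; only the structure (per-channel entropy, averaged into a layer entropy, mapped monotonically to precision) comes from the claims.

```python
import math
from collections import Counter

def channel_entropy(channel, bins=16):
    # Histogram-based Shannon entropy of one flattened output channel.
    # The binning scheme is an assumption; the patent does not fix it.
    lo, hi = min(channel), max(channel)
    width = (hi - lo) or 1.0
    counts = Counter(min(int((v - lo) / width * bins), bins - 1) for v in channel)
    n = len(channel)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def layer_entropy(channels):
    # channels: list of flattened per-channel activations of one layer's
    # output feature map; layer entropy = mean of per-channel entropies (claim 3).
    return sum(channel_entropy(ch) for ch in channels) / len(channels)

def target_precision(layer_h, thresholds=(2.0, 3.5)):
    # Hypothetical monotone mapping: higher layer entropy -> wider bit
    # width (claim 2's positive correlation); thresholds are assumed.
    if layer_h <= thresholds[0]:
        return 4
    return 8 if layer_h <= thresholds[1] else 16
```

Layers whose activations carry more information (higher entropy) are assigned more quantization bits, while low-entropy layers tolerate aggressive quantization.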
- 4. The computing model deployment method according to claim 1, wherein the at least two consecutive operators comprise an original convolution operator and a batch normalization operator, and the target fusion operator is obtained by: extracting, for each target network layer, original convolution parameters of the original convolution operator and batch normalization parameters of the batch normalization operator; generating new convolution parameters based on the corresponding original convolution parameters and batch normalization parameters; and substituting the new convolution parameters for the original convolution parameters in the original convolution operator to obtain the target fusion operator.
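The parameter generation in claim 4 matches the standard Conv+BN folding identity: for output channel c, W'[c] = W[c]·γ[c]/√(σ²[c]+ε) and b'[c] = (b[c]−μ[c])·γ[c]/√(σ²[c]+ε) + β[c]. A minimal sketch, with weights represented as flattened per-output-channel lists (the data layout is an assumption for illustration):

```python
import math

def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Fold batch-normalization parameters into the convolution parameters
    # so a single convolution reproduces Conv followed by BN (claim 4).
    # weight: list of flattened kernels, one per output channel.
    new_w, new_b = [], []
    for c in range(len(weight)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        new_w.append([w * scale for w in weight[c]])
        new_b.append((bias[c] - mean[c]) * scale + beta[c])
    return new_w, new_b
```

Because the BN statistics are fixed at inference time, the fused operator is mathematically exact, removing one memory round-trip per layer.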
- 5. The computing model deployment method according to claim 1, wherein the at least two consecutive operators comprise an original convolution operator and an activation operator, and the target fusion operator is obtained by: merging, for each target network layer, the computation node of the original convolution operator and the computation node of the activation operator into a single computation node to obtain the target fusion operator.
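The node merging in claim 5 can be illustrated with computation nodes modeled as plain callables (a deliberately simplified stand-in for a real graph IR): the activation is composed into the convolution node so the runtime schedules one fused kernel instead of two, with no intermediate tensor handoff. The toy 1-D convolution and the ReLU choice below are assumptions for illustration.

```python
def merge_conv_activation(conv_node, act_node):
    # Merge the convolution node and the activation node into a single
    # computation node (claim 5): one call, no intermediate buffer.
    def fused_node(x):
        return act_node(conv_node(x))
    return fused_node

def conv1d(weights, bias):
    # Toy 1-D "convolution" (dot product) standing in for the conv operator.
    def node(x):
        return sum(w * v for w, v in zip(weights, x)) + bias
    return node

def relu(y):
    return max(0.0, y)
```

In a real deployment stack the same rewrite happens at the graph level, replacing the Conv and activation vertices with one vertex before code generation.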
- 6. The computing model deployment method according to claim 1, wherein the at least two consecutive operators comprise an original convolution operator, a batch normalization operator and an activation operator, and the target fusion operator is obtained by: extracting, for each target network layer, original convolution parameters of the original convolution operator and batch normalization parameters of the batch normalization operator; generating new convolution parameters based on the corresponding original convolution parameters and batch normalization parameters; substituting the new convolution parameters for the original convolution parameters in the original convolution operator to obtain a target convolution operator; and merging the computation node of the target convolution operator and the computation node of the activation operator into a single computation node to obtain the target fusion operator.
- 7. The computing model deployment method according to claim 1, wherein determining, from the plurality of processing cores of the target deployment platform, the target processing core corresponding to each network layer based on the target quantization precision and the computation amount of each network layer of the target model, to obtain the computing task allocation information, comprises: determining the processing core with the highest computing capacity among the plurality of processing cores of the target deployment platform as a first processing core; determining the first processing core as the target processing core of each network layer whose target quantization precision is higher than a preset precision; and for the remaining network layers, whose target quantization precision is lower than or equal to the preset precision, determining corresponding target processing cores from the plurality of processing cores with the objective of minimizing the difference between the total computation amounts assigned to the respective processing cores, to obtain the computing task allocation information.
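The allocation of claim 7 can be sketched with a greedy longest-processing-time heuristic: high-precision layers are pinned to the strongest core, and each remaining layer (largest first) goes to the core with the smallest running total, which keeps the inter-core totals close. The patent does not name the balancing algorithm, so this heuristic, the 8-bit preset precision, and the `(name, bits, flops)` layer tuples are assumptions.

```python
def allocate_layers(layers, cores, preset_precision=8):
    # layers: list of (name, precision_bits, computation_amount);
    # cores:  list of (core_id, computing_capacity).
    first_core = max(cores, key=lambda c: c[1])[0]   # highest capacity
    totals = {cid: 0.0 for cid, _ in cores}
    plan = {}
    # High-precision layers go to the first (strongest) core.
    for name, bits, amount in layers:
        if bits > preset_precision:
            plan[name] = first_core
            totals[first_core] += amount
    # Remaining layers: greedy balance of per-core total computation.
    rest = sorted((l for l in layers if l[1] <= preset_precision),
                  key=lambda l: -l[2])
    for name, bits, amount in rest:
        target = min(totals, key=totals.get)
        plan[name] = target
        totals[target] += amount
    return plan, totals
```

The returned `plan` is the computing task allocation information loaded onto the platform alongside the target model.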
- 8. The computing model deployment method according to claim 1, wherein the computation amount of each network layer of the target model is calculated by: in response to a network layer of the target model being a convolution layer, determining its computation amount based on the output feature-map size, the number of channels, and the convolution kernel size of that network layer.
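A common concrete form of claim 8's computation amount for a convolution layer is the multiply-accumulate count over the output feature map. The claim only names the inputs (output size, channel counts, kernel size); the exact formula and the factor of 2 (one multiply plus one add per MAC) are conventional assumptions.

```python
def conv_layer_flops(out_h, out_w, c_in, c_out, k_h, k_w):
    # FLOPs of a standard convolution layer: every output element of every
    # output channel needs c_in * k_h * k_w multiply-accumulates.
    return 2 * out_h * out_w * c_out * c_in * k_h * k_w
```

For example, a 3x3 convolution producing a 16-channel 8x8 map from 3 input channels costs 2 * 8 * 8 * 16 * 3 * 3 * 3 = 55,296 FLOPs.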
- 9. The computing model deployment method of claim 1, further comprising: constructing a dynamic task migration strategy based on the computing task allocation information, and loading the dynamic task migration strategy onto the target deployment platform, such that when the target deployment platform runs the target model and the real-time load rate of at least one of the plurality of processing cores reaches a second preset threshold, part of the computing tasks of the corresponding processing core are migrated to other processing cores.
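The trigger in claim 9 can be sketched as a rebalancing pass over per-core load rates: any core at or above the second preset threshold sheds load to the least-loaded core. The migration granularity (shedding exactly the excess above the threshold) and the 0.9 default threshold are assumptions; the patent only specifies the threshold trigger and the migration direction.

```python
def plan_migration(loads, threshold=0.9):
    # loads: {core_id: real-time load rate in [0, 1]}.
    # Returns (migrations, loads after migration), where each migration is
    # (source_core, target_core, amount_of_load_moved).
    migrations = []
    after = dict(loads)
    for core in sorted(after):
        if after[core] >= threshold:
            target = min(after, key=after.get)   # least-loaded core
            moved = after[core] - threshold      # shed only the excess
            if target == core or moved <= 0:
                continue
            after[core] -= moved
            after[target] += moved
            migrations.append((core, target, round(moved, 6)))
    return migrations, after
```

On a real platform the "amount moved" would be a set of layer or tile tasks chosen from the allocation information rather than a scalar load fraction.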
- 10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the computing model deployment method of any one of claims 1 to 9.
Description
Computing model deployment method and electronic equipment

Technical Field

The present application relates to the field of model deployment technologies, and in particular to a computing model deployment method and an electronic device.

Background

In existing model deployment processes, model quantization remains at the level of uniform-precision quantization with static parameter configuration: the quantization bit width cannot be adaptively adjusted according to the feature complexity of different network layers, so model inference accuracy suffers. At the same time, operator execution efficiency is low and the load difference between cores is large, so model inference latency is high and the real-time requirements of edge-side devices are difficult to meet.

Disclosure of Invention

In view of the above, the present application aims to provide a computing model deployment method and an electronic device. Based on the above object, the present application provides a computing model deployment method, comprising: determining, for each network layer of an original model, a corresponding target quantization precision, and designating any network layer containing at least two consecutive operators as a target network layer; quantizing each network layer of the original model at its target quantization precision, and fusing the at least two consecutive operators of each target network layer into a target fusion operator, to obtain a target model; determining, from a plurality of processing cores of a target deployment platform, a target processing core for each network layer based on the target quantization precision and the computation amount of each network layer of the target model, to obtain computing task allocation information; and loading the target model and the computing task allocation information onto the target deployment platform.
Optionally, determining the target quantization precision corresponding to each network layer of the original model includes: for each network layer of the original model that contains a convolution operation, calculating a layer entropy value of that network layer and determining its target quantization precision based on the corresponding layer entropy value; and for each network layer that does not contain a convolution operation, determining a base quantization precision as its target quantization precision; wherein the layer entropy value is positively correlated with the target quantization precision.

Optionally, calculating the layer entropy value of each network layer includes: for each network layer containing a convolution operation, obtaining an output feature map of that network layer; calculating an entropy value for each channel of the network layer based on the corresponding output feature map; and determining the mean of the entropy values of all channels of the network layer as its layer entropy value.

Optionally, the at least two consecutive operators include an original convolution operator and a batch normalization operator, and the target fusion operator is obtained by: extracting, for each target network layer, original convolution parameters of the original convolution operator and batch normalization parameters of the batch normalization operator; generating new convolution parameters based on the corresponding original convolution parameters and batch normalization parameters; and substituting the new convolution parameters for the original convolution parameters in the original convolution operator to obtain the target fusion operator.
Optionally, the at least two consecutive operators include an original convolution operator and an activation operator, and the target fusion operator is obtained by: merging, for each target network layer, the computation node of the original convolution operator and the computation node of the activation operator into a single computation node to obtain the target fusion operator.

Optionally, the at least two consecutive operators include an original convolution operator, a batch normalization operator and an activation operator, and the target fusion operator is obtained by: extracting, for each target network layer, original convolution parameters of the original convolution operator and batch normalization parameters of the batch normalization operator; generating new convolution parameters based on the corresponding original convolution parameters and batch normalization parameters; substituting the new convolution parameters for the original convolution parameters in the original convolution operator to