CN-121981177-A - Underwater target recognition convolutional neural network acceleration method based on FPGA
Abstract
The invention discloses an FPGA-based method for accelerating a convolutional neural network for underwater target recognition, which addresses the heavy computation and memory demands of convolutional neural networks on underwater edge devices. A lightweight network consisting of three depthwise separable convolution layers and two fully connected layers is designed; its parameters are made lightweight by folding the batch normalization parameters into the preceding layers through a fusion formula and converting them to half-precision floating point format. An FPGA acceleration circuit is designed in which the depthwise convolution is computed in blocks and the fully connected layers are merged and packaged as an IP core, and computational efficiency is improved by combining parallel strategies such as loop unrolling and pipelining. The innovations to be protected are the FPGA circuit design strategy of blocked convolution-layer computation and merged fully connected layers, and the combined parallel optimization strategy of loop unrolling and array partitioning, which together achieve low-power, real-time inference for underwater target recognition through hardware-algorithm co-design.
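As a rough illustration of why depthwise separable convolutions make the network lightweight, the sketch below compares weight counts against a standard convolution. The kernel sizes 9, 5 and 6 are those recited in claim 2; the channel counts are hypothetical, since the text does not state them, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 kernel that mixes channels
    return depthwise + pointwise


# (kernel size, input channels, output channels)
# Kernel sizes follow claim 2; the channel counts are assumptions for illustration.
layers = [(9, 1, 8), (5, 8, 16), (6, 16, 32)]

for k, c_in, c_out in layers:
    std = standard_conv_params(k, c_in, c_out)
    dws = depthwise_separable_params(k, c_in, c_out)
    print(f"k={k} {c_in}->{c_out}: standard={std} separable={dws}")
```

Under these assumed channel counts the saving grows with the number of channels, because the standard count scales with the product of kernel area and both channel counts while the separable count scales with their sum of products.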
Inventors
- Luo Yuexuan
- SU ZONGSHUAI
- YU MINGHE
- LIU JIAXI
Assignees
- Shenyang Aerospace Xinguang Group Co., Ltd. (沈阳航天新光集团有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20251210
Claims (8)
- 1. An FPGA-based method for accelerating an underwater target recognition convolutional neural network, characterized by comprising the following steps: Step S1, lightweight underwater target recognition network design: constructing a lightweight convolutional neural network formed by sequentially connecting three depthwise separable convolution layers and two fully connected layers; Step S2, parameter lightweighting: performing batch normalization layer parameter fusion and half-precision floating point conversion on the parameters of the lightweight convolutional neural network; Step S3, FPGA acceleration circuit design: designing, on the FPGA, the corresponding convolution layer circuit, pooling layer circuit, fully connected layer circuit and system interface based on the network model after parameter lightweighting; Step S4, circuit parallel optimization: applying loop unrolling, pipelining, nested loop flattening and array partitioning to the FPGA acceleration circuit to improve computational parallelism and efficiency.
- 2. The FPGA-based method for accelerating an underwater target recognition convolutional neural network according to claim 1, wherein in step S1 the lightweight convolutional neural network has the following structure: a first depthwise separable convolution layer with a kernel size of 9 and a stride of 1, followed by a 4×4 max pooling layer; a second depthwise separable convolution layer with a kernel size of 5 and a stride of 1, followed by a 3×3 max pooling layer; a third depthwise separable convolution layer with a kernel size of 6 and a stride of 1; a first fully connected layer with an output dimension of 120; and a second fully connected layer with an output dimension of 3; wherein each depthwise separable convolution layer and each fully connected layer is followed by a batch normalization operation and an activation function.
- 3. The FPGA-based method for accelerating an underwater target recognition convolutional neural network according to claim 1, wherein in step S2 the batch normalization layer parameter fusion merges the parameters of each batch normalization layer into the weights and biases of the preceding convolution layer or fully connected layer through a linear transformation, and the fused weight parameter $W_{\text{fused}}$ and fused bias parameter $b_{\text{fused}}$ are calculated by the following formulas: $W_{\text{fused}} = \dfrac{\gamma}{\sqrt{\sigma^{2} + \epsilon}}\, W$ and $b_{\text{fused}} = \dfrac{\gamma\,(b - \mu)}{\sqrt{\sigma^{2} + \epsilon}} + \beta$, where $W$ and $b$ are the weights and bias of the preceding layer, $\gamma$ is the scaling factor of the batch normalization layer, $\beta$ is its offset, $\mu$ is the mean, $\sigma^{2}$ is the variance, and $\epsilon$ is a numerical stability constant.
- 4. The method according to claim 1, wherein in the step S2, the half-precision floating point conversion is to convert the data types of all weights and bias parameters in the network from 32-bit single-precision floating point numbers to 16-bit half-precision floating point numbers.
- 5. The FPGA-based method for accelerating an underwater target recognition convolutional neural network, characterized in that in step S3 the convolution layer circuit is implemented on the basis of depthwise separable convolution and comprises a depthwise convolution circuit, which performs multiply-accumulate operations independently for each channel, and a pointwise convolution circuit, which fuses channel information using a 1×1 convolution kernel; the fully connected layer circuit partitions the computation into blocks according to the input and output sizes and loads the corresponding weight blocks for multiply-accumulate calculation; and the two fully connected layers are merged and packaged into a single IP core.
- 6. The method according to claim 5, wherein in the step S3, the system interface is a standardized interface based on AXI bus, wherein the input/output data port is configured as an AXI master interface, and the scalar parameter data port is configured as an AXI-Lite slave interface.
- 7. The FPGA-based method for accelerating an underwater target recognition convolutional neural network according to claim 1, wherein in step S4 the loop unrolling optimization restructures a computation loop for parallel execution by replicating the circuit of a single iteration into multiple parallel units, so that the loop body is completely unrolled.
- 8. The method according to claim 1, wherein in step S4 the array partitioning optimization partitions a data array using a block, cyclic or complete partitioning strategy.
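The three array partitioning strategies of claim 8 (block, cyclic and complete, in common high-level synthesis terminology) can be illustrated with a small conceptual model of how array indices map to memory banks. This is a sketch only: the function names are hypothetical, and it models index placement rather than any particular tool's behavior.

```python
def block_partition(n, factor):
    """Split indices 0..n-1 into `factor` banks of consecutive elements."""
    size = -(-n // factor)  # ceiling division
    return [list(range(b * size, min((b + 1) * size, n))) for b in range(factor)]


def cyclic_partition(n, factor):
    """Interleave indices: element i is stored in bank i % factor."""
    return [list(range(b, n, factor)) for b in range(factor)]


def complete_partition(n):
    """Give every element its own bank (in hardware, typically a register)."""
    return [[i] for i in range(n)]


def max_accesses_per_bank(banks, indices):
    """Worst-case number of the given simultaneous accesses that hit one bank."""
    lookup = {i: b for b, bank in enumerate(banks) for i in bank}
    counts = {}
    for i in indices:
        counts[lookup[i]] = counts.get(lookup[i], 0) + 1
    return max(counts.values())


# A loop unrolled by 4 reads elements 0..3 of an 8-element array in one cycle.
unrolled_reads = [0, 1, 2, 3]
print(max_accesses_per_bank(block_partition(8, 4), unrolled_reads))
print(max_accesses_per_bank(cyclic_partition(8, 4), unrolled_reads))
```

In this model, an unrolled loop that touches consecutive elements spreads its reads across all banks under cyclic partitioning, whereas block partitioning places neighboring elements in the same bank. Which strategy fits best depends on the access pattern and on how many ports each memory provides.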
Description
Underwater target recognition convolutional neural network acceleration method based on FPGA
Technical Field
The invention relates to the technical field of embedded artificial intelligence and hardware acceleration, and in particular to an FPGA-based method for accelerating an underwater target recognition convolutional neural network.
Background
Underwater target recognition faces multiple challenges, such as limited communication bandwidth, complex image noise and severely constrained edge-device resources. Convolutional neural networks recognize images of complex underwater environments well, but they are both storage-intensive and computation-intensive, and conventional CPU/GPU-based implementations are difficult to deploy on underwater edge terminals with limited memory and computing resources because of constraints on space, power consumption and the like. Accelerating the convolutional neural network on a low-power, highly flexible FPGA therefore addresses the difficulty of achieving both accuracy and speed when an underwater target recognition network runs at the edge, improves recognition efficiency, and promotes the application and development of related technology in fields such as underwater detection and resource development.
In the first prior art, CN113393376B discloses a lightweight super-resolution image reconstruction method based on deep learning, in which a classical FSRCNN network model is optimized mainly through hardware and used for super-resolution image reconstruction, improving image reconstruction speed on small terminals while preserving the quality of the reconstructed image. That patent accelerates mainly with a systolic array and targets image reconstruction rather than image classification and subsequent motor control.
In the second prior art, CN120011050B discloses a method and system for scheduling FPGA cluster resources for convolutional neural networks, which optimizes the execution efficiency of a convolutional neural network on an FPGA cluster by converting the network model into high-level synthesis code, analyzing reuse factors, partitioning the code, and adding input and output interfaces between the partitioned sub-modules. That patent mainly splits and packages the network so that multiple IP cores accelerate the execution of the convolutional neural network model, and the splitting is performed mainly at the level of network layers.
Disclosure of Invention
The invention provides an FPGA-based method for accelerating an underwater target recognition convolutional neural network. Through the deep co-design of a lightweight network structure, optimized parameter processing and a parallelized low-level FPGA circuit design, the method significantly improves inference speed and reduces power consumption while maintaining recognition accuracy.
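The parameter processing mentioned above, batch normalization fusion followed by half-precision conversion, can be sketched as follows. The fusion uses the standard batch normalization folding identity, which is assumed here to correspond to the formula referenced in claim 3; the function names, the list-based data layout and the default `eps` are hypothetical choices for illustration.

```python
import math
import struct


def fuse_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch normalization into the preceding layer.

    weights: one list of weights per output channel.
    bias, gamma, beta, mean, var: one scalar per output channel.
    """
    fused_w, fused_b = [], []
    for w, b, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)        # gamma / sqrt(var + eps)
        fused_w.append([scale * x for x in w])
        fused_b.append(scale * (b - mu) + bt)
    return fused_w, fused_b


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_fp16(fused_w, fused_b):
    """Convert fused parameters to half precision (stored back as floats)."""
    return [[to_fp16(x) for x in w] for w in fused_w], [to_fp16(b) for b in fused_b]


# One output channel with two weights; all values are made up for illustration.
w, b = fuse_bn([[1.0, 2.0]], [0.5], [2.0], [0.1], [0.5], [4.0], eps=0.0)
w16, b16 = quantize_fp16(w, b)
print(w, b, w16, b16)
```

After fusion the batch normalization layer needs no separate circuit at inference time, since its effect is carried by the modified weights and biases. Half-precision rounding introduces a small error per parameter, so accuracy would normally be re-checked after conversion.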
The technical solution adopted by the invention is an FPGA-based method for accelerating an underwater target recognition convolutional neural network, comprising the following steps: Step S1, lightweight underwater target recognition network design: constructing a lightweight convolutional neural network formed by sequentially connecting three depthwise separable convolution layers and two fully connected layers; Step S2, parameter lightweighting: performing batch normalization layer parameter fusion and half-precision floating point conversion on the parameters of the lightweight convolutional neural network; Step S3, FPGA acceleration circuit design: designing, on the FPGA, the corresponding convolution layer circuit, pooling layer circuit, fully connected layer circuit and system interface based on the network model after parameter lightweighting; Step S4, circuit parallel optimization: applying loop unrolling, pipelining, nested loop flattening and array partitioning to the FPGA acceleration circuit to improve computational parallelism and efficiency.
Preferably, in step S1, the lightweight convolutional neural network has the following structure: a first depthwise separable convolution layer with a kernel size of 9 and a stride of 1, followed by a 4×4 max pooling layer; a second depthwise separable convolution layer with a kernel size of 5 and a stride of 1, followed by a 3×3 max pooling layer; a third depthwise separable convolution