CN-122023829-A - FPGA-based configurable multistage parallel image feature recognition method and system
Abstract
The invention discloses an FPGA (field-programmable gate array)-based configurable multistage parallel image feature recognition method and system, relating to the technical field of embedded vision. The technical scheme is as follows: image recognition task parameters are received and parsed to generate a configuration instruction; according to the configuration instruction, an optimal parallel mode is selected through a built-in resource delay evaluation model and a control signal is generated; according to the control signal, an input image data stream is dynamically decomposed into data blocks adapted to the optimal parallel mode; the corresponding parallel computing unit in a multistage parallel convolution engine is activated; and feature map data are output after the convolution operation is performed. The invention can dynamically balance computation delay and hardware resource consumption in an edge-side environment, so that the same hardware design can adaptively select the optimal parallel strategy according to actual task demands and resource conditions. Resource utilization is thereby maximized while real-time performance is guaranteed, and the adaptability and practicality of the vision system under limited edge-side computing power are significantly improved.
Inventors
- LIU YU
- ZHANG NANXIN
- WU HUAIGU
Assignees
- 天府绛溪实验室
Dates
- Publication Date
- 20260512
- Application Date
- 20260407
Claims (10)
- 1. An FPGA-based configurable multistage parallel image feature recognition method, characterized by comprising the following steps: receiving image recognition task parameters, and parsing them to generate a configuration instruction containing user demand parameters and environment deployment parameters; according to the configuration instruction, selecting an optimal parallel mode through a built-in resource delay evaluation model, and generating a control signal containing data format parameters; according to the control signal, dynamically decomposing an input image data stream into data blocks adapted to the optimal parallel mode, performing channel and spatial alignment preprocessing, and establishing a data bus interface matched to the computing unit; and activating the parallel computing unit of the corresponding computing granularity in a multistage parallel convolution engine, and outputting feature map data after the convolution operation is performed, wherein the multistage parallel convolution engine comprises parallel computing units of at least two different computing granularities among the channel level, the pixel level, the row-channel level and the row-pixel level.
- 2. The FPGA-based configurable multistage parallel image feature recognition method of claim 1, wherein the resource delay evaluation model is configured to: generate a plurality of groups of candidate parallel modes, taking the user demand parameters as optimization targets; filter out candidate modes that exceed the hardware capacity, taking the environment deployment parameters as constraint boundaries; and finally output the configuration parameters of the overall optimal parallel mode that satisfies the constraints.
- 3. The FPGA-based configurable multistage parallel image feature recognition method of claim 2, wherein finally outputting the configuration parameters of the overall optimal parallel mode that satisfies the constraints specifically comprises: where a target delay constraint must be met, selecting the optimal parallel mode with minimum resource consumption as the objective; or, under a given resource and power-consumption budget, selecting the optimal parallel mode with lowest latency as the objective.
- 4. The FPGA-based configurable multistage parallel image feature recognition method of claim 1, wherein the user demand parameters include convolution kernel parameters, a target delay constraint and an accuracy requirement; and/or the environment deployment parameters include the number of available DSPs, the BRAM capacity and the memory bandwidth.
- 5. The FPGA-based configurable multistage parallel image feature recognition method of claim 1, wherein the resource delay evaluation model is constructed as follows: based on a historical task parameter library and actual hardware test data, regression analysis or neural network training is used to obtain the mapping between resource consumption and delay; the model inputs comprise the convolution kernel size, the numbers of input and output channels, the stride and the padding parameters, and the model outputs are the DSP utilization, the BRAM occupancy and the estimated number of processing cycles.
- 6. The FPGA-based configurable multistage parallel image feature recognition method of claim 1, wherein the channel-level parallel computing unit is configured to compute, in one clock cycle, the dot product of a single input channel within a single convolution window; the pixel-level parallel computing unit is configured to compute in parallel, in one clock cycle, the dot products of all input channels within a single convolution window; the row-channel-level parallel computing unit is configured to compute in parallel, in one clock cycle, the dot products of one input channel across all convolution windows in one row; the row-pixel-level parallel computing unit is configured to compute in parallel, in one clock cycle, the dot products of all input channels across all convolution windows in one row; and the parallel computing units share a basic multiply-add circuit and switch computation paths through a multiplexer.
- 7. The FPGA-based configurable multistage parallel image feature recognition method of claim 1, wherein the multistage parallel convolution engine further comprises a fully parallel unit configured to complete the all-channel parallel operation of the whole image in a single clock cycle; the fully parallel unit directly accesses external DDR memory through the FPGA on-chip bus to support the processing of oversized images.
- 8. An FPGA-based configurable multistage parallel image feature recognition system, characterized by comprising: a task parsing module, configured to receive image recognition task parameters and parse them to generate a configuration instruction containing user demand parameters and environment deployment parameters; a dynamic configuration module, connected to the task parsing module and containing a built-in resource delay evaluation model, configured to select an optimal parallel mode after receiving the configuration instruction and to generate a control signal containing data format parameters; a data analysis module, connected to the dynamic configuration module, configured to dynamically decompose an input image data stream into data blocks adapted to the optimal parallel mode according to the control signal, perform channel and spatial alignment preprocessing, and establish a data bus interface matched to the computing unit; and a multistage parallel convolution engine, connected to the data analysis module and to the dynamic configuration module, which integrates parallel computing units of at least two different computing granularities among the channel level, the pixel level, the row-channel level and the row-pixel level, the parallel computing units performing the convolution operation on the input data blocks according to their activation state and outputting feature map data.
- 9. The FPGA-based configurable multistage parallel image feature recognition system of claim 8, wherein the task parsing module is deployed on a general-purpose processor; the multistage parallel convolution engine, the dynamic configuration module and the data analysis module are deployed in the FPGA; and the general-purpose processor and the FPGA exchange configuration parameters and calculation results through a PCIe interface.
- 10. The FPGA-based configurable multistage parallel image feature recognition system of claim 8, wherein the task parsing module and the dynamic configuration module are deployed on a general-purpose processor; the multistage parallel convolution engine and the data analysis module are deployed in the FPGA; and the general-purpose processor and the FPGA cooperatively control the data flow through an AXI-Stream interface.
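The selection logic of claims 2 and 3 can be illustrated with a short Python sketch: candidate modes are first filtered by hardware capacity, then one is chosen under either of the two objectives. This is a minimal sketch under stated assumptions, not the evaluation model of the invention. The `Candidate` fields, the `select_mode` function, the mode labels and every resource and cycle figure are hypothetical, and resource consumption is approximated here by DSP count with BRAM as a tie-breaker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str      # parallel mode label (hypothetical shorthand)
    dsp: int       # estimated DSP slices used (made-up figure)
    bram_kb: int   # estimated BRAM used, in KB (made-up figure)
    cycles: int    # estimated processing cycles (made-up figure)


def select_mode(candidates: List[Candidate], max_dsp: int, max_bram_kb: int,
                target_cycles: Optional[int] = None) -> Optional[Candidate]:
    # Claim 2: discard candidates that exceed the hardware capacity.
    feasible = [c for c in candidates
                if c.dsp <= max_dsp and c.bram_kb <= max_bram_kb]
    if target_cycles is not None:
        # Claim 3, first branch: meet the delay target, minimise resources.
        feasible = [c for c in feasible if c.cycles <= target_cycles]
        key = lambda c: (c.dsp, c.bram_kb)
    else:
        # Claim 3, second branch: within the budget, minimise latency.
        key = lambda c: (c.cycles, c.dsp)
    # None signals that no candidate satisfies the constraints.
    return min(feasible, key=key) if feasible else None


# Hypothetical outputs of an evaluation model for one convolution layer.
CANDIDATES = [
    Candidate("channel", dsp=9, bram_kb=18, cycles=40000),
    Candidate("pixel", dsp=144, bram_kb=36, cycles=2500),
    Candidate("row-channel", dsp=576, bram_kb=72, cycles=640),
    Candidate("row-pixel", dsp=9216, bram_kb=288, cycles=40),
]
```

With a budget of 1000 DSPs and a 3000-cycle target, this sketch picks the "pixel" candidate, the cheapest one that is fast enough. Without a delay target it picks "row-channel", the fastest one that fits.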
Description
FPGA-based configurable multistage parallel image feature recognition method and system
Technical Field
The invention relates to the technical field of embedded vision, and in particular to an FPGA-based configurable multistage parallel image feature recognition method and system.
Background
At present, image recognition accelerated by artificial intelligence hardware relies mainly on highly centralized back-end computing. It is easily affected by network fluctuation and back-end task scheduling, and is ill-suited to the performance and real-time requirements of the end side, where computing power is limited. Meanwhile, the convolutional neural networks (CNNs) deployed on existing equipment for real-time image recognition rely mainly on a general-purpose processor (CPU), a graphics processor (GPU), or a fixed-architecture application-specific integrated circuit (ASIC) combined with a CPU. These schemes have the following inherent defects:
1) Rigid architecture and poor adaptability. Existing schemes cannot adjust dynamically to performance requirements that change in real time (such as network delay), nor adjust the network model structure (such as convolution kernel parameters and channel counts) according to the input data, so a single design can hardly serve diverse end-side applications and product iterations.
2) Existing architectures generally adopt a single fixed calculation mode (such as full-parameter calculation or sliding serial calculation). Full-parameter calculation has low delay but consumes huge resources, far beyond what a low-cost FPGA hardware acceleration architecture can provide. Serial or low-parallelism modes occupy few resources but have long delay; at equal performance their cost is high and their volume large, so they cannot meet end-side conditions of use or the real-time requirements of scenarios such as autonomous driving.
3) High development and deployment complexity. Dedicated hardware acceleration IP must be redesigned, verified and deployed for each different performance requirement or chip platform. The cycle is long and the cost high, which hinders rapid adoption of the technology.
4) Low system energy efficiency. A fixed architecture has excess performance on simple tasks and insufficient performance on complex tasks, so hardware computing resources are poorly utilized and the optimal energy efficiency cannot be achieved under dynamic requirements.
Therefore, how to research and design an FPGA-based configurable multistage parallel image feature recognition method and system that overcomes the above defects is a problem in urgent need of a solution.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide an FPGA-based configurable multistage parallel image feature recognition method and system that can dynamically balance computation delay and hardware resource consumption in an edge-side environment, so that the same hardware design can adaptively select the optimal parallel strategy according to actual task demands and resource conditions, thereby maximizing resource utilization while guaranteeing real-time performance and significantly improving the adaptability and practicality of the vision system under limited edge-side computing power.
The technical aim of the invention is achieved by the following technical scheme:
In a first aspect, an FPGA-based configurable multistage parallel image feature recognition method is provided, comprising the following steps:
receiving image recognition task parameters, and parsing them to generate a configuration instruction containing user demand parameters and environment deployment parameters;
according to the configuration instruction, selecting an optimal parallel mode through a built-in resource delay evaluation model, and generating a control signal containing data format parameters;
according to the control signal, dynamically decomposing an input image data stream into data blocks adapted to the optimal parallel mode, performing channel and spatial alignment preprocessing, and establishing a data bus interface matched to the computing unit;
and activating the parallel computing unit of the corresponding computing granularity in a multistage parallel convolution engine, and outputting feature map data after the convolution operation is performed, wherein the multistage parallel convolution engine comprises parallel computing units of at least two different computing granularities among the channel level, the pixel level, the row-channel level and the row-pixel level.
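To make the decomposition step concrete, the following Python sketch estimates, for each of the four granularities, the shape of one data block and the number of blocks needed for one output feature map. It reads the granularities as defined in claim 6 and is only a sketch under stated assumptions: the function names, the mode labels ("row" standing for what the text also renders as "line"), and the idealization that one block is consumed per clock cycle with no pipeline or memory stalls are all hypothetical and not taken from the source.

```python
from typing import Tuple


def output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard convolution output length along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def block_plan(mode: str, height: int, width: int, channels: int,
               kernel: int, stride: int = 1, padding: int = 0
               ) -> Tuple[Tuple[int, int, int], int]:
    """Return (block shape as rows x cols x channels, number of blocks)."""
    out_h = output_size(height, kernel, stride, padding)
    out_w = output_size(width, kernel, stride, padding)
    row_w = width + 2 * padding  # a row strip spans the full padded width
    plans = {
        # one input channel of one convolution window per block
        "channel": ((kernel, kernel, 1), out_h * out_w * channels),
        # all input channels of one convolution window per block
        "pixel": ((kernel, kernel, channels), out_h * out_w),
        # one input channel of every window in one output row per block
        "row-channel": ((kernel, row_w, 1), out_h * channels),
        # all input channels of every window in one output row per block
        "row-pixel": ((kernel, row_w, channels), out_h),
    }
    if mode not in plans:
        raise ValueError(f"unknown parallel mode: {mode}")
    return plans[mode]
```

For a 32 x 32 input with 16 channels and a 3 x 3 kernel, the block count in this sketch falls from 14400 at the channel level to 30 at the row-pixel level, while each block grows correspondingly larger. This is the kind of latency and resource trade-off the evaluation model has to weigh.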
Further, the resource delay evaluation model is configured to: generating a plurality of groups of candidate parallel modes by taking the