CN-122020687-A - Hardware accelerator
Abstract
The invention discloses a hardware accelerator in the technical field of integrated circuits and hardware acceleration. The accelerator comprises a register configuration module, a cache module and a modular multiplication calculation module. The register configuration module receives configuration parameters sent by a processor chip and generates a calculation start signal. The cache module receives and caches target operation data input by the processor chip based on the configuration parameters. The modular multiplication calculation module is connected to both the register configuration module and the cache module. It splits the target operation data into a plurality of data segments according to the configuration parameters and performs Montgomery modular multiplication on the data segments using a pipeline structure comprising a master control unit and a plurality of slave computing units. The slave computing units are cascaded and process different data segments in parallel under the scheduling of the master control unit. This raises the operating frequency of the accelerator during large-number operations and increases operation throughput.
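The Montgomery modular multiplication named above computes MontMul(A, B) = A·B·R^(-1) mod N for an odd modulus N and R = 2^k > N, which lets the reduction use shifts and masks instead of division by N. As a point of reference only, the Python sketch below is a plain big-integer golden model of that definition (one REDC step on the full product). It is not the segmented pipeline the patent describes, and the function name `montgomery_reference` and the choice R = 2^k are illustrative assumptions.

```python
def montgomery_reference(a: int, b: int, n: int, k: int) -> int:
    """Golden model (hypothetical helper): return a * b * R^-1 mod n, R = 2**k."""
    r = 1 << k
    if n % 2 == 0 or not (0 < n < r):
        raise ValueError("modulus must be odd and smaller than R = 2**k")
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must already be reduced modulo n")
    n_prime = (-pow(n, -1, r)) % r      # n * n_prime == -1 (mod R); Python 3.8+
    t = a * b
    m = ((t % r) * n_prime) % r         # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k                # exact division by R; u < 2n
    return u - n if u >= n else u       # one conditional subtraction


# Usage: map operands into the Montgomery domain (x -> x*R mod n) and back.
n, k = 97, 8
a_bar = (5 << k) % n
b_bar = (7 << k) % n
c_bar = montgomery_reference(a_bar, b_bar, n, k)   # 5*7*R mod n
print(montgomery_reference(c_bar, 1, n, k))        # 35
```

Multiplying by 1 in the last call strips the remaining factor R, which is the usual way to leave the Montgomery domain.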
Inventors
- SHEN YI
- SU DANDAN
- ZHANG CUNSHENG
- LIU QIHAO
Assignees
- Jinan Maiwei Intelligent Technology Co., Ltd. (济南迈威智能科技有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260128
Claims (10)
- 1. A hardware accelerator, characterized in that it is applied to a processor chip and comprises a register configuration module, a cache module and a modular multiplication calculation module, wherein: the register configuration module is configured to receive configuration parameters sent by the processor chip and to generate a calculation start signal; the cache module is configured to receive and cache target operation data input by the processor chip based on the configuration parameters; the modular multiplication calculation module is connected to both the register configuration module and the cache module, and is configured to split the target operation data into a plurality of data segments according to the configuration parameters and to perform Montgomery modular multiplication on the data segments based on a pipeline structure comprising a master control unit and a plurality of slave computing units; and the slave computing units are arranged in cascade and process different data segments in parallel under the scheduling of the master control unit.
- 2. The hardware accelerator of claim 1, wherein the cache module comprises: a verification unit configured to perform condition verification on initial operation data input by the processor chip based on a total bit width parameter in the configuration parameters, so as to obtain a corresponding data verification result; and a first data storage unit configured to store the initial operation data as the target operation data if the data verification result indicates that the condition verification has passed.
- 3. The hardware accelerator of claim 2, wherein the cache module further comprises: a signal sending unit configured to send an error status signal to the register configuration module if the data verification result indicates that the condition verification has failed, so that the register configuration module generates an error interrupt signal based on the error status signal and sends it to the processor chip.
- 4. The hardware accelerator of claim 2, wherein the modular multiplication calculation module comprises: a data splitting unit configured to split a first operand, a second operand and a third operand in the target operation data according to a split data bit width parameter in the configuration parameters, so as to obtain a plurality of first data segments, second data segments and third data segments; wherein the first data segments are a preset number of multiplicand data segments processed in a preset operation order, and the second data segments and third data segments are, respectively, a preset number of multiplier data segments and modulus data segments that are processed in parallel.
- 5. The hardware accelerator of claim 4, wherein the modular multiplication calculation module comprises: the master control unit, configured to schedule the first data segments in the preset operation order; the slave computing units, configured to, when the master control unit schedules the current first data segment, process in parallel the second data segments and third data segments corresponding to the current first data segment so as to perform partial accumulation computations, obtain intermediate calculation results, and pass the intermediate calculation results on through a cascade data channel; and a scheduling unit configured to start scheduling the next first data segment after the master control unit has finished scheduling the current first data segment and before the slave computing units have completed all of the corresponding partial accumulation computations.
- 6. The hardware accelerator of claim 5, wherein the master control unit comprises: a first accumulation unit configured to multiply the scheduled current first data segment by the initial second data segment and to perform a first accumulation of the resulting product with the intermediate calculation result generated by the previous operation, so as to obtain an accumulated value and a first carry value; a modular operation unit configured to perform a modular multiplication operation on the accumulated value to generate a modular multiplication coefficient; and a second accumulation unit configured to multiply the modular multiplication coefficient by the initial third data segment to obtain a multiplication result, and to perform a second accumulation of the multiplication result, the accumulated value and the first carry value so as to obtain a second carry value.
- 7. The hardware accelerator of claim 6, wherein each slave computing unit comprises: a data segment selection unit configured to select the corresponding second data segment and third data segment according to the current iteration number; a third accumulation unit configured to perform a third accumulation of the intermediate calculation result generated by the previous operation, the product of the current first data segment and the second data segment, the product of the modular multiplication coefficient and the third data segment, and the second carry value, so as to obtain an accumulated value and a third carry value for the current iteration; and a first intermediate result output unit configured to write the accumulated value of the current iteration as the intermediate calculation result for the current sequence position.
- 8. The hardware accelerator of claim 7, further comprising: a second intermediate result output unit configured to take, after all slave computing units have finished computing, the low-order part of the third carry value generated in the current iteration as the intermediate calculation result for the last sequence position.
- 9. The hardware accelerator of claim 8, wherein the cache module comprises: a second data storage unit configured to receive the intermediate calculation result for each sequence position and to store the intermediate calculation results in sequence, so as to obtain a final calculation result assembled from the intermediate calculation results in the form of a plurality of data segments.
- 10. The hardware accelerator of any one of claims 1 to 9, wherein the modular multiplication calculation module comprises: an information sending unit configured to send modular multiplication end information to the register configuration module when the modular multiplication calculation has ended, so that the register configuration module sends an interrupt signal to the processor chip.
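Claims 4 to 9 describe a word-serial, interleaved Montgomery multiplication in which one multiplicand segment is consumed per iteration while the multiplier and modulus segments are handled across the cascade. The Python sketch below models that data flow in the style of the well-known CIOS (coarsely integrated operand scanning) method. It rests on assumptions the source does not state: w-bit segments in little-endian order, R = 2^(w·s), operands already reduced modulo an odd N, and one spare top word `t[s]` that keeps the high part of the last carry (the claims only say that the low-order part becomes the last sequence position). All names are hypothetical, and the loop runs sequentially here, whereas the hardware overlaps the slave steps in a pipeline.

```python
def split_segments(x: int, w: int, s: int) -> list[int]:
    """Data splitting (hypothetical): s little-endian segments of w bits."""
    mask = (1 << w) - 1
    return [(x >> (w * j)) & mask for j in range(s)]


def join_segments(segs: list[int], w: int) -> int:
    """Assemble the final result from the per-position intermediate results."""
    return sum(v << (w * j) for j, v in enumerate(segs))


def montmul_segmented(a: int, b: int, n: int, w: int, s: int) -> int:
    """Return a * b * R^-1 mod n with R = 2**(w*s), one multiplicand segment per iteration."""
    word = 1 << w
    mask = word - 1
    if n % 2 == 0 or not (0 < n < (1 << (w * s))):
        raise ValueError("modulus must be odd and fit in s segments")
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must be reduced modulo n")
    n_prime = (-pow(n, -1, word)) % word   # -n^-1 mod 2**w
    A = split_segments(a, w, s)            # first data segments (multiplicand)
    B = split_segments(b, w, s)            # second data segments (multiplier)
    N = split_segments(n, w, s)            # third data segments (modulus)
    t = [0] * (s + 1)                      # intermediate results per sequence position
    for i in range(s):                     # master: schedule A[i] in a fixed order
        # first accumulation: previous t[0] plus A[i] * B[0]
        acc = t[0] + A[i] * B[0]
        low, carry1 = acc & mask, acc >> w
        # modular operation unit: coefficient m makes the lowest word vanish
        m = (low * n_prime) & mask
        # second accumulation: low word of low + m*N[0] is zero, keep only the carry
        carry = carry1 + ((low + m * N[0]) >> w)
        # slave units j = 1..s-1 (a pipelined cascade in hardware)
        for j in range(1, s):
            acc = t[j] + A[i] * B[j] + m * N[j] + carry   # third accumulation
            t[j - 1] = acc & mask          # result for position j-1 (one-word shift)
            carry = acc >> w               # third carry, passed down the cascade
        # last position: low-order part of the carry; the spare word keeps the rest
        top = t[s] + carry
        t[s - 1] = top & mask
        t[s] = top >> w
    result = join_segments(t, w)
    return result - n if result >= n else result   # final conditional subtraction


n = 0xF1234567                             # odd 32-bit modulus
a, b = 0x12345678, 0x0ABCDEF1
print(montmul_segmented(a, b, n, 8, 4) == a * b * pow(1 << 32, -1, n) % n)   # True
```

Because the running value stays below 2N after every iteration, one conditional subtraction at the end is enough in this sketch.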
Description
Hardware accelerator
Technical Field
The present invention relates to the field of integrated circuits and hardware acceleration, and in particular to a hardware accelerator.
Background
As the key lengths of cryptographic algorithms continue to grow (e.g., from 2048 bits to 4096 bits), the hardware modular arithmetic units integrated into processor chips face significant challenges. Most existing hardware implementations use a single computing structure or one with only limited parallelism. When large-bit-width operands are processed, overly long data paths and tight critical timing make it difficult to raise the chip's operating frequency. In addition, fixed-bit-width hardware designs cannot adapt flexibly to different algorithm specifications, which leads either to low hardware resource utilization or to dedicated circuits having to be redesigned for each specification. These problems limit the energy efficiency and throughput of the chip when performing high-performance cryptographic operations.
Disclosure of Invention
The embodiments of the present invention aim to provide a hardware accelerator that, by means of a configurable data splitting and pipelined parallel computing architecture, addresses the poor flexibility, the difficulty of raising the operating frequency, and the insufficient operation throughput of the cryptographic hardware units integrated in processor chips.
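Why a pipelined cascade can help throughput can be illustrated with a simple dependency count. The Python sketch below is a hypothetical timing model, not the patent's design. It assumes s segments, one cycle per unit step, a master unit handling the lowest word plus s-1 cascaded slave units, and the data dependencies of a word-serial interleaved Montgomery multiplication: unit j needs the carry from unit j-1 in the same iteration and the intermediate word that unit j+1 wrote in the previous iteration. Under those assumptions the master can begin the next multiplicand segment before the cascade has finished the current one (for s ≥ 3).

```python
def pipelined_finish_times(s: int) -> list[list[int]]:
    """finish[i][j]: cycle in which unit j (0 = master, 1..s-1 = slaves) finishes iteration i."""
    finish = [[0] * s for _ in range(s)]
    for i in range(s):              # iteration i consumes multiplicand segment i
        for j in range(s):          # unit j handles word j of the multiplier and modulus
            deps = []
            if j > 0:
                deps.append(finish[i][j - 1])   # carry from the previous unit in the cascade
            if i > 0:
                # word j of the intermediate result was last written by unit j+1;
                # the last unit also writes its own word (the final carry)
                src = j + 1 if j + 1 < s else j
                deps.append(finish[i - 1][src])
            finish[i][j] = max(deps, default=0) + 1   # one cycle per unit step
    return finish


def cycle_counts(s: int) -> tuple[int, int]:
    """(pipelined, fully sequential) cycle counts under the unit-latency model."""
    pipelined = pipelined_finish_times(s)[-1][-1]
    sequential = s * s              # each iteration waits for the whole cascade
    return pipelined, sequential


for s in (2, 4, 8, 16):
    print(s, cycle_counts(s))
# 2 (4, 4)
# 4 (10, 16)
# 8 (22, 64)
# 16 (46, 256)
```

In this model the pipelined latency grows as 3s - 2 cycles instead of s^2. The real cycle counts of the described accelerator depend on unit latencies and register stages that the description does not give.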
To solve the above technical problems, an embodiment of the present invention provides a hardware accelerator that is applied to a processor chip and comprises a register configuration module, a cache module and a modular multiplication calculation module, wherein: the register configuration module is configured to receive configuration parameters sent by the processor chip and to generate a calculation start signal; the cache module is configured to receive and cache target operation data input by the processor chip based on the configuration parameters; the modular multiplication calculation module is connected to both the register configuration module and the cache module, and is configured to split the target operation data into a plurality of data segments according to the configuration parameters and to perform Montgomery modular multiplication on the data segments based on a pipeline structure comprising a master control unit and a plurality of slave computing units; and the slave computing units are arranged in cascade and process different data segments in parallel under the scheduling of the master control unit.
Optionally, the cache module includes: a verification unit configured to perform condition verification on initial operation data input by the processor chip based on a total bit width parameter in the configuration parameters, so as to obtain a corresponding data verification result; and a first data storage unit configured to store the initial operation data as the target operation data if the data verification result indicates that the condition verification has passed.
Optionally, the cache module includes: a signal sending unit configured to send an error status signal to the register configuration module if the data verification result indicates that the condition verification has failed, so that the register configuration module generates an error interrupt signal based on the error status signal and sends it to the processor chip.
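The description does not specify which conditions the verification unit checks against the total bit width parameter. The Python sketch below assumes a plausible set, purely for illustration: the split bit width divides the total bit width, every operand fits in the total bit width, the modulus is odd (Montgomery reduction needs an inverse of N modulo a power of two), and the multiplicand and multiplier are already reduced modulo N. The classes `RegisterConfig` and `CacheModule`, the `Status` values and the `error_irq` flag are hypothetical stand-ins for the configuration registers, the error status signal and the error interrupt.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    ERROR = "error"   # error status signal sent to the register configuration module


@dataclass
class RegisterConfig:              # hypothetical configuration registers
    total_bits: int                # total bit width parameter
    segment_bits: int              # split data bit width parameter
    error_irq: bool = False        # error interrupt toward the processor (sticky)


@dataclass
class CacheModule:
    cfg: RegisterConfig
    target: tuple = ()             # stored target operation data

    def verify(self, a: int, b: int, n: int) -> Status:
        """Assumed condition checks; the source does not list the actual ones."""
        bits = self.cfg.total_bits
        if self.cfg.segment_bits <= 0 or bits % self.cfg.segment_bits:
            return Status.ERROR
        if any(x < 0 or x.bit_length() > bits for x in (a, b, n)):
            return Status.ERROR
        if n % 2 == 0 or a >= n or b >= n:
            return Status.ERROR
        return Status.OK

    def load(self, a: int, b: int, n: int) -> Status:
        status = self.verify(a, b, n)
        if status is Status.OK:
            self.target = (a, b, n)      # first data storage unit keeps the data
        else:
            self.cfg.error_irq = True    # error status -> error interrupt
        return status


cfg = RegisterConfig(total_bits=64, segment_bits=16)
cache = CacheModule(cfg)
print(cache.load(3, 5, 7) is Status.OK, cfg.error_irq)      # True False
print(cache.load(3, 5, 8) is Status.ERROR, cfg.error_irq)   # True True (even modulus)
```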
Optionally, the modular multiplication calculation module includes: a data splitting unit configured to split a first operand, a second operand and a third operand in the target operation data according to a split data bit width parameter in the configuration parameters, so as to obtain a plurality of first data segments, second data segments and third data segments; the first data segments are a preset number of multiplicand data segments processed in a preset operation order, and the second data segments and third data segments are, respectively, a preset number of multiplier data segments and modulus data segments that are processed in parallel.
Optionally, the modular multiplication calculation module includes: the master control unit, configured to schedule the first data segments in the preset operation order; the slave computing units, configured to, when the master control unit schedules the current first data segment, process in parallel the second data segments and third data segments corresponding to the current first data segment so as to perform partial accumulation computations, obtain intermediate calculation results, and pass the intermediate calculation results on through a cascade data channel; and a scheduling unit configured to start scheduling the next first data segment after the master control unit has finished scheduling the current first data segment and before the slave computing units have completed all of the corresponding partial accumulation computations.
Optional