CN-121979487-A - Cube root calculating method and device based on pre-scaling lookup table

CN121979487ACN 121979487 ACN121979487 ACN 121979487ACN-121979487-A

Abstract

The application discloses a cube root calculating method and device based on a pre-scaling lookup table, wherein the method comprises the steps of dividing an input domain into R disjoint subintervals; the method comprises the steps of independently constructing a pre-scaling lookup table for each subinterval, directly storing a final cube root fixed-point integer value containing an output scaling factor in each table entry, determining an subinterval index and an intra-interval offset index to which the subinterval index belongs according to an input value, generating a storage address according to the index, and directly reading a cube root result from the pre-scaling lookup table and outputting the cube root result. The device comprises a range detection and address generation module, a pre-scaling BRAM storage unit and a pipeline register set. The application eliminates the scaling operation in operation fundamentally by integrating the output scaling factor into the BRAM content in advance. The application only consumes 7 LUTs, 36 triggers and 0.5 BRAM, realizes fixed 3 clock period delay, and is particularly suitable for embedded systems with limited resources.

Inventors

ZHANG JIN
LIU FEI
CHEN ZIMING
LI ZHONGYANG

Assignees

金陵科技学院

Dates

Publication Date: 20260505
Application Date: 20260202

Claims (9)

1. The cube root calculating method based on the pre-scaling lookup table is characterized by comprising the following steps of: S1, dividing an input domain into The number of disjoint subintervals is such that, setting each subinterval Wherein , Is a reference index; S2, for each subinterval Independent construction of pre-scaling look-up tables Each entry of the pre-scaling look-up table directly stores the final cube root fixed point integer value fused with the interval output scaling factor: ; Wherein the method comprises the steps of In order to output the decimal place width, Is the first The first subinterval Physical values of the sampling points; s3, determining the subinterval index to which the input value belongs according to the input value And intra-interval offset index ; S4, indexing according to subintervals And intra-interval offset index Generating a memory address, then from a pre-scaled look-up table And directly reading the cube root result and outputting.
2. The method of pre-scaling look-up table based cube root calculation according to claim 1, wherein in step S1, the input domain coverage is The corresponding parameters are configured as reference index Number of subintervals 。
3. The method of computing a cube root based on a pre-scaling look-up table as defined in claim 1, wherein, in step S2, Subinterval Is continuously stored in the same memory block, the first Mapping entries of subintervals to address space Memory address 。
4. The method of computing a cube root based on a pre-scaling look-up table according to claim 1, wherein in step S2, a midpoint sampling strategy is used to determine the physical value of each sampling point: ; Wherein the method comprises the steps of Is the first The lower bound of the subinterval, For the width of the section to be the same, For the number of sample points per subinterval, In order for the index bit to be wide, 。
5. The method of computing a cube root based on a pre-scaling look-up table as defined in claim 1, wherein, in step S3, Determining subinterval index by detecting the position of the most significant bit of the input fixed-point integer: wherein The index is the weight index of the reference bit, and the offset index in the interval is obtained by a bit extraction mode 。
6. The method of pre-scaling look-up table based cube root calculation according to claim 1, further comprising, in step S3, for any floating point input Parameter reduction is carried out, and the method concretely comprises the following steps: S31, inputting the index Performing die 3 decomposition: wherein Mapping inputs to valid mantissas within the coverage area ; S32, calculating by utilizing the pre-scaling lookup table ; S33, carrying out index adjustment output on the table lookup result: Thereby supporting floating point number cube root computation for full dynamic range.
7. A cube root computing device based on a pre-scaling look-up table, comprising: A range detection and address generation module for determining subinterval index according to input data Intra-interval offset index And splice to generate storage address ; Pre-scaling BRAM storage unit for storing The pre-scaling lookup table of the subinterval, the storage content integrates the scaling factors of the corresponding subinterval in advance, so that the output data does not need post-processing multiplication; and the pipeline register set is used for connecting the range detection and address generation module and the pre-scaling BRAM storage unit to realize full pipeline processing.
8. The pre-scaling look-up table based cube root computing device of claim 7 wherein the total delay of the device is fixed to 3 clock cycles and the throughput rate is equal to the clock frequency.
9. The pre-scaling look-up table based cube root computing device of claim 7 wherein the pre-scaling BRAM storage unit is configured as a monolithic FPGA on-chip BRAM with input in Q2.22 format and output in Q1.23 format.

Description

Cube root calculating method and device based on pre-scaling lookup table Technical Field The invention relates to the technical field of digital signal processing and FPGA hardware design, in particular to a cube root computing method and device based on a pre-scaling lookup table. Background Cube root operations are critical nonlinear operations in computer graphics, scientific computation, and signal processing. In thermodynamics, the solution of Kirchhoff's equation relies on cube root calculations, in computer graphics, both Phong illumination models and Lab/XYZ color space conversions require cube root operations, and in geometric calculations, the correlation operations involving volumes are also independent of the function. With the continuous improvement of the performance requirements of embedded systems, the cube root operation implemented by software is difficult to meet the real-time requirements. The existing cube root hardware computing method is mainly divided into three main categories: The first is an iterative multiplication algorithm represented by Newton-Raphson. The method realizes function calculation based on iterative approximation of the reciprocal, and has the advantage of high convergence rate. However, a common bottleneck is that each iteration contains multiple multiplications, typically requiring 3 multipliers, that the strong dependence on DSP resources limits its application on resource constrained platforms, and that a look-up table is required to provide a high precision initial estimate. Typical implementations require about 15 clock cycles consuming 5 DSP blocks. The second type is a look-up table method based on ROM/BRAM. The traditional lookup table method realizes function approximation by pre-calculating function values and storing the function values in ROM or BRAM. The basic idea is to normalize the input to the standard interval, look up the table, and then scale the output. However, the core problem faced by the conventional LUT approach is that when the input exponent k is not a multiple of 3, the scaling factor is not an integer power of 2, and the scaling operation requires a multiplier or complex shift addition logic. This bottleneck makes it difficult for the look-up table approach to truly achieve the zero multiplier design goal. The third class is the bitwise recursive (Digit-Recurrence) algorithm. The algorithm generates a result bit by bit in an iterative manner based on the concept of the completion cube. The throughput is severely limited by its characteristic of calculation delay proportional to the number of precision bits. For 24-bit precision, the Radix-2 version requires 24 cycles, and even with a Radix-4 improvement of 2 bits per cycle, 12 cycles are required. Due to the data dependence between iterations, pipelining is limited, and high throughput is difficult to achieve. To sum up, the Newton-Raphson iteration method in the prior art, while converging fast, relies heavily on DSP resources (typically 5 DSP blocks) and has a high delay (about 15 cycles). Conventional look-up table methods, while simple, are designed to handle scaling factors of non-integer powers (e.g) The "zero multiplier" design cannot be truly realized by introducing multipliers or complex shift addition logic after table look-up. One important reason, none of the existing approaches fully exploit the pre-computation capability of BRAM to eliminate run-time arithmetic operations. Therefore, a cube root computing method that can fully utilize FPGABRAM resources, achieve ultra-low logic resource consumption, and low latency is needed. Disclosure of Invention Aiming at the serious trade-off between the resource efficiency and the calculation delay of the existing cube root hardware implementation method, the invention provides the cube root calculation method and the cube root calculation device based on the pre-scaling lookup table, which can fundamentally eliminate the dependence on a DSP unit and complex arithmetic logic during operation and realize extremely low logic resource consumption and fixed low delay. In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: the cube root calculating method based on the pre-scaling lookup table comprises the following steps: S1, dividing an input domain into The number of disjoint subintervals is such that, setting each subintervalWherein,Is a reference index; S2, for each subinterval Independent construction of pre-scaling look-up tablesEach entry of the pre-scaling look-up table directly stores the final cube root fixed point integer value fused with the interval output scaling factor: ; Wherein the method comprises the steps of In order to output the decimal place width,Is the firstThe first subintervalPhysical values of the sampling points; s3, determining the subinterval index to which the input value belongs according to the input value And intra-interval offset index; S4, indexing accor