US-12619394-B2 - Method of performing hardware efficient unbiased rounding of a number
Abstract
A method and hardware for performing hardware efficient unbiased rounding of a number includes receiving the number in a binary format having a first portion and a second portion. The first portion comprises bits of the number above a rounding point and the second portion comprises bits of the number after the rounding point. The method includes adding a first amount to the number to obtain a first value. Further the method comprises determining if the bit above the rounding point for a controlling value is ‘0’ bit or a ‘1’ bit. The controlling value is either the received number in the binary format or the first value. The method further includes adding a second amount to ‘b+1’ LSBs of the first value to obtain a second value if the bit above the rounding point for the controlling value is a ‘0’ bit and rounding the number by truncating the last b bits of the second value or the last b bits of the first value based on the determination.
Inventors
- Timothy Lee
Assignees
- IMAGINATION TECHNOLOGIES LIMITED
Dates
- Publication Date
- 20260505
- Application Date
- 20220314
- Priority Date
- 20210312
Claims (20)
- 1 . A method of processing an input value to a processing system, said input value comprising a number representing data, the method comprising: receiving, at a first adder implemented in hardware logic, the number in a binary format with a bit length of m bits, having a first portion with bit length of ‘a’ bits and second portion with bit-length of ‘b’ bits, wherein the first portion comprises bits of the number above a rounding point and the second portion comprises bits of the number after the rounding point; performing a hardware efficient unbiased rounding of the number by: adding, by the first adder, a first amount to the number to obtain a first value; determining if the bit above the rounding point for a controlling value is ‘0’ bit or a ‘1’ bit, wherein the controlling value is either the received number in the binary format or the first value; if the bit above the rounding point for the controlling value is a ‘0’ bit, adding, by a second adder implemented in hardware logic, a second amount to ‘b+1’ LSBs of the first value to obtain a second value and truncating the last b bits of the second value by a shifter implemented in hardware logic; or if the bit above the rounding point for the controlling value is a ‘1’ bit, truncating the last b bits of the first value by said shifter; wherein the first amount and the second amount to be added are determined based on the number of ‘b’ bits of the second portion and wherein the first value and the second value are represented in the same binary format as the received number; and outputting the truncated value from the shifter to the processing system to facilitate processing of said number by the processing system.
- 2 . The method according to claim 1 , wherein the value of the second amount is the value of the LSB of the second portion.
- 3 . The method according to claim 1 , wherein adding the second amount comprises adding a ‘1’ to the LSB of the first value.
- 4 . The method according to claim 1 , wherein the first amount is the difference between the value of position of MSB of the second portion and the second amount, such that when b=1, the first amount is zero.
- 5 . The method according to claim 1 , wherein adding the second adder is a ‘b+1’ bit adder.
- 6 . The method according to claim 1 , wherein the first value and the second value are having the same ‘a−1’ MSBs.
- 7 . The method according to claim 1 , wherein adding a second amount to ‘b+1’ LSBs of the first value generates an intermediate value having a bit length of ‘b+1’ bits.
- 8 . The method according to claim 6 , wherein the second value is obtained by combining ‘a−1’ MSBs of the first value and ‘b+1’ bits of the intermediate value.
- 9 . The method according to claim 1 , wherein the received number has an integer part and a fractional part.
- 10 . The method according to claim 9 , wherein the first portion of ‘a’ bits comprises bits of the integer part and none, one or more MSBs of the fractional part to which the number needs to be rounded.
- 11 . The method according to claim 1 , wherein the first adder is a ‘m’ bit adder.
- 12 . A non-transitory computer readable storage medium having stored thereon computer readable code configured to cause the method of claim 1 to be performed when the code is run on at least one processor.
- 13 . An efficient hardware implementation for processing an input value to a processing system, said input value comprising a number representing data, the hardware implementation performing unbiased rounding and comprising: a first adder implemented in hardware logic configured to: receive said number in binary format with a bit length of m bits, having a first portion with bit length of ‘a’ bits and a second portion with bit-length of ‘b’ bits, the first portion comprises bits of the number before a rounding point and the second portion comprises bits of the number after the rounding point, and add a first amount to the number N to obtain a first value; a second adder implemented in hardware logic configured to add a second amount to ‘b+1’ LSBs of the first value to obtain a second value, wherein the second amount is added if the bit above the rounding point for the first value is a ‘0’ bit; wherein the first amount and the second amount to be added is determined based on the bit length ‘b’ bits of the second portion; a shifter implemented in hardware logic configured to: shift the second value by ‘b’ bit to the LSB side if the bit above the rounding point for the first value is a ‘0’ bit, or shift the first value by ‘b’ bit to the LSB side if the bit above the rounding point for the first value is a ‘1’ bit; and said shifter being further configured to output the shifted value to the processing system to facilitate processing of said number by the processing system.
- 14 . The hardware implementation according to claim 13 , wherein the first adder is an m-bit adder.
- 15 . The hardware implementation according to claim 13 , wherein the second adder is a ‘b+1’ bit adder.
- 16 . The hardware implementation according to claim 13 , wherein the shifter is implemented by hard wiring the wires from the second adder or the first adder.
- 17 . The hardware implementation according to claim 13 , wherein the first adder is integrated with an in-built adder in a logic in the hardware producing the number to be rounded, and optionally wherein the logic is a Finite Impulse Response (FIR) filter used in image processing.
- 18 . The hardware implementation according to claim 13 , wherein the second adder adds the second amount by adding ‘1’ to the LSB of the first value.
- 19 . The hardware implementation according to claim 13 , wherein the first adder adds the first amount which is the difference between the value of the MSB of the second portion and the second amount.
- 20 . A non-transitory computer readable storage medium having stored thereon an integrated circuit dataset description that when processed by an integrated circuit manufacturing system causes the integrated circuit manufacturing system to manufacture the hardware implementation as set forth in claim 13 .
Description
TECHNICAL FIELD The application relates to rounding a number. BACKGROUND Rounding is a process of replacing a precise number with an approximate value having a shorter, simpler, or more explicit representation. Often rounding is performed in real life for making calculations simple. It is also used in electronic or computer systems to make a number suitable for performing various calculations. Electronic and computer systems use numbers in different binary number formats such as floating point numbers or fixed point numbers which have a finite number of digits for representing the number. While using such formats, often calculation with precise numbers is not possible or in other words, rounding or approximation of numbers becomes unavoidable. There are many known techniques or methods of performing rounding. Some of the method includes Rounding Up, Rounding Down, Rounding Towards Zero, Rounding Away from Zero, Rounding Towards Infinity, Rounding Away from Infinity, Rounding Towards Even Number and the like. Some of these are explained in detail in the sections below. The rounding or approximation of a number to another number introduces an error commonly referred to as a round-off error or rounding error. In a sequence of calculations, these rounding errors generally accumulate, and in certain cases, these errors make the final results meaningless. For example, when considering some electronic systems such as a Finite Impulse Response (FIR) filter in an image processing system, the accumulation of the rounding error may have a considerable effect on the colour of the pixels generated as an output on a display unit. The methods of rounding numbers mentioned above variously generate either a biased or unbiased result. A bias may be generated when a rounding method is statistically more likely to round numbers in one direction than in the opposite i.e. all the numbers (or most of the numbers) are rounded for example ‘Round up’ or ‘Round down’. Thus, the change in each number (decrease if rounded in the direction ‘Towards Zero’ or vice versa) will contribute to causing a significant effect in the calculation of the output, thereby generating a biased output. However, if the numbers are (statistically) equally rounded to both directions (say some numbers in the direction ‘Towards Zero’ and some numbers in the direction ‘Away from Zero’), the effect of changes in the rounded numbers cancel out, thereby generating a relatively unbiased output. For example, one of the simplest options for rounding a positive number is by the basic Rounding Down method or technique. While rounding down, the fractional part (or, more generally, the part following the rounding point, if that is not the same as the radix point) of the number to be rounded is discarded and the integer part of the number is returned as the rounded number. Though this is simple to work, this introduces bias as on average the values of a uniformly distributed set of numbers are reduced by a half. Similarly, an equivalently basic Rounding Up method (i.e. rounding any fractional part up to the closest integer) introduces bias as on average the values of a uniformly distributed set of numbers are increased by a half. Some of the rounding methods such as round towards zero and round away from zero are both supposedly unbiased in expectation, but in reality, these methods introduce a bias of +0.5 in the positive space and −0.5 in the negative space (or vice versa) and this is not ideal. Among all the different rounding methods, round to even has been adopted as the default mode of rounding technique in industry standards as it is unbiased (assuming all the values are equally probable). But unfortunately, there is a considerable cost in hardware to achieve this. Hence, existing methods or techniques of rounding numbers have drawbacks. SUMMARY This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. A method and hardware for performing hardware efficient unbiased rounding of a number is provided herein. The method includes receiving the number in a binary format having a first portion and a second portion. The first portion comprises bits of the number above a rounding point and the second portion comprises bits of the number after the rounding point. The method includes adding a first amount to the number to obtain a first value. Further the method comprises determining if the bit above the rounding point for a controlling value is ‘0’ bit or a ‘1’ bit. The controlling value is either the received number in the binary format or the first value. The method further includes adding a second amount to ‘b+1’ LSBs of the first value to obtain a second value if the bit above the rounding point for