CN-121614107-B - Block floating point arithmetic device and differential equation calculating system

Abstract

The application provides a block floating point arithmetic device and a differential equation computing system, relating to the field of computing devices. In the differential equation computing system, a buffer controller manages the flow of data organized in a differential format between storage layers; a processing unit array synchronously executes Gauss-Seidel iterative calculations optimized for the differential format in wavefront order; the results are compressed by a block floating point quantizer; and after convergence, a differential reduction unit outputs the solution matrix. The parallel processing units in the array combine a precision configuration with the block floating point input format to form a cooperatively optimized computation mode: data are compressed via a shared exponent, and a dynamic precision parameter allows the complexity of mantissa processing to be further controlled during computation, thereby addressing the problems of high resource occupation, frequent memory accesses, and high energy consumption.
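The shared-exponent compression described in the abstract can be sketched as follows. This is an illustrative model only: the function name, the choice of the block maximum as the shared exponent, and round-to-nearest mantissas are assumptions, not the patented circuit.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits):
    """Quantize a 2D block to block floating point: every value in the
    block shares the exponent of the largest magnitude, and each mantissa
    is rounded to `mantissa_bits` bits (illustrative sketch)."""
    max_mag = np.max(np.abs(block))
    if max_mag == 0.0:
        return block.astype(float).copy(), 0
    shared_exp = int(np.floor(np.log2(max_mag)))  # one exponent per block
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    mantissas = np.round(block * scale)           # fixed-point mantissas
    return mantissas / scale, shared_exp

# Values that are exact multiples of the quantization step survive intact;
# smaller mantissa widths trade accuracy for storage.
q, e = bfp_quantize(np.array([[1.0, 0.5], [0.25, -0.75]]), 8)
```

Because the whole block shares one exponent, only the packed mantissas plus a single exponent need to travel between storage layers, which is the compression mechanism the abstract refers to.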

Inventors

  • DU YUAN
  • GAO ZHENGYU
  • LI SHAOXUAN
  • BAI YICHUAN
  • DU LI

Assignees

  • 南京大学 (Nanjing University)

Dates

Publication Date
2026-05-08
Application Date
2026-02-02

Claims (8)

  1. A block floating point arithmetic device, comprising: an input interface configured to receive a first block floating point operand, a second block floating point operand, and a mantissa bit width configuration parameter; a reconstruction data processing path, coupled to the input interface, configured to perform an operation on the first block floating point operand and the second block floating point operand based on the mantissa bit width configuration parameter to obtain an operation result, the reconstruction data processing path including an adder path configured to perform an addition operation based on the mantissa bit width configuration parameter and a multiplier path configured to perform a multiplication operation based on the mantissa bit width configuration parameter; wherein the adder path includes: a first special value processing unit configured to process denormal values in the first and second block floating point operands to output a first standard operand and a second standard operand; a first sign bit processing unit configured to determine the sign bit of the addition result based on the sign bits of the first and second standard operands; a first floating point number restoring unit configured to restore the first and second standard operands to an explicit floating point format based on the first standard operand, the second standard operand, and the mantissa bit width configuration parameter to obtain a first explicit operand and a second explicit operand; an exponent alignment unit configured to compare the exponents of the first and second explicit operands to output an exponent difference, and to output an alignment control signal according to the exponent difference; a mantissa operation unit configured to receive the alignment control signal, determine a target operand based on the exponent difference, the target operand being the operand with the smaller exponent, shift the mantissa of the target operand into alignment to obtain an aligned mantissa, and add the aligned mantissas to obtain a mantissa result and a corresponding sign bit; a first normalization unit configured to perform a normalization shift on the mantissa result while adjusting the exponent to output a first normalized mantissa and a first normalized exponent; and a first carry processing unit configured to process the carry of the first normalized mantissa and adjust the first normalized exponent to obtain the addition result; wherein the multiplier path includes: a second special value processing unit configured to process denormal values in the first and second block floating point operands to output a third standard operand and a fourth standard operand; a second sign bit processing unit configured to determine the sign bit of the multiplication result based on the sign bits of the third and fourth standard operands; a second floating point number restoring unit configured to restore the third and fourth standard operands to an explicit floating point format based on the third standard operand, the fourth standard operand, and the mantissa bit width configuration parameter to obtain a third explicit operand and a fourth explicit operand; an exponent processing unit configured to add the exponents of the third and fourth explicit operands to calculate an exponent sum and output an intermediate exponent; a mantissa processing unit configured to multiply the mantissas of the third and fourth explicit operands to obtain a mantissa product; a second normalization unit configured to perform a normalization shift on the mantissa product while truncating the mantissa according to the mantissa bit width configuration parameter and adjusting the intermediate exponent to output a second normalized mantissa and a second normalized exponent; and a second carry processing unit configured to process the carry of the second normalized mantissa and adjust the second normalized exponent to obtain the multiplication result; and an output interface, connected to the reconstruction data processing path, configured to output the operation result.
  2. A differential equation computing system, comprising: a buffer controller configured to load target data in a differential format from an external memory into an on-chip buffer, wherein the target data is a data block of an equation solution matrix and a source term matrix, and the differential format stores each element of each row, except the first column, as a differential value relative to the element in the previous column; a processing unit array comprising a plurality of processing units arranged in parallel, the processing units comprising the block floating point arithmetic device of claim 1; the buffer controller being further configured to send the differential-format target data to the processing unit array according to a wavefront sequence that groups grid points into a plurality of wavefronts; the processing units being configured to synchronously perform a differential-format Gauss-Seidel iterative computation on the target data within a wavefront to update a differential matrix of the solution; a block floating point quantizer configured to perform block floating point quantization on the updated differential matrix according to a two-dimensional block size and a target mantissa bit width to output quantized data, wherein the block floating point quantization allocates a shared exponent to a plurality of values within each two-dimensional block; and a differential reduction unit configured to reduce the differential matrix in the on-chip buffer to a solution matrix when the iteration converges.
  3. The differential equation computing system according to claim 2, wherein the system further comprises an error calculation unit; the error calculation unit is configured to calculate the difference between the current iterative solution and the previous iterative solution, judge whether a convergence condition is met according to the difference, and output a judgment result; the buffer controller is configured to take the quantized data as the input of the next iteration in response to the judgment result indicating that the convergence condition is not satisfied, and to trigger the differential reduction unit to output a final solution matrix in response to the judgment result indicating that the convergence condition is satisfied.
  4. The differential equation computing system of claim 2, wherein the on-chip buffer comprises a current value buffer, an offset buffer, and an iteration buffer; the buffer controller is configured to load the differential-format solution matrix data block of the current iteration into the current value buffer, load the data block of the source term matrix into the offset buffer, and write the quantized data into the iteration buffer.
  5. The differential equation computing system of claim 2, wherein the target mantissa bit width is configured with a plurality of different values set according to the iteration stage: when the iteration is in a first stage, the target mantissa bit width is a first bit width; when the iteration is in a second stage, the target mantissa bit width is a second bit width; and when the iteration is in a third stage, the target mantissa bit width is a third bit width; the third stage being the iteration stage close to meeting the convergence condition, the third bit width being larger than the second bit width, and the second bit width being larger than the first bit width.
  6. The differential equation computing system of claim 2, wherein the processing unit being configured to perform the differential-format Gauss-Seidel iterative computation on the target data comprises: determining a corresponding differential iterative calculation formula according to the position of the current grid point; and evaluating the differential iterative calculation formula based on the differential values in the differential matrix of the solution and the differential values of the source term matrix to obtain a differential result; wherein both the operands and the result of the differential iterative calculation formula are differential values.
  7. The differential equation computing system according to claim 2, wherein the wavefront sequence divides the grid points into a plurality of wavefronts according to the sum of the row index and the column index, grid points having the same sum of row index and column index belonging to the same wavefront; the buffer controller is configured to send the target data corresponding to grid points in the same wavefront to the processing unit array in wavefront order; and the processing unit array is configured to perform parallel update calculations on the target data corresponding to grid points within the same wavefront.
  8. The differential equation computing system of claim 2, wherein the processing unit is further configured to invoke different computation logic to update the target data depending on whether the currently computed grid point lies on a matrix boundary; the different computation logic comprising at least first computation logic for processing grid points on a first boundary of the matrix, which computes based on boundary condition values; second computation logic for processing interior grid points of the matrix, which computes based on the differential values of adjacent grid points in different directions; and third computation logic for processing grid points on a second boundary of the matrix, which sums the differential values of a plurality of grid points in the row.
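The differential storage format of claim 2, the differential reduction of claim 2, and the row-plus-column-index wavefront grouping of claim 7 can be sketched in a few lines. The helper names are illustrative assumptions; only the format and grouping rules come from the claims.

```python
import numpy as np

def to_differential(matrix):
    """Differential format (claim 2): each element of each row, except the
    first column, is stored as its difference from the previous column."""
    diff = matrix.astype(float).copy()
    diff[:, 1:] = matrix[:, 1:] - matrix[:, :-1]
    return diff

def from_differential(diff):
    """Differential reduction: a cumulative sum along each row recovers
    the original solution matrix."""
    return np.cumsum(diff, axis=1)

def wavefronts(rows, cols):
    """Wavefront grouping (claim 7): grid points with the same row-index +
    column-index sum belong to the same wavefront and can be updated in
    parallel, since each depends only on earlier wavefronts."""
    fronts = [[] for _ in range(rows + cols - 1)]
    for i in range(rows):
        for j in range(cols):
            fronts[i + j].append((i, j))
    return fronts
```

For a 2x2 grid this yields three wavefronts, `[(0, 0)]`, `[(0, 1), (1, 0)]`, and `[(1, 1)]`, which is the order in which the buffer controller would stream grid points to the processing unit array.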

Description

Block floating point arithmetic device and differential equation computing system

Technical Field

The present application relates to the technical field of computing devices, and in particular to a block floating point arithmetic device and a differential equation computing system.

Background

Partial differential equations describe natural and engineering phenomena such as acoustics, heat conduction, and electromagnetism. Solving these equations requires numerical methods: the finite difference method obtains a numerical solution by discrete approximation. The process is computationally intensive, and there is a clear need for hardware acceleration. Dedicated hardware accelerators are used to perform finite difference calculations; such accelerators employ specific architectures to process discrete grid data. In-memory processing techniques have also been applied to improve computational efficiency by reducing the latency of processor access to data, and the hardware-accelerated implementation affects the performance of the solution process. When solving partial differential equations with the finite difference method (FDM), dedicated hardware accelerators lack data path optimization, which causes frequent memory accesses and high energy consumption, while in-memory processing techniques limit computational precision and grid-scale scalability, resulting in high hardware resource occupation and storage requirements.

Disclosure of Invention

The present application provides a block floating point arithmetic device and a differential equation computing system to address the problems of high resource occupation, frequent memory accesses, and high energy consumption.
In a first aspect, the present application provides a block floating point arithmetic device comprising: an input interface configured to receive a first block floating point operand, a second block floating point operand, and a mantissa bit width configuration parameter; a reconstruction data processing path, coupled to the input interface, configured to perform an operation on the first and second block floating point operands based on the mantissa bit width configuration parameter to obtain an operation result; and an output interface, connected to the reconstruction data processing path, configured to output the operation result.

In a second aspect, a differential equation computing system includes: a buffer controller configured to load target data in a differential format from an external memory into an on-chip buffer, wherein the target data is a data block of an equation solution matrix and a source term matrix, and the differential format stores each element of each row, except the first column, as a differential value relative to the element in the previous column; a processing unit array comprising a plurality of processing units arranged in parallel, the processing units comprising the block floating point arithmetic device of any one of the first aspects; the buffer controller being further configured to send the differential-format target data to the processing unit array according to a wavefront sequence that groups grid points into a plurality of wavefronts; the processing units being configured to synchronously perform a differential-format Gauss-Seidel iterative computation on the target data within a wavefront to update a differential matrix of the solution; a block floating point quantizer configured to perform block floating point quantization on the updated differential matrix according to a two-dimensional block size and a target mantissa bit width to output quantized data, wherein the block floating point quantization allocates a shared exponent to a plurality of values within each two-dimensional block; and a differential reduction unit configured to reduce the differential matrix in the on-chip buffer to a solution matrix when the iteration converges.

According to this technical scheme, the differential equation computing system comprises a buffer controller configured to load target data in a differential format from an external memory into an on-chip buffer, the target data being a data block of an equation solution matrix and a source term matrix, and the differential format storing each element of each row, except the first column, as a differential value relative to the element in the previous column; the processing unit array comprises a plurality of processing units arranged in parallel, each processing unit comprising a block floating point arithmetic device; and the buffer controller is further configured to send the differential-format target data to the processing unit array according to a wavefront sequence in which the grid points are grouped into a plurality of wavefronts.
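The staged mantissa-width schedule of claim 5 combined with Gauss-Seidel sweeps can be illustrated on a one-dimensional Poisson problem. Everything here is an assumption made for illustration: the stage boundaries (thirds of the iteration budget), the bit widths (4, 8, 16), and the per-element significant-bit rounding used as a stand-in for the block floating point quantizer.

```python
import numpy as np

def mantissa_width(iteration, total):
    """Staged precision per claim 5: a narrow mantissa in the first stage,
    wider mantissas as the iteration approaches convergence. Stage split
    and widths are illustrative assumptions, not values from the patent."""
    if iteration < total // 3:
        return 4
    if iteration < 2 * total // 3:
        return 8
    return 16

def round_mantissa(x, bits):
    """Round each value to `bits` significant bits (a simple per-element
    stand-in for block floating point quantization)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    exp = np.floor(np.log2(np.abs(x[nz])))
    scale = 2.0 ** (bits - 1 - exp)
    out[nz] = np.round(x[nz] * scale) / scale
    return out

def gauss_seidel_1d(f, h, total_iters):
    """Gauss-Seidel sweeps for -u'' = f on a 1D grid with zero boundary
    values, quantizing the solution after each sweep to the current
    stage's mantissa width."""
    u = np.zeros_like(f, dtype=float)
    for it in range(total_iters):
        bits = mantissa_width(it, total_iters)
        for i in range(1, len(u) - 1):  # in-place sweep uses fresh values
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        u = round_mantissa(u, bits)
    return u
```

The coarse early stages keep mantissa processing cheap while the residual is still large; only the final stage, near convergence, pays for full precision, which is the rationale claim 5 encodes in hardware.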