KR-20260067328-A - SYSTEMS AND METHODS FOR ARTIFICIAL INTELLIGENCE COMPUTATIONS
Abstract
The device includes a memory storing a first vector including a first operand having a first mantissa value associated with a layer of an AI model, and a second vector including a second operand having a second mantissa value; and a processor. The processor includes an addition circuit that receives the first mantissa value and the second mantissa value and generates a first sum based on the two mantissa values, and a shifter circuit configured to receive the first sum and shift it by a first number of bits to generate a first shifted value. The processor is configured to generate an inference of the AI model based on the first shifted value.
Inventors
- Choi, Chiho (최치호)
- Choi, Junhee (최준희)
- Padmanabhan, Sai Pralad (파드마나반, 사이 프랄라드)
- Malla, Srikanth (말라, 스리칸쓰)
Assignees
- Samsung Electronics Co., Ltd. (삼성전자주식회사)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-10-30
- Priority Date: 2025-10-15
Claims (20)
- A device comprising: a memory storing a first vector including a first operand having a first mantissa value associated with a layer of an AI model, and a second vector including a second operand having a second mantissa value; and a processor, wherein the processor includes: an addition circuit configured to receive the first mantissa value and the second mantissa value and to generate a first sum based on the first mantissa value and the second mantissa value; and a shifter circuit configured to receive the first sum and to shift the first sum by a first number of bits to generate a first shifted value, and wherein the processor is configured to generate an inference of the AI model based on the first shifted value.
- The device of claim 1, wherein the first number is based on an expected value of one or more mantissa values associated with the layer of the AI model.
- The device of claim 2, wherein the expected value is based on a statistical distribution of parameters of the layer of the AI model.
- The device of claim 1, wherein the first number is based on a first term of a binary decomposition of an expected value of one or more mantissa values of one or more parameters of the AI model.
- The device of claim 1, wherein the first number is 0.
- The device of claim 1, wherein the shifter circuit is further configured to receive the first sum and to shift the first sum by a second number of bits to generate a second shifted value, and wherein the processor is configured to generate the inference of the AI model based on the second shifted value.
- The device of claim 6, wherein the addition circuit is configured to receive the first shifted value and the second shifted value and to generate a second sum, and wherein the processor is configured to generate the inference of the AI model based on the second sum.
- The device of claim 6, wherein the second number is based on a second term of the binary decomposition of the expected value of one or more mantissa values of one or more parameters of the AI model.
- The device of claim 1, wherein the shifter circuit is configured to shift the first sum to the left by the first number of bits.
- The device of claim 1, wherein the shifter circuit is configured to shift the first sum to the right by the first number of bits.
- A method comprising: storing, in a memory device, a first vector including a first operand having a first mantissa value associated with a layer of an AI model and a second vector including a second operand having a second mantissa value; routing the first mantissa value and the second mantissa value to an addition circuit of a processor; outputting, by the addition circuit, a first sum based on the first mantissa value and the second mantissa value; routing the first sum to a shifter circuit of the processor; generating a first shifted value by shifting the first sum by a first number of bits using the shifter circuit; and generating, by the processor, an inference of the AI model based on the first shifted value.
- The method of claim 11, wherein the first number is based on an expected value of one or more mantissa values associated with the layer of the AI model.
- The method of claim 12, wherein the expected value is based on a statistical distribution of parameters of the layer of the AI model.
- The method of claim 11, wherein the first number is based on a first term of a binary decomposition of an expected value of one or more mantissa values of one or more parameters of the AI model.
- The method of claim 11, wherein the first number is 0.
- The method of claim 11, further comprising: routing the first sum to the shifter circuit; and generating a second shifted value by shifting the first sum by a second number of bits using the shifter circuit.
- The method of claim 16, further comprising: routing the first shifted value and the second shifted value to the addition circuit; and outputting, by the addition circuit, a second sum based on the first shifted value and the second shifted value, wherein the inference is generated based on the second sum.
- The method of claim 16, wherein the second number is based on a second term of the binary decomposition of the expected value of one or more mantissa values of one or more parameters of the AI model.
- The method of claim 11, wherein the shifter circuit is configured to shift the first sum to the left by the first number of bits.
- The method of claim 11, wherein the shifter circuit is configured to shift the first sum to the right by the first number of bits.
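Taken together, claims 1 and 11 and their dependent claims describe approximating the product of two mantissas with one addition and a small number of bit shifts, where the shift amounts come from the binary decomposition of an expected mantissa value. The following is an editor's reconstruction of that arithmetic, consistent with the claims but not quoted from the specification. Writing the operands in normalized form:

$$a = 2^{e_a}(1 + m_a), \qquad b = 2^{e_b}(1 + m_b), \qquad m_a, m_b \in [0, 1),$$

$$a \cdot b = 2^{e_a + e_b}\,\bigl(1 + \underbrace{m_a + m_b}_{\text{first sum}} + \, m_a m_b\bigr).$$

Approximating the cross term around the expected mantissa $\mu = E[m]$ of the layer's parameters (claims 2, 3, 12, and 13) and decomposing $\mu$ into powers of two (claims 4, 8, 14, and 18) gives

$$m_a m_b \approx \mu\,(m_a + m_b), \qquad \mu \approx 2^{-k_1} + 2^{-k_2} + \cdots,$$

$$\mu\,(m_a + m_b) \approx \bigl((m_a + m_b) \gg k_1\bigr) + \bigl((m_a + m_b) \gg k_2\bigr) + \cdots,$$

so the multiplier is replaced by the claimed addition circuit (producing the first sum) and shifter circuit (producing the shifted values), with right shifts realizing the $2^{-k}$ factors (claim 10) and left shifts available for $2^{+k}$ factors (claim 9). A software sketch of this data flow appears at the end of the Description.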
Description
Systems and Methods for Artificial Intelligence Computations

One or more aspects of embodiments according to the present disclosure relate to artificial intelligence models, and more specifically, to operations used in artificial intelligence models.

The use of artificial intelligence (AI) has increased rapidly in recent years. AI is commonly used in fields such as image classification, speech recognition, media analysis, healthcare, autonomous devices, and smart assistants. AI often requires large-scale data sets (e.g., databases, sensors, images) and advanced algorithms, and these algorithms in turn require high-performance computing with computational capabilities in the teraflops range.

The information presented in this background section is intended solely to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art.

Non-limiting and non-exhaustive embodiments of the present disclosure are described with reference to the following drawings, in which the same reference numerals refer to the same parts throughout the various figures unless otherwise specified.

FIG. 1 shows a conceptual diagram of an AI model according to one or more embodiments.

FIG. 2 shows a block diagram of a system for performing AI operations according to one or more embodiments.

FIG. 3 shows an approximation of floating-point multiplication using bit-shift and addition operations according to one or more embodiments.

FIG. 4 shows an example of an AI operation using bit-shift and addition operations according to one or more embodiments.

FIG. 5 shows a flowchart for AI operations according to one or more embodiments.

FIG. 6 shows a flowchart for other AI operations according to one or more embodiments.

In the following, exemplary embodiments are described in more detail with reference to the accompanying drawings, in which the same reference numerals refer to the same elements throughout. The present disclosure may, however, be embodied in various different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided as examples so that the present disclosure will be thorough and complete and will fully convey its aspects and features to those skilled in the art. Accordingly, processes, elements, and techniques that are not necessary for those skilled in the art to completely understand the aspects and features of the present disclosure may not be described. Unless otherwise noted, the same reference numerals denote the same elements throughout the accompanying drawings and the written description, and their descriptions may not be repeated. Additionally, in the drawings, the relative sizes of elements, layers, and regions may be exaggerated or simplified for clarity.

Embodiments of the present disclosure are described below with reference to block diagrams and flowcharts.
Accordingly, it should be understood that each block of the block diagrams and flowcharts may be implemented as a computer program product, as an entirely hardware embodiment, as a combination of hardware and computer program products, and/or by devices, systems, computing devices, computing entities, and the like executing instructions, operations, or steps (terms used interchangeably here with, e.g., executable instructions, instructions for execution, and program code) stored on a computer-readable storage medium. For example, retrieval, loading, and execution of code may be performed sequentially, so that one instruction is retrieved, loaded, and executed at a time. In some exemplary embodiments, retrieval, loading, and/or execution may be performed in parallel, so that multiple instructions are retrieved, loaded, and/or executed together. Accordingly, such embodiments may produce specifically configured machines that perform the steps or operations set forth in the block diagrams and flowcharts. The block diagrams and flowcharts therefore support combinations of embodiments for performing the specified instructions, operations, or steps.

Additionally, features of the embodiments of the present disclosure may be combined, in part or in whole, with one or more other features and may operate in various ways, and each embodiment may be implemented independently of, or together with, one or more other embodiments.

Generally, AI models can perform large-scale computations during tasks such as inference. These computations can consume significant processing resources and, consequently, power. In some AI models with transformer architectures, the computational cost and power requirements of inference can increase quadratically with the length of the input sequence. In some examples, most of the power usage
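For illustration, below is a minimal software sketch of the shift-and-add approximation recited in the claims and depicted in FIG. 3. It assumes positive operands and a 23-bit fixed-point mantissa; the function names, the two-term decomposition, and the fixed-point representation are the editor's illustrative choices, not taken from the specification, which describes hardware addition and shifter circuits rather than software.

```python
import math

F = 23  # fraction bits of the fixed-point mantissa (as in IEEE-754 single precision)

def expected_mantissa_shifts(mantissas, num_terms=2):
    """Approximate the mean mantissa fraction mu of a layer's parameters by the
    leading terms of its binary decomposition, mu ~= 2**-k1 + 2**-k2 + ...,
    and return the shift amounts [k1, k2, ...]."""
    mu = sum(mantissas) / len(mantissas)
    shifts = []
    for _ in range(num_terms):
        if mu <= 0.0:
            break
        k = -math.floor(math.log2(mu))  # largest power of two not exceeding mu
        shifts.append(k)
        mu -= 2.0 ** -k
    return shifts

def approx_multiply(a, b, shifts):
    """Approximate a * b (for a, b > 0) with one mantissa addition plus bit
    shifts, mirroring the claimed addition circuit and shifter circuit.

    With a = 2**ea * (1 + ma) and b = 2**eb * (1 + mb), ma and mb in [0, 1):
        a * b = 2**(ea + eb) * (1 + ma + mb + ma*mb),
    and the cross term ma*mb is replaced by right-shifted copies of the
    first sum s = ma + mb, one per term of the decomposition of mu."""
    fa, ea = math.frexp(a)                 # a = fa * 2**ea with fa in [0.5, 1)
    fb, eb = math.frexp(b)
    ia = int((2.0 * fa - 1.0) * (1 << F))  # fixed-point ma, since 1 + ma = 2*fa
    ib = int((2.0 * fb - 1.0) * (1 << F))
    s = ia + ib                            # the "first sum" (addition circuit)
    acc = s
    for k in shifts:                       # shifter circuit: s >> k1, s >> k2, ...
        acc += s >> k                      # add the shifted values (claims 6-8)
    return (1.0 + acc / (1 << F)) * 2.0 ** (ea + eb - 2)

shifts = expected_mantissa_shifts([0.41, 0.52, 0.38, 0.47])  # [2, 3]: 0.445 ~= 1/4 + 1/8
print(approx_multiply(1.375, 1.625, shifts))  # ~2.375 (exact product: 2.234375)
```

The approximation error depends on how closely the layer's actual mantissas cluster around the expected value used to pick the shift amounts, which is why the claims tie the shifts to the statistical distribution of each layer's parameters.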