KR-20260063510-A - COMPUTATIONAL PROCESSING DEVICE AND METHOD USING DEPTH TAG BASED ZERO SKIP

Abstract

An apparatus and method for processing operations using depth tag-based zero skip are disclosed. An apparatus for processing operations according to one embodiment of the present invention includes: a tag generation unit that checks a plurality of first-level tags preset for input features stored in a buffer and sets each of a plurality of second-level tags, which are matched to the first-level tags in preset group units, to 1 or 0; a tag processing unit that extracts the input features corresponding to those second-level tags whose value is 1 and transmits them as the input features to be operated on; and an operation unit that performs operation processing based on the transmitted input features.
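
As a non-authoritative illustration of the two-level tagging summarized above, the following Python sketch builds both tag levels for a flat buffer and returns only the feature groups that would be forwarded to the operation unit. It rests on assumptions not stated in the source: a first-level tag is taken to be 1 when any feature in its group is nonzero, and the names `build_tags`, `features_to_process`, `l1_group`, and `l2_group` are hypothetical.

```python
from typing import List, Tuple


def build_tags(buffer: List[int], l1_group: int, l2_group: int) -> Tuple[List[int], List[int]]:
    """Return (first_level_tags, second_level_tags) for a flat feature buffer."""
    # Assumed rule: a first-level tag is 1 when any feature in its group is nonzero.
    l1 = [int(any(v != 0 for v in buffer[i:i + l1_group]))
          for i in range(0, len(buffer), l1_group)]
    # A second-level tag is 1 when at least one first-level tag in its group is 1.
    l2 = [int(any(l1[i:i + l2_group])) for i in range(0, len(l1), l2_group)]
    return l1, l2


def features_to_process(buffer: List[int], l1_group: int, l2_group: int) -> List[Tuple[int, List[int]]]:
    """Return (first_level_index, feature_group) pairs that survive zero skip."""
    l1, l2 = build_tags(buffer, l1_group, l2_group)
    selected = []
    for g2, tag2 in enumerate(l2):
        if tag2 == 0:
            continue  # the whole second-level group is zero, so skip it at once
        first = g2 * l2_group
        for i1 in range(first, min(first + l2_group, len(l1))):
            if l1[i1] == 1:  # first-level tags of 0 inside a live group are skipped
                selected.append((i1, buffer[i1 * l1_group:(i1 + 1) * l1_group]))
    return selected
```

For a 16-element buffer whose only nonzero value sits at position 5, with `l1_group=2` and `l2_group=2`, only the third first-level group (index 2) is forwarded; three of the four second-level groups are skipped without inspecting their first-level tags.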

Inventors

박종준
양해찬

Assignees

주식회사 모빌린트

Dates

Publication Date: 20260507
Application Date: 20241030

Claims (10)

1. A computation processing device comprising: a tag generation unit that sets each of a plurality of first-level tags, matched in preset group units to input features stored in a buffer, to 1 or 0 according to the values of the matching input features, and that checks the set first-level tags to set each of a plurality of second-level tags, matched in preset group units to the first-level tags, to 1 or 0; a tag processing unit that extracts the input features corresponding to those second-level tags whose value is 1 and transmits them as input features to be operated on; and a computation unit that performs computation processing based on the transmitted input features.
2. The computation processing device of claim 1, wherein the tag generation unit sets a second-level tag to 0 if the values of the first-level tags within its group are all 0, and sets the second-level tag to 1 if at least one of the first-level tags within the group is 1.
3. The computation processing device of claim 1, wherein the tag processing unit skips, as not subject to computation, any first-level tag with a value of 0 among the first-level tags matched to a second-level tag with a value of 1, and extracts the input features corresponding to the first-level tags with a value of 1 to determine the input features to be operated on.
4. The computation processing device of claim 1, wherein the tag processing unit calculates, using the plurality of second-level tags, the nearest buffer address at which the next valid data is stored.
5. The computation processing device of claim 4, wherein the tag processing unit calculates a first index, which is the smallest index among the second-level tags having a value of 1 after the current buffer address, and calculates a second index, which is the smallest index among the first-level tags having a value of 1 among the first-level tags matched to the second-level tag of the first index.
6. The computation processing device of claim 5, wherein the tag processing unit calculates the next effective address by performing bitwise concatenation (bitwise-concat) of the first index and the second index.
7. The computation processing device of claim 1, wherein the tag processing unit skips, as not subject to computation, any second-level tag whose value is 0 among the plurality of second-level tags.
8. A computation processing method performed by a computation processing device, the method comprising: a step of setting each of a plurality of first-level tags, matched in preset group units to input features stored in a buffer, to 1 or 0 according to the values of the matching input features, and checking the set first-level tags to set each of a plurality of second-level tags, matched in preset group units to the first-level tags, to 1 or 0; a step of extracting the input features corresponding to those second-level tags whose value is 1 and transmitting them as input features to be operated on; and a step of performing computation processing based on the transmitted input features.
9. The method of claim 8, wherein, in the step of setting each of the plurality of second-level tags to 1 or 0, a second-level tag is set to 0 if the values of the first-level tags within its group are all 0, and is set to 1 if at least one of the first-level tags within the group is 1.
10. The method of claim 8, wherein, in the step of transmitting the input features to be operated on, the nearest buffer address at which the next valid data is stored is calculated using the plurality of second-level tags.
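
The next-address calculation recited in the claims above (smallest second-level index with value 1, then smallest first-level index with value 1 inside that group, then bitwise concatenation) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed circuit: addresses are counted in units of first-level tags, each second-level group is assumed to hold `2**bits` first-level tags so that concatenation reduces to a shift and an OR, and the function name `next_valid_address` and its parameters are hypothetical.

```python
from typing import List, Optional


def next_valid_address(l1: List[int], l2: List[int], current: int, bits: int) -> Optional[int]:
    """Return the nearest address after `current` whose first-level tag is 1.

    Assumption: each second-level tag covers 2**bits first-level tags.
    Pass current=-1 to search from the start of the buffer.
    """
    group = 1 << bits
    start = current + 1
    for first in range(start >> bits, len(l2)):
        if l2[first] == 0:
            continue  # the whole group is zero, so no first-level tag is inspected
        # In the group containing `start`, ignore positions at or before `current`.
        lo = start & (group - 1) if first == (start >> bits) else 0
        for second in range(lo, group):
            address = (first << bits) | second  # bitwise concat of the two indices
            if address < len(l1) and l1[address] == 1:
                return address
    return None  # no valid data remains after `current`
```

With `l1 = [0,1,0,0, 0,0,0,0, 0,0,1,0]`, `l2 = [1,0,1]`, and `bits = 2`, the search from address 1 skips the all-zero middle group and returns 10, which is the concatenation of first index 2 (binary `10`) and second index 2 (binary `10`).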

Description

Computational Processing Device and Method Using Depth Tag Based Zero Skip
The disclosed embodiments relate to an apparatus and method for processing operations using depth tag-based zero skip.
A deep learning-based computation device may include a systolic array comprising a plurality of multiplier-accumulator (MAC) units, a multiplexer connected to at least one of the plurality of MAC units, and a control circuit that controls the operation of the MAC units according to a plurality of computation modes. Such a computation device may use a weight stationary method during computation. The weight stationary method uses fixed weights and may not be suitable for MAC computation methods that modify weights according to the input.
FIG. 1 is a block diagram for explaining a computational processing device according to one embodiment.
FIGS. 2 to 6 are exemplary diagrams for explaining a calculation processing method according to one embodiment.
FIG. 7 is a flowchart for explaining a calculation processing method according to one embodiment.
FIG. 8 is a block diagram illustrating a computing environment including a computing device according to one embodiment.
Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to facilitate a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, this is merely illustrative and the present invention is not limited thereto. In describing the embodiments of the present invention, detailed descriptions of known technologies related to the present invention are omitted if it is determined that such detailed descriptions may unnecessarily obscure the essence of the present invention.
Furthermore, the terms described below are defined in consideration of their functions within the present invention, and they may vary depending on the intentions or practices of the user or operator. Therefore, such definitions should be based on the content throughout this specification. Terms used in the detailed description are intended merely to describe the embodiments of the present invention and should not be limiting in any way. Unless explicitly stated otherwise, expressions in the singular form include the meaning of the plural form. In this description, expressions such as "include" or "comprise" are intended to refer to certain characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof, and should not be interpreted to exclude the existence or possibility of one or more other characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof other than those described.
FIG. 1 is a block diagram for explaining a computational processing device according to one embodiment. Hereinafter, a method for processing operations according to one embodiment will be described with reference to FIGS. 2 to 6, which are exemplary diagrams for explaining the method.
Referring to FIG. 1, the computational processing device (100) includes a tag generation unit (110), a tag processing unit (120), and a computation unit (130). The components illustrated in FIG. 1 are not all essential for implementing the computational processing device (100) according to the present disclosure, so the computational processing device (100) described in this specification may have more or fewer components than those listed above. The components illustrated in FIG. 1 may be connected to each other so as to be communicable through a communication network (not shown).
In some embodiments, the communication network may include the Internet, one or more local area networks, wide area networks, cellular networks, mobile networks, other types of networks, or a combination of these networks.
Feature data often contains intervals in which all values are zero. For example, referring to FIG. 2, 3D data may contain many empty intervals (ES) in which all values are zero.
A typical weight stationary-based MAC operator receives input data sequentially. In this case, the MAC operator does not detect in advance that input data is 0 and therefore applies no skip operation. Here, weight stationary refers to a structure that uses fixed weights and minimizes the retrieval of weights from a register file, thereby minimizing the energy consumed in reading weights. Other computing systems, by contrast, store input data in a compressed form and modify the weights accordingly to perform MAC operations, but this approach may not be suitable for weight stationary structures.
A typical MAC operator can perform matrix operations on 10 x 10 elements at once. Accordingly, input data can be stacked in groups of 10 along the channel direction. In this case, each piece of input data can be 1 byte. The 10 bytes that can be processed by the MAC operator at once can